CMSC 28000 — Lecture 21

A few extra examples of Turing machines

Consider the language $L = \{ x \# x \mid x \in \Sigma^* \}$. In other words, this is the language of two copies of the same word, separated by $\#$. We've seen already that this language is not context-free. We can describe a machine that decides it as follows:

  1. Mark a symbol and go to the right of the $\#$ to the first unmarked symbol. Check to see if the symbol is the same one that we just marked. If there is no $\#$, reject. If the symbols are not the same, reject.
  2. Go left past the $\#$ until a marked symbol is encountered, and go to the symbol to its right. If the current symbol is unmarked, repeat the process from step 1. If the current symbol is $\#$, go to the next step.
  3. Go to the right. If an unmarked symbol is encountered, reject. If a blank symbol is encountered, accept.
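The marking procedure above can be sketched in ordinary code. Here is a minimal Python simulation, where the tape is a list of characters and marking a symbol is rendered as uppercasing it; the function name and this representation are illustrative choices, not part of any standard construction.

```python
def decides_x_hash_x(tape):
    """Decide { x#x : x in {a,b}* } by the marking procedure above.

    The tape is a sequence of characters; "marking" a symbol is
    modeled by replacing it with its uppercase form.
    """
    tape = list(tape)
    i = 0
    while i < len(tape) and tape[i] in "ab":
        sym = tape[i]
        tape[i] = sym.upper()              # step 1: mark the symbol
        j = i
        while j < len(tape) and tape[j] != "#":
            j += 1                         # scan right to the #
        if j == len(tape):
            return False                   # no #: reject
        j += 1
        while j < len(tape) and tape[j] in "AB":
            j += 1                         # skip already-marked symbols
        if j == len(tape) or tape[j] != sym:
            return False                   # mismatch (or ran off the tape): reject
        tape[j] = tape[j].upper()          # mark the matching symbol
        i += 1                             # step 2: next unmarked symbol on the left
    # step 3: the left side is fully marked; check nothing unmarked remains
    if i == len(tape) or tape[i] != "#":
        return False
    return all(c in "AB" for c in tape[i + 1:])
```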

Here, we have a language that looks much more complicated than anything we've done so far. The language that we are considering is $$L = \{\# x_1 \# x_2 \# \cdots \# x_\ell \mid x_i \in \{a,b\}^*, x_i \neq x_j, 1 \leq i,j \leq \ell, i \neq j \}.$$ This language consists of lists of strings separated by $\#$ in which all of the strings are distinct. The machine that decides this language works by comparing each word $x_i$ with every word $x_j$ for $j > i$. This is clearly beyond the capabilities of any of the machines we've studied so far. A description of the operation of the machine is as follows:

  1. On input $w$, mark the leftmost tape symbol. If the symbol is $\Box$, then accept. If the symbol is $\#$, then go to the next step. Otherwise, reject.
  2. Scan right to the next $\#$ and mark it. If no $\#$ is encountered before a $\Box$, then only $x_1$ was present and we can accept.
  3. Compare the two strings to the right of the marked $\#$. If they are equal, reject.
  4. Move the mark on the rightmost marked $\#$ to the next $\#$ to the right. If no $\#$ is encountered before a $\Box$, then instead move the mark on the leftmost marked $\#$ to the next $\#$ to its right, and place the rightmost mark on the $\#$ just to the right of it. Now, if there is no $\#$ available for the rightmost mark, all pairs of strings have been compared, and we can accept.
  5. Otherwise, go to step 3.
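At a high level, the strategy in these steps is just a pairwise comparison, with the two marked $\#$'s selecting the pair currently under consideration. Here is a Python sketch of that strategy; the function name is illustrative, and the `split` call is a stand-in for the machine's scanning between marked $\#$'s.

```python
def decides_distinct_list(w):
    """Sketch of the machine above: accept # x_1 # x_2 # ... # x_l
    exactly when the x_i are pairwise distinct."""
    if w == "":
        return True                         # step 1: blank input, accept
    if w[0] != "#":
        return False                        # step 1: must start with #
    words = w[1:].split("#")                # the x_i, in order
    for i in range(len(words)):             # position of the leftmost marked #
        for j in range(i + 1, len(words)):  # position of the rightmost marked #
            if words[i] == words[j]:
                return False                # step 3: an equal pair, reject
    return True                             # step 4: all pairs compared, accept
```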

Notice that step 3 is a little light on details. How does one "compare" the two strings? Well, we already saw how to do this in the previous example, with the two copies separated by $\#$.

Here is a quick example of an even higher-level description of a Turing machine.

The class of decidable languages is closed under complement.

Given a Turing machine $M$ that decides a language $L$, we can construct a Turing machine $M'$ that decides $\overline L$, which operates as follows:

  1. On input $w$, run $M$ on $w$.
  2. If $M$ rejects $w$, then accept; if $M$ accepts $w$, then reject.

Since $L$ is decidable, $M$ is guaranteed to halt. Thus, $M'$ decides $\overline L$ and $\overline L$ is decidable.
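This two-step construction can be sketched directly, treating a decider as a total function from words to True/False; the names here are illustrative, not standard.

```python
def complement_decider(M):
    """Given a decider M for a language L (a total function returning
    True or False on every input), build the decider M' for the
    complement of L, exactly as in the two-step description above."""
    def M_prime(w):
        return not M(w)    # step 2: swap M's accept and reject answers
    return M_prime

# toy example: the complement of "even length" is "odd length"
even_length = lambda w: len(w) % 2 == 0
odd_length = complement_decider(even_length)
```

The construction relies on $M$ being a decider: if $M$ could loop forever on some input, `M_prime` would inherit that looping, and we could not conclude that it decides the complement.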

Variants of Turing machines

It's worth thinking about what happens if we make the same kinds of modifications to Turing machines that we made to our previous machines. For instance, recall that the finite automaton has deterministic and nondeterministic variants. As it turned out, the power of the two machines was exactly the same, in that they both recognized exactly the same class of languages; however, there was a tradeoff in the number of states between the two models. On the other hand, we recall that this same variation yields vastly different results for the pushdown automaton: deterministic pushdown automata can recognize strictly fewer languages than nondeterministic machines.

We now explore the same kinds of ideas with Turing machines. The goal is to show that the Turing machine is actually fairly powerful and to give us some tools when working with them.

As with PDAs, we need to define a notion of the current state of the machine, or the configuration.

A configuration is a string $uqv$, where $q \in Q$ is the current state, and the current tape contents is $uv$, where $u,v \in \Gamma^*$, and the tape head is located on top of the tape cell corresponding to the first symbol of $v$. For example, the initial configuration with input word $w$ is the word $q_0 w$. An accepting configuration is a word $u q_A v$ with $u,v \in \Gamma^*$.

Unlike with PDAs, we don't need a separate notion of the input to be consumed because that information is on the tape and can be changed during the course of computation.

One example of such a modification: what if, instead of using only a single tape, we allowed the Turing machine to have multiple tapes, with tape heads that move independently of each other? That is, for $k$ tapes, we modify the definition so that the transition function is $\delta : Q \times \Gamma^k \to Q \times \Gamma^k \times \{L,R\}^k$.

Every language recognized by a multi-tape Turing machine can be recognized by a single tape Turing machine.

The idea that we use to show this is one that we will be using a lot in the rest of this course: simulation. We show how to simulate a multi-tape Turing machine on a single-tape Turing machine. The idea is to store the contents of all of the tapes on the single tape, separated by a special symbol, while keeping track of the head position on each of the virtual tapes.

What does this machine look like? We describe a Turing machine which simulates a $k$-tape TM $M$ as follows:

  1. On input $w = a_1 \cdots a_n$: Format the tape so that it represents the $k$ tapes of $M$ by $$\# \dot a_1 a_2 \cdots a_n \# \dot \Box \# \dot \Box \# \cdots \#$$
  2. To simulate a single move, scan the tape from the first $\#$, which marks the left-hand end, to the $(k+1)$st $\#$, which marks the right-hand end, to determine the symbols under the virtual tape heads. Make another pass to update the tapes according to the transition function of $M$.
  3. If a virtual tape head gets moved to the right onto a $\#$, this means that $M$ moved the corresponding tape head onto a blank portion of the tape. We write a $\Box$ on this tape cell and shift the contents of the rest of the tape to the right and continue the simulation.
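The tape format in step 1 can be made concrete. In this sketch, a trailing `*` stands in for the dot over the symbol under a virtual head, and `_` stands in for $\Box$; these encoding details are one possible choice, not canonical.

```python
def encode_tapes(tapes, heads):
    """Encode k tapes (strings) and their head positions as the
    single-tape contents described above: tape contents separated
    by #, with the symbol under each virtual head carrying a dot
    (rendered here as a trailing '*').  An empty tape is a single
    blank cell '_'."""
    pieces = []
    for tape, h in zip(tapes, heads):
        cells = list(tape) if tape else ["_"]
        cells[h] = cells[h] + "*"      # dot the symbol under this head
        pieces.append("".join(cells))
    return "#" + "#".join(pieces) + "#"
```

For example, the initial configuration of a 3-tape machine on input `abab` is `encode_tapes(["abab", "", ""], [0, 0, 0])`, matching the format in step 1.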


Nondeterminism

Our current Turing machine is defined to be deterministic. However, we can show that we can introduce nondeterminism without increasing or decreasing the power of the model.

In a nondeterministic Turing machine, the transition function is a function $\delta: Q \times \Gamma \to 2^{Q \times \Gamma \times \{L,R\}}$. We can view the computation of a nondeterministic Turing machine as a tree, where each branch represents a possible guess. A nondeterministic Turing machine accepts a word if at least one branch of the computation tree enters an accepting state. It's important to note that we only require that at least one computation accepts, since some branches may lead to infinitely long computation paths.
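Concretely, the transition function now returns a *set* of possible moves rather than a single move. A tiny fragment, with hypothetical states and tape symbols chosen purely for illustration:

```python
# A nondeterministic transition function maps (state, tape symbol)
# to a set of (state, written symbol, direction) moves, matching
# delta : Q x Gamma -> 2^(Q x Gamma x {L,R}) above.
delta = {
    ("q0", "a"): {("q0", "a", "R"), ("q1", "x", "R")},  # two choices: the tree branches
    ("q0", "b"): {("q0", "b", "R")},                    # one choice: no branching
    ("q1", "b"): set(),                                 # no moves: this branch dies
}
```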

Every nondeterministic Turing machine has an equivalent deterministic Turing machine.

The idea behind this proof is to construct a Turing machine that performs a breadth-first search of the computation tree. If the word is accepted by the nondeterministic machine, then a search will eventually find the branch where the computation is accepted. Otherwise, the machine may run forever.

We use a 3-tape deterministic machine to simulate the nondeterministic machine $N$. We can do this because we have just conveniently shown that a $k$-tape machine is equivalent to a single-tape machine. The tapes are used as follows:

  1. The first tape keeps a copy of the input word at all times.
  2. The second tape is an instance of $N$'s tape on some branch of computation.
  3. The third tape is an address that points to a particular configuration in the computation tree of $N$.

The third tape is what will allow us to navigate through the tree. For each node, we assign a value from $1$ to $k$, where $k$ is the size of the largest set of choices in the transition function of $N$. Then by considering a string over the alphabet $\{1,\dots,k\}$, we follow a particular branch at each set of choices, starting from the root of the tree. However, not every string over $\{1,\dots,k\}$ will correspond to a valid address, since not every nondeterministic set of choices is guaranteed to have as many as $k$ choices. This is fine, since that address simply will not correspond to a node. Then by enumerating through every string, we are guaranteed to reach every node of the computation tree.
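Incrementing the address string on tape 3 amounts to computing the successor in length-lexicographic order over $\{1,\dots,k\}$: all addresses of length $m$ are visited before any address of length $m+1$, which is what makes the search breadth-first. A sketch, assuming $k \leq 9$ so that each choice is a single character:

```python
def next_address(addr, k):
    """Successor of an address string over {1,...,k} in
    length-lexicographic order, the order in which tape 3
    enumerates branches: "" -> "1" -> ... -> "k" -> "11" -> ...
    Assumes k <= 9 so each choice is one character."""
    digits = list(addr)
    i = len(digits) - 1
    while i >= 0 and digits[i] == str(k):
        digits[i] = "1"                # carry: k rolls over to 1
        i -= 1
    if i < 0:
        return "1" + "".join(digits)   # every digit carried: grow by one
    digits[i] = str(int(digits[i]) + 1)
    return "".join(digits)
```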

Now, the machine operates as follows:

  1. Initialize the tapes so that tape 1 contains the input word $w$, tape 2 begins with a copy of $w$, and tape 3 is set to $\varepsilon$.
  2. Use tape 2 to simulate $N$ on $w$. At each step, check the next symbol on tape 3 to determine which choice to make on the transition function of $N$. If no more symbols remain on tape 3 or if the choice is invalid, abort the branch and move to the next stage. If a rejecting configuration is encountered, move to stage 3. If an accepting configuration is encountered, accept.
  3. Increment the string on tape 3, copy the input $w$ from tape 1 to tape 2, and move to stage 2 to begin simulation of the next branch of computation.
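The three stages can be sketched as follows. Here `choices` and `accepting` are hypothetical stand-ins for $N$'s transition relation and accepting configurations, a tuple of choice numbers plays the role of the address on tape 3, and the address length is capped so that the sketch always halts; a true simulator of a recognizer cannot impose such a cap.

```python
from itertools import product

def simulate_ntm(choices, start, accepting, k, max_len=10):
    """Deterministic simulation of a nondeterministic machine by
    enumerating addresses, following the three stages above.
    choices(config) returns the list of successor configurations;
    k bounds the number of choices at any step."""
    for length in range(max_len + 1):           # breadth-first: short addresses first
        for addr in product(range(1, k + 1), repeat=length):
            config = start                      # stages 1 and 3: restart from w
            valid = True
            for c in addr:                      # stage 2: follow the address symbols
                succs = choices(config)
                if c > len(succs):
                    valid = False               # invalid choice: abort this branch
                    break
                config = succs[c - 1]
            if valid and accepting(config):
                return True                     # stage 2: some branch accepts
    return False                                # no accepting branch within the cap

# toy nondeterministic machine: guess a subset of (3, 5, 4) summing to 9;
# a configuration is (numbers not yet considered, running sum)
def choices(cfg):
    nums, total = cfg
    if not nums:
        return []                               # halted: no further moves
    return [(nums[1:], total), (nums[1:], total + nums[0])]  # skip or take
```

For instance, `simulate_ntm(choices, ((3, 5, 4), 0), lambda c: c == ((), 9), k=2)` returns True, because the branch "skip 3, take 5, take 4" reaches an accepting configuration.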


This gives us the following corollary.

A language is recognizable if and only if some nondeterministic Turing machine recognizes it.

However, if we know that a nondeterministic Turing machine always halts on every branch of its computation, then the corresponding deterministic machine will also always halt. This leads to the following result.

A language is decidable if and only if some nondeterministic Turing machine decides it.

Something you might notice or object to is that simulating a nondeterministic Turing machine with a deterministic machine is going to take a very long time. Since we're performing a breadth-first search, this means that we're looking at a computation that's exponential in the length of the shortest accepting path in the nondeterministic computation tree. But that's okay, because we only care whether our deterministic machine is able to recognize or decide any language that a nondeterministic machine is capable of recognizing.

But for argument's sake, let's suppose that we do care about the time it takes. We can measure this: how many steps does the computation on our machine take? We can apply our notions of efficiency from other CS courses and consider polynomially many steps to be good, or efficient. So suppose that I have a nondeterministic TM that has an accepting computation branch that's polynomially long in the size of the input. Is it possible to simulate it, or otherwise solve the same problem deterministically, while still ending up with only a polynomially long computation?

The answer is: we don't know! Because this question is exactly the P vs. NP problem.