CMSC 28000 — Lecture 20

Computability

You may have noticed that while $\{a^n \mid n \geq 0\}$ is accepted by a DFA, the language $\{a^n b^n \mid n \geq 0\}$ is not. And while a PDA can accept $\{a^n b^n \mid n \geq 0\}$, it can't accept $\{a^n b^n c^n \mid n \geq 0\}$. A natural question arises: can we keep playing this game indefinitely? This is one of the things to keep in mind as we begin exploring the limits of computability. Generally speaking, computability theory is about whether a particular problem can be solved at all. Depending on what you've assumed about computers so far, the fact that some problems cannot be solved may or may not be obvious to you, but it's another thing entirely to have it proved.

Computability theory is really the beginning of the entire field of computer science, and to see why, we need to go all the way back to the early 1900s. One of the central ideas of modern mathematical logic that emerged at the time is the distinction between truth and proof. This might sound very abstract, but today, we would call these things the semantics and the syntax of a program, respectively.

The idea is that by separating the two, we can reduce proof and proof systems to the mechanical manipulation of symbols. Recall that we use inference rules to describe valid steps in a proof. For instance, the rule \[\frac{\varphi \rightarrow \psi \quad \varphi}{\psi}\] says that if we know that $\varphi \rightarrow \psi$ and $\varphi$, then we can conclude $\psi$. It's common to think of this as saying something about the truth (that is, the semantics) of these statements, but it's actually saying something syntactic: if we see the strings $\varphi \rightarrow \psi$ and $\varphi$ in our proof, we can add the string $\psi$ to our proof.
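To see the syntactic reading concretely, here is the same rule in the Lean proof assistant, where applying an implication to a proof of its hypothesis is literally an operation on terms (the hypothesis names here are arbitrary):

```lean
-- Modus ponens, syntactically: from a term h1 of type φ → ψ and a term
-- h2 of type φ, the expression h1 h2 is a term of type ψ.
example (φ ψ : Prop) (h1 : φ → ψ) (h2 : φ) : ψ := h1 h2
```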

This idea leads to a few questions.

  1. If a statement is true, can we always prove it? That is, is every true statement derivable in our proof system?
  2. Is there a mechanical procedure that, given a statement, determines whether or not it is provable?

The first question was answered by Kurt Gödel by the Completeness Theorem in 1929. The second question was posed by David Hilbert in 1928, as the Entscheidungsproblem (the "decision problem").

In essence, this second question is about computation, and it is what kicked off a flurry of activity in the 1930s trying to nail down exactly what it means to compute something. Gödel used what are called $\mu$-recursive functions to try to do this. Alonzo Church defined the $\lambda$-calculus for this purpose. Emil Post defined what are called Post systems, which are a kind of string rewriting system.

And in 1936, Alan Turing defined what we now call Turing machines.

Turing machines

First, we'll try to understand what a Turing machine is informally. Remember that we started out with the finite automaton, considered the simplest model of computation: just a finite set of states, or memory, and some transitions. We can do a lot of useful things with FAs, but there are many languages that FAs aren't powerful enough to recognize, such as $\{a^n b^n \mid n \geq 0\}$. However, we can enhance the power of the machine by adding a stack, which gives us the ability to recognize a larger class of languages.

We can think of a Turing machine in the same way: it is just a finite automaton with its two main restrictions lifted. That is, the tape that holds the input, which was previously read-only, can now be written to, and the tape is infinitely long. Furthermore, we can move however we like on the tape, unlike with a pushdown stack, which only lets us work at the top.

A Turing machine is a 7-tuple $M = (Q,\Sigma,\Gamma,\delta,q_0,q_{\mathsf{acc}},q_{\mathsf{rej}})$, where

  - $Q$ is a finite set of states,
  - $\Sigma$ is the input alphabet, which does not contain the blank symbol $\Box$,
  - $\Gamma$ is the tape alphabet, where $\Box \in \Gamma$ and $\Sigma \subseteq \Gamma$,
  - $\delta : Q \times \Gamma \to Q \times \Gamma \times \{\leftarrow, \rightarrow\}$ is the transition function,
  - $q_0 \in Q$ is the initial state,
  - $q_{\mathsf{acc}} \in Q$ is the accepting state, and
  - $q_{\mathsf{rej}} \in Q$ is the rejecting state, with $q_{\mathsf{rej}} \neq q_{\mathsf{acc}}$.

In our model, the tape is one-sided and infinitely long. The machine starts its computation with the input written at the left end of the tape, followed by infinitely many blank symbols $\Box$ to the right. The transition function is deterministic.

What exactly does this model try to capture? What Turing attempted to do was formalize how computers worked, by which we mean the 1930s conception of a computer: a person who carries out calculations by hand. In other words, he tried to formalize how these human "computers" might "compute" an answer to a problem. We can view the finite state system as a set of instructions we might give to such a person, telling them what to do when they read a particular symbol. We allow this person to read and write on some paper, and we let them have as much paper as they want.

In the course of a computation on a Turing machine, one of three outcomes will occur.

  1. The machine enters the accepting state; it accepts and halts.
  2. The machine enters the rejecting state; it rejects and halts.
  3. The machine never enters the accepting or rejecting state, and therefore never halts.

For a Turing machine $M$ and an input word $w$, if the computation of $M$ on $w$ enters the accepting state, then we say that $M$ accepts $w$. Likewise, if the computation of $M$ on $w$ enters the rejecting state, then we say $M$ rejects $w$. We call the set of strings that are accepted by $M$ the language of $M$, denoted $L(M)$.

It is important to note that in these definitions, $w$ is the string that the computation begins with, not the string that is on the tape when the computation halts.
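Nothing in the definition requires code, but it can help to see the model rendered concretely. Below is a minimal sketch of a simulator in Python; the function name run_tm, the dictionary encoding of $\delta$, and the step cutoff are my own conventions, not part of the formal definition. The cutoff is unavoidable: because of the third outcome above, a simulator cannot in general know whether the machine will ever halt.

```python
# A minimal sketch of a single-tape deterministic Turing machine simulator.
# delta maps (state, symbol) to (state, symbol, direction); pairs missing
# from delta are treated as transitions into the rejecting state.

BLANK = "_"  # stands in for the blank symbol

def run_tm(delta, q_0, q_acc, q_rej, w, max_steps=10_000):
    """Run the machine on input w. Returns (verdict, final tape contents),
    where the verdict is "accept", "reject", or "no verdict" if the machine
    has not halted within max_steps steps (it may be looping forever, and
    in general we cannot tell)."""
    tape = dict(enumerate(w))  # cells absent from the dict hold BLANK
    state, head = q_0, 0
    for _ in range(max_steps):
        if state in (q_acc, q_rej):
            n = max(tape, default=-1) + 1
            contents = "".join(tape.get(i, BLANK) for i in range(n))
            return ("accept" if state == q_acc else "reject",
                    contents.rstrip(BLANK))
        symbol = tape.get(head, BLANK)
        state, written, direction = delta.get((state, symbol),
                                              (q_rej, symbol, "R"))
        tape[head] = written
        # one-sided tape: by a common convention, a left move at the
        # leftmost cell leaves the head where it is
        head = head + 1 if direction == "R" else max(head - 1, 0)
    return ("no verdict", "")
```

Note that the second component returned is the tape contents when the machine halts which, as we just observed, is generally not the input string.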

Notice that since we now have three possible outcomes, we have more than one notion of what it means for a Turing machine to accept a language.

A Turing machine $M$ recognizes a language $L$ if every word in $L$ is accepted by $M$ and for every input not in $L$, $M$ either rejects the word or does not halt. A language $L$ is recognizable if some Turing machine recognizes it.

A Turing machine $M$ decides a language $L$ if every word in $L$ is accepted by $M$ and every input not in $L$ is rejected by $M$. A language $L$ is decidable if some Turing machine decides it.

These notions lead directly to two classes of languages.

The class of (Turing-)recognizable languages is the class of languages that can be recognized by a Turing machine. This class is also called the class of recursively enumerable languages.

The class of (Turing-)decidable languages is the class of languages that can be decided by a Turing machine. This class is also called the class of recursive languages.

Consider the language $L = \{ x \# x \mid x \in \{a,b\}^* \}$: the language of two copies of the same word over the alphabet $\{a,b\}$, separated by $\#$. It is not hard to show that this language is not context-free. How would we build a Turing machine for it? We can think about what we'd like such a machine to do at a high level (a rough transcription in code follows the list of steps).

  1. Read a symbol and remember it. Mark the cell.
  2. Move right, past the $\#$, to the first unmarked symbol. If there is no $\#$, then reject.
  3. Check whether this symbol is the same as the one we just marked. If the symbols are not the same, reject.
  4. Mark the cell.
  5. Move left, past the $\#$, until a marked cell is encountered.
  6. Move one cell to the right.
  7. If the current symbol is unmarked, repeat the process from Step 1. If the current symbol is $\#$, go to the next step.
  8. Scan to the right. If an unmarked symbol is encountered, reject. If a blank symbol is encountered, accept.
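Before formalizing this as a machine, it may help to see a rough transcription of these steps in ordinary Python, operating on a list of characters in place of the tape; the function name and the use of "✓" as the mark are my own choices.

```python
def matches(w):
    """Check whether w is in { x#x : x in {a,b}* } by following the
    marking procedure above."""
    tape = list(w)
    i = 0  # position of the leftmost unmarked symbol
    while i < len(tape):
        if tape[i] == "#":
            # Step 8: everything to the right of '#' must be marked
            return all(c == "✓" for c in tape[i + 1:])
        # Step 1: read a symbol, remember it, and mark the cell
        sym, tape[i] = tape[i], "✓"
        # Step 2: move right, past '#', to the first unmarked symbol
        j = i + 1
        while j < len(tape) and tape[j] != "#":
            j += 1
        if j == len(tape):
            return False  # no '#': reject
        j += 1
        while j < len(tape) and tape[j] == "✓":
            j += 1
        # Step 3: compare it against the remembered symbol
        if j == len(tape) or tape[j] != sym:
            return False
        # Step 4: mark the cell; Steps 5-7: return to the leftmost
        # unmarked symbol, which here is simply the next index
        tape[j] = "✓"
        i += 1
    return False  # ran off the end without ever seeing '#'
```

Of course, this only checks that the plan is sound; the point of the exercise is to carry it out with nothing but states and tape moves.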

As always, the challenge is formalizing this mechanically. We will attempt it here, and see why we will not do so again. Here is the formal description. Let $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\mathsf{acc}}, q_{\mathsf{rej}})$, where

  - $Q = \{q_0, q_{1,a}, q_{1,b}, q_{2,a}, q_{2,b}, q_3, q_4, q_5, q_{\mathsf{acc}}, q_{\mathsf{rej}}\}$,
  - $\Sigma = \{a, b, \#\}$,
  - $\Gamma = \{a, b, \#, \Box, \checkmark\}$,

and the transition function $\delta$ is defined by the following table.

|           | $a$                                  | $b$                                  | $\#$                             | $\Box$                                  | $\checkmark$                         |
|-----------|--------------------------------------|--------------------------------------|----------------------------------|-----------------------------------------|--------------------------------------|
| $q_0$     | $(q_{1,a}, \checkmark, \rightarrow)$ | $(q_{1,b}, \checkmark, \rightarrow)$ | $(q_5, \checkmark, \rightarrow)$ | reject                                  | reject                               |
| $q_{1,a}$ | $(q_{1,a}, a, \rightarrow)$          | $(q_{1,a}, b, \rightarrow)$          | $(q_{2,a}, \#, \rightarrow)$     | reject                                  | reject                               |
| $q_{1,b}$ | $(q_{1,b}, a, \rightarrow)$          | $(q_{1,b}, b, \rightarrow)$          | $(q_{2,b}, \#, \rightarrow)$     | reject                                  | reject                               |
| $q_{2,a}$ | $(q_3, \checkmark, \leftarrow)$      | reject                               | reject                           | reject                                  | $(q_{2,a}, \checkmark, \rightarrow)$ |
| $q_{2,b}$ | reject                               | $(q_3, \checkmark, \leftarrow)$      | reject                           | reject                                  | $(q_{2,b}, \checkmark, \rightarrow)$ |
| $q_3$     | reject                               | reject                               | $(q_4, \#, \leftarrow)$          | reject                                  | $(q_3, \checkmark, \leftarrow)$      |
| $q_4$     | $(q_4, a, \leftarrow)$               | $(q_4, b, \leftarrow)$               | reject                           | reject                                  | $(q_0, \checkmark, \rightarrow)$     |
| $q_5$     | reject                               | reject                               | reject                           | $(q_{\mathsf{acc}}, \Box, \rightarrow)$ | $(q_5, \checkmark, \rightarrow)$     |

Here, "reject" is shorthand for the transition $(q_{\mathsf{rej}}, \sigma, \rightarrow)$ for whichever $\sigma \in \Gamma$ was read to make it more clear to the reader which transitions reject (otherwise, it gets lost in the table).

Notice that we use $\checkmark$ as our "mark". This symbol belongs to the tape alphabet but not the input alphabet, since we never want to see an input string that uses $\checkmark$.

Here is the state diagram. Note that the rejecting state and its incoming transitions are not shown.
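As a sanity check, the table translates directly into the dictionary encoding used by the run_tm sketch from earlier; the flattened state names (q1a for $q_{1,a}$, and so on) are mine, and the entries omitted from the dictionary are exactly the "reject" entries of the table.

```python
R, L = "R", "L"
CHECK, BLANK = "✓", "_"  # "_" stands in for the blank symbol, as before

delta = {
    ("q0", "a"): ("q1a", CHECK, R),    ("q0", "b"): ("q1b", CHECK, R),
    ("q0", "#"): ("q5", CHECK, R),
    ("q1a", "a"): ("q1a", "a", R),     ("q1a", "b"): ("q1a", "b", R),
    ("q1a", "#"): ("q2a", "#", R),
    ("q1b", "a"): ("q1b", "a", R),     ("q1b", "b"): ("q1b", "b", R),
    ("q1b", "#"): ("q2b", "#", R),
    ("q2a", "a"): ("q3", CHECK, L),    ("q2a", CHECK): ("q2a", CHECK, R),
    ("q2b", "b"): ("q3", CHECK, L),    ("q2b", CHECK): ("q2b", CHECK, R),
    ("q3", "#"): ("q4", "#", L),       ("q3", CHECK): ("q3", CHECK, L),
    ("q4", "a"): ("q4", "a", L),       ("q4", "b"): ("q4", "b", L),
    ("q4", CHECK): ("q0", CHECK, R),
    ("q5", BLANK): ("qacc", BLANK, R), ("q5", CHECK): ("q5", CHECK, R),
}

for w in ["ab#ab", "ab#ba", "#", "ab#a"]:
    verdict, _ = run_tm(delta, "q0", "qacc", "qrej", w)
    print(w, verdict)  # accepts ab#ab and #; rejects ab#ba and ab#a
```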

Because Turing machines are able to modify the contents of the tape, we often use high-level descriptions like "copy the contents of the tape" or "shift the contents to the right". But can we really do those things? Let's see a simple example of one such subroutine: shifting the contents of the tape one cell to the right.

The idea is to write a blank in the tape cell that we start in while remembering the symbol that we overwrote in the state memory. Each time we move to the right, we write down the last symbol we saw and remember the symbol that we just overwrote, repeating this process until we reach a blank cell, into which we write the final remembered symbol.

We'll define the machine $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\mathsf{acc}}, q_{\mathsf{rej}})$ that will shift the tape contents over by one cell, for $\Sigma = \{a,b\}$, by

  - $Q = \{q_0, q_a, q_b, q_{\mathsf{acc}}, q_{\mathsf{rej}}\}$, where $q_a$ and $q_b$ remember the symbol just overwritten,
  - $\Gamma = \{a, b, \Box\}$,

and the transition function $\delta$ is defined by

|       | $a$                        | $b$                        | $\Box$                                  |
|-------|----------------------------|----------------------------|-----------------------------------------|
| $q_0$ | $(q_a, \Box, \rightarrow)$ | $(q_b, \Box, \rightarrow)$ | $(q_{\mathsf{acc}}, \Box, \rightarrow)$ |
| $q_a$ | $(q_a, a, \rightarrow)$    | $(q_b, a, \rightarrow)$    | $(q_{\mathsf{acc}}, a, \rightarrow)$    |
| $q_b$ | $(q_a, b, \rightarrow)$    | $(q_b, b, \rightarrow)$    | $(q_{\mathsf{acc}}, b, \rightarrow)$    |

Here is the state diagram for the "shift over" machine for $\Sigma = \{a,b\}$.

Notice that no transitions go to the rejecting state.
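Under the same conventions as before, the shift machine's table is small enough to write out and test with the run_tm sketch; again, the state names are my own.

```python
R, BLANK = "R", "_"

# The shift machine's transition table. Every defined entry moves right,
# and no entry sends the machine to the rejecting state.
shift_delta = {
    ("q0", "a"): ("qa", BLANK, R),  ("q0", "b"): ("qb", BLANK, R),
    ("q0", BLANK): ("qacc", BLANK, R),
    ("qa", "a"): ("qa", "a", R),    ("qa", "b"): ("qb", "a", R),
    ("qa", BLANK): ("qacc", "a", R),
    ("qb", "a"): ("qa", "b", R),    ("qb", "b"): ("qb", "b", R),
    ("qb", BLANK): ("qacc", "b", R),
}

print(run_tm(shift_delta, "q0", "qacc", "qrej", "abba"))
# -> ("accept", "_abba"): the input has moved one cell to the right
```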

So now that we've shifted the tape contents to the right, we can return to the cell where we started by moving left until we find a blank cell: the one that we wrote when we first began the shifting process.
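Continuing the sketch, one way to realize this composition is to redirect the shift machine's accepting moves into a hypothetical "return" state that walks left until it finds that blank; the grafting below is my own illustration, not a construction from the lecture.

```python
R, L, BLANK = "R", "L", "_"

# Instead of halting after the shift, enter a return state and walk left
# until the blank written at the starting cell is found.
return_delta = {}
for (q, s), (q2, out, d) in shift_delta.items():
    # accepting moves now turn around and head left in state qret
    return_delta[(q, s)] = ("qret", out, L) if q2 == "qacc" else (q2, out, d)
return_delta.update({
    ("qret", "a"): ("qret", "a", L),
    ("qret", "b"): ("qret", "b", L),
    ("qret", BLANK): ("qacc", BLANK, R),  # found the inserted blank
})

print(run_tm(return_delta, "q0", "qacc", "qrej", "abba"))
# -> ("accept", "_abba"), with the machine halting back at the left end
```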