CMSC 28000 — Lecture 19

Computability

You may have noticed an interesting pattern: while $\{a^n \mid n \geq 0\}$ is accepted by a DFA, the language $\{a^n b^n \mid n \geq 0\}$ is not. And while a PDA can accept $\{a^n b^n \mid n \geq 0\}$, it can't accept $\{a^n b^n c^n \mid n \geq 0\}$. A natural question arises: can we play this game indefinitely? This is one of the things to keep in mind as we begin exploring the limits of computability. Generally speaking, computability theory is about whether a particular problem can be solved at all. Depending on what you've assumed about computers so far, the existence of unsolvable problems may or may not seem obvious, but it's another thing entirely to have it proved.

Computability theory is really the beginning of the entire field of computer science, and to see why, we need to go all the way back to the early 1900s. One of the central ideas in mathematical logic that came up at the time was distinguishing the notion of semantic truth from that of proof. The idea is that by separating the two, we can reduce proof and proof systems to the mechanical manipulation of symbols. If we define a proof system so that it matches our notion of semantic truth and load it with the right axioms, then we have a way to express a mathematical statement in, say, first-order logic, and we'd have an algorithm that could mechanically construct a proof of the truth or falsehood of that statement for us.

One of the main drivers of this research was David Hilbert and in 1928, he proposed the Entscheidungsproblem, or decision problem, which asks for such an algorithm.

In order for this to happen, we need a proof system and a set of axioms that is consistent (it proves no contradictions), complete (every true statement is provable), and effectively axiomatized (the axioms can be enumerated by an algorithm).

At this point, some of you will be expecting Kurt Gödel to show up and, well, here he is. Gödel proved a number of very important results dealing with all of this. First, he proved the Completeness Theorem in 1929, which says that first-order logic is complete: every statement that holds in all models of a set of axioms is provable from those axioms. Equivalently, if you have a bunch of axioms that don't contradict each other, there's a model for them, which sounds great.

But then he went on to prove the Incompleteness Theorems in 1931, which say that if you have a consistent set of axioms that can be enumerated by an algorithm and is strong enough to express basic arithmetic, then there's going to be a true statement about the integers that you can't prove, and furthermore, you can't prove that the set of axioms is consistent using the set of axioms itself. This means that your axiomatic system is incomplete, which is a problem if your goal is to generate proofs for every statement about the integers.

This is just one of the developments that occurred due to this activity around trying to find a way to mechanically prove things. Gödel used what are called $\mu$-recursive functions to try to do this. Alonzo Church defined the $\lambda$-calculus for this purpose. Emil Post defined what are called Post systems, which are a kind of string rewriting system.

And in 1936, Alan Turing defined what we now call Turing machines.

Turing machines

First, we'll try to understand what a Turing machine is informally. Remember that we started out with the finite automaton, considered the simplest model of computation. This is just some finite set of states, or memory, and some transitions. We can do a lot of useful things with FAs but there are many languages that FAs aren't powerful enough to recognize, such as $a^n b^n$. However, we can enhance the power of the machine by adding a stack. This gives us the ability to recognize a larger class of languages.

We can think of a Turing machine in the same way. A Turing machine is just a finite automaton with the two main restrictions we placed on it lifted: the tape that holds the input, which was previously read-only, can now be written to, and the tape is infinitely long. Furthermore, the head can move however we like along the tape, unlike a pushdown stack, which we can only access at the top.

A Turing machine is a 7-tuple $M = (Q,\Sigma,\Gamma,\delta,q_i,q_A,q_R)$, where

  - $Q$ is a finite set of states,
  - $\Sigma$ is the input alphabet, which does not contain the blank symbol $\Box$,
  - $\Gamma$ is the tape alphabet, with $\Box \in \Gamma$ and $\Sigma \subseteq \Gamma$,
  - $\delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is the transition function,
  - $q_i \in Q$ is the initial state,
  - $q_A \in Q$ is the accepting state, and
  - $q_R \in Q$ is the rejecting state, with $q_R \neq q_A$.

In our model, we have a one-sided infinitely long tape, with the machine starting its computation with the input on the tape beginning at the left end of the tape. The input string is followed by infinitely many blank symbols $\Box$ at the right.

In the course of a computation on a Turing machine, one of three outcomes will occur.

  1. The machine enters the accepting state and accepts and halts.
  2. The machine enters the rejecting state and rejects and halts.
  3. The machine never enters the accepting or rejecting state and does not halt.

For a Turing machine $M$ and an input word $w$, if the computation of $M$ on $w$ enters the accepting state, then we say that $M$ accepts $w$. Likewise, if the computation of $M$ on $w$ enters the rejecting state, then we say $M$ rejects $w$. We call the set of strings that are accepted by $M$ the language of $M$, denoted $L(M)$.

Notice that since we now have three possible outcomes, we have more than one notion of what it means for a Turing machine to accept a language.

A Turing machine $M$ recognizes a language $L$ if every word in $L$ is accepted by $M$ and for every input not in $L$, $M$ either rejects the word or does not halt. A language $L$ is recognizable if some Turing machine recognizes it.

A Turing machine $M$ decides a language $L$ if every word in $L$ is accepted by $M$ and every input not in $L$ is rejected by $M$. A language $L$ is decidable if some Turing machine decides it.
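To make these definitions concrete, here is a minimal sketch of a Turing machine simulator in Python. This is an illustrative formulation, not the official formalism: the dictionary-based transition table, the step budget used to cut off a computation that may not halt, and all the names here are assumptions for the sake of the example.

```python
# A minimal single-tape Turing machine simulator (a sketch, not the
# formal definition). delta maps (state, symbol) to
# (new_state, written_symbol, move), where move is -1 for L and +1 for R.
BLANK = "_"  # stands in for the blank symbol

def run_tm(delta, q_init, q_accept, q_reject, w, max_steps=10_000):
    tape = {i: c for i, c in enumerate(w)}  # sparse one-sided tape
    state, head = q_init, 0
    for _ in range(max_steps):
        if state == q_accept:
            return True                     # outcome 1: accept and halt
        if state == q_reject:
            return False                    # outcome 2: reject and halt
        sym = tape.get(head, BLANK)
        # convention (an assumption): undefined transitions reject
        if (state, sym) not in delta:
            return False
        state, tape[head], move = delta[(state, sym)]
        head = max(0, head + move)          # head can't fall off the left end
    return None                             # outcome 3: no halt within the budget

# Example: a tiny machine deciding strings consisting only of a's.
delta_a_star = {
    ("q0", "a"): ("q0", "a", +1),
    ("q0", BLANK): ("qA", BLANK, +1),
}
```

With this sketch, `run_tm(delta_a_star, "q0", "qA", "qR", "aaa")` returns `True`, while an input containing a $b$ hits an undefined transition and is rejected.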

We'll define a Turing machine for the language $L = \{a^n b^n c^n \mid n \geq 0\}$.

The machine, when formally defined, looks like this:

Note that here, $q_7$ acts as the accepting state, and the rejecting state is nowhere to be seen. It really should have been included explicitly, but since this language is known to be decidable, we can imagine that all of the undefined transitions go to the rejecting state. Keep in mind that this is a convenience for this particular machine, not something we can always assume.

Here, we rely on the idea that since our input alphabet is finite, we can just include "marked" variants of each symbol in our tape alphabet. More obviously, even for a relatively simple language, specifying a Turing machine in this amount of detail gets cumbersome. This will be the only time we see a Turing machine completely defined. Instead, we opt for algorithmic descriptions, and as we get further along, we will conveniently drop more and more details.

Less formally, the operation of the machine can be described as follows:

  1. Read an $a$ and mark it.
  2. Go right across the word until we find a $b$ and mark it.
  3. Go right across the word until we find a $c$ and mark it.
  4. Go left until we find a marked $a$. Then move right to the unmarked $a$.
  5. Repeat this process from step 1 until we have marked the same number of $c$'s, $b$'s, and $a$'s.
  6. If there aren't enough $b$'s or $c$'s to mark, then the machine moves to the reject state and rejects the word.
  7. Otherwise, we have marked an equal number of $a$'s, $b$'s, and $c$'s and the tape head is on the first marked $b$. Go to the right, ensuring that we are only reading marked $b$'s.
  8. Now, we are on a marked $c$. Go to the right, ensuring that we are only reading marked $c$'s.
  9. If we encounter an unmarked symbol, the machine moves into the reject state. Otherwise, we should read a $\Box$ and accept.
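The marking procedure above can be sketched in ordinary Python rather than as a transition table. Uppercase letters stand in for the marked variants of the tape symbols; the function and its final regular-expression check are illustrative assumptions, not part of the formal machine.

```python
import re

def accepts_anbncn(w):
    # Sketch of the marking procedure for {a^n b^n c^n | n >= 0}.
    # Uppercase letters play the role of "marked" tape symbols.
    tape = list(w)
    i = 0
    while i < len(tape) and tape[i] == "a":
        tape[i] = "A"                                # step 1: mark an a
        j = i + 1
        while j < len(tape) and tape[j] != "b":      # step 2: find an unmarked b
            j += 1
        if j == len(tape):
            return False                             # not enough b's: reject
        tape[j] = "B"
        while j < len(tape) and tape[j] != "c":      # step 3: find an unmarked c
            j += 1
        if j == len(tape):
            return False                             # not enough c's: reject
        tape[j] = "C"
        i += 1                                       # step 4: back to the next unmarked a
    # steps 7-9: everything must now be marked and appear in the order
    # a's, then b's, then c's
    return re.fullmatch(r"A*B*C*", "".join(tape)) is not None
```

Note that the final check does the work of steps 7 through 9 in one pass: any unmarked symbol, or any marked symbol out of order, causes a rejection.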

Next, consider the language $L = \{ x \# x \mid x \in \Sigma^* \}$. In other words, this is the language consisting of two copies of the same word divided by $\#$. We've seen already that this language is not context-free. We can describe the machine as follows:

  1. Mark a symbol and go to the right of the $\#$ to the first unmarked symbol. Check to see if the symbol is the same one that we just marked. If there is no $\#$, reject. If the symbols are not the same, reject.
  2. Go to the left past the $\#$ and until a marked symbol is encountered and go to the symbol to its right. If the current symbol is unmarked, repeat the process from step 1. If the current symbol is $\#$, go to the next step.
  3. Go to the right. If an unmarked symbol is encountered, reject. If a blank symbol is encountered, accept.
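The comparison procedure above can be sketched in ordinary Python as well. Here the symbol `X` is an assumption standing in for the marked variants of the input symbols, and the up-front count of $\#$ symbols condenses the machine's step-1 check that a $\#$ actually exists.

```python
def accepts_copy(w):
    # Sketch of the x#x checking procedure (ordinary Python, not a TM).
    # "X" stands in for a marked symbol.
    if w.count("#") != 1:
        return False                      # step 1: there must be exactly one #
    tape = list(w)
    sep = tape.index("#")
    for i in range(sep):
        j = sep + 1 + i                   # matching position to the right of #
        if j >= len(tape) or tape[j] != tape[i]:
            return False                  # step 1: mismatch (or right side too short)
        tape[i] = tape[j] = "X"           # mark both copies of the symbol
    # step 3: reject if anything to the right of # is still unmarked
    return all(s == "X" for s in tape[sep + 1:])
```

The final `all` check plays the role of step 3: after the left side is exhausted, any leftover unmarked symbol on the right means the two sides have different lengths.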

Here, we have a language that looks much more complicated than anything we've done so far. The language we are considering is $$L = \{\# x_1 \# x_2 \# \cdots \# x_\ell \mid x_i \in \{a,b\}^* \text{ and } x_i \neq x_j \text{ for all } 1 \leq i < j \leq \ell \}.$$ This language consists of lists of strings separated by $\#$ in which all the strings are distinct. The machine that decides this language works by comparing each word $x_i$ with every word $x_j$ for $j > i$. This is clearly beyond the capabilities of any of the machines we've studied so far. A description of the operation of the machine is as follows:

  1. On input $w$, mark the leftmost tape symbol. If the symbol is $\Box$, then accept. If the symbol is $\#$, then go to the next step. Otherwise, reject.
  2. Scan right to the next $\#$ and mark it. If no $\#$ is encountered before a $\Box$, then only $x_1$ was present and we can accept.
  3. Compare the two strings to the right of the marked $\#$. If they are equal, reject.
  4. Move to the right of the rightmost marked $\#$. If no $\#$ is encountered before a $\Box$, unmark the leftmost marked $\#$ and mark the next $\#$ to its right, and move the mark from the rightmost $\#$ to the $\#$ just to the right of the first marked $\#$. Now, if there is no $\#$ for the rightmost mark, all the strings have been compared, and we can accept.
  5. Otherwise, go to step 3.
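The bookkeeping with the two marked $\#$ symbols is exactly how a machine walks through all pairs $(x_i, x_j)$ with $i < j$. In ordinary Python, the same pairwise comparison can be sketched directly (the function is an illustration of steps 1 through 5, not a transition table):

```python
def accepts_all_distinct(w):
    # Sketch of the pairwise-distinctness check over {a, b, #}.
    if any(c not in "ab#" for c in w):
        return False                      # step 1: a symbol outside the alphabet rejects
    if w == "":
        return True                       # step 1: reading a blank immediately accepts
    if not w.startswith("#"):
        return False                      # step 1: the input must begin with #
    xs = w[1:].split("#")                 # the strings x_1, ..., x_l
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):   # steps 2-5: compare x_i with every later x_j
            if xs[i] == xs[j]:
                return False              # step 3: a repeated string rejects
    return True
```

The doubly-nested loop corresponds to the two marked $\#$ symbols: the outer index plays the role of the leftmost mark and the inner index the rightmost.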

Notice that step 3 is a little light on details. How does one "compare" the two strings? Well, we have an example of how to do this from the previous example with the copies separated by $\#$.

The Church-Turing thesis

As a reminder, the Turing machine was introduced by Turing in 1936, while the finite automaton model was introduced by Kleene in the 1950s. It's obvious now that the two are related and it's very helpful to relate the two models, but Turing definitely didn't think of the machines he defined in the same way that we do now. In fact, near the end of Kleene's 1951 technical report, he devotes a paragraph or two to discussing the possible similarities between his model and Turing's.

Instead, what Turing attempted to do was formalize how computers worked, by which we mean the 1930s conception of what a computer is. In other words, he tried to formalize how human computers might compute a problem. We can view the finite state system as a set of instructions we might give to a person, that tells them what to do when they read a particular symbol. We allow this person to read and write on some paper and we let them have as much paper as they want.

This gives us an arguably concrete notion of computation. One of the issues with a lot of the formalist work of the 30s was that there was a lot of it and it wasn't entirely clear that these were all really different ways of doing the same thing. We've seen hints of this already: there's no real reason we should've expected, say, certain automata and grammars to be equivalent, but it turned out they are.

So it wasn't obvious that Church's $\lambda$-calculus, Gödel's $\mu$-recursive functions, and Post's systems were all basically the same. The advantage of the Turing machine is its very clear analogy to human and mechanical computation. This concreteness made it convincing when these other systems were proved equivalent to Turing machines, and led to the Church-Turing thesis.

Nowadays, we think of the Church-Turing thesis in the context of real computers and the notion of Turing-completeness of programming languages and other systems. However, the Church-Turing thesis is really a statement about the equivalence of those formalisms from the 30s. Since Turing machines are now accepted as the standard of computability, Turing-completeness is just the property of a particular system to be able to simulate a Turing machine, and implies that this system is capable of "real" computation.

Most people are introduced to the notion of Turing-completeness when thinking about programming languages and languages similar to programming languages. For instance, it's not surprising that your typical programming language is Turing-complete. Where things get more interesting is when dealing with those languages that we don't typically think of as "programming" languages, like $\TeX$ or C++ template metaprogramming. Of course, this has important theoretical uses, like showing that theoretical molecular or quantum models of computation can at least achieve the standard notion of computability. This also has less important theoretical uses, like showing that Magic: The Gathering or PowerPoint transitions also have the full power of computability.

The simplicity of the definition of the Turing machine makes it fairly robust. Just like with finite automata and pushdown automata, we can ask how modifying the machine changes its computational power. It turns out there is very little you can do to increase the computational power of a Turing machine: multiple tracks, multiple tapes with independent tape heads, doubly-infinite tapes, even nondeterminism, all yield machines no more powerful than the basic model.

The other important idea that the definition of the Turing machine gives us is a concrete way to think about resources. Our basic notions from complexity theory are derived here: time is just how many steps a Turing machine makes and space is just how many tape cells a Turing machine uses.