CMSC 28000 — Lecture 23

Universality and Undecidability

A problem that is not decidable is said to be undecidable. The notion of an undecidable problem is troubling in practical terms (and philosophically as well), since it amounts to saying that there is no algorithm at all that solves the problem.

For one thing, it is not clear that such a thing can even happen. After all, maybe these problems are just really difficult and we haven't figured out a way to solve them. This certainly seems like the case for something like context-free language equality. For example, the question of whether equivalence of deterministic CFLs is decidable was posed by Ginsburg and Greibach when they introduced DCFLs in 1966, and it remained open until 1997, when Géraud Sénizergues proved that it is decidable (this won him the Gödel Prize in 2002).

However, we will prove that, in fact, there are problems out there that have no algorithm that will solve them. We will do this by first showing that the following language is undecidable. \[A_{TM} = \{ \langle M,w \rangle \mid \text{$M$ is a Turing machine and $M$ accepts $w$} \}\]

This is the membership problem for Turing machines, and it looks much like the decision problems we've seen before: in particular, there is a way to encode Turing machines as strings, using the same methods we used for encoding DFAs and CFGs.
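As a very rough illustration of what such an encoding might look like, here is a small Python sketch that flattens a machine's description into a single string. The field layout, delimiters, and names here are invented for this example and are not the encoding we fix formally; any unambiguous scheme works just as well.

    # A hypothetical, illustrative encoding of a Turing machine as a string.
    # The machine is given by a transition table: (state, symbol) -> (state, symbol, move).
    def encode_tm(states, alphabet, transitions, start, accept, reject):
        """Flatten a TM description into a single string <M>."""
        parts = [
            ",".join(states),
            ",".join(alphabet),
            ";".join("{} {} -> {} {} {}".format(q, a, r, b, d)
                     for (q, a), (r, b, d) in sorted(transitions.items())),
            start,
            accept,
            reject,
        ]
        return "|".join(parts)

    # Example: a tiny machine that accepts inputs starting with 0.
    M = encode_tm(
        states=["q0", "qacc", "qrej"],
        alphabet=["0", "1", "_"],
        transitions={("q0", "0"): ("qacc", "0", "R"),
                     ("q0", "1"): ("qrej", "1", "R")},
        start="q0", accept="qacc", reject="qrej",
    )
    print(M)   # <M> is just a string, like any other input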

It's important to note here that while $A_{TM}$ is not decidable, we can show that it is recognizable.

$A_{TM}$ is recognizable.

This is not actually completely obvious. What this says is that there is a Turing machine out there that can take an encoding of a Turing machine and some input, and simulate the encoded Turing machine on the desired input word. Such a Turing machine is called a universal Turing machine. There are a few interesting observations we can make.

We've studied a number of problems where we're essentially simulating other models of computation, but a universal Turing machine acts very much like our real general-purpose computers do. Each of our computers differs from the others in various ways, but encodings, together with the ability to decode them, allow us to simply write programs and have them work on every computer. Some people liken this to Turing essentially inventing software.

So there's a very clear analogy here to writing programs. Just as we can treat the programs we write as text files, we can treat the Turing machines as strings. But there is another analogy that many of you will be familiar with if you've taken CS 151 or 161: functions as data. In functional programming, we are used to the idea of treating functions as essentially another kind of value that can be passed around. The idea behind encoding Turing machines as strings is no different.
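To make the functions-as-data analogy concrete, here is a tiny Python example (Python rather than the language of CS 151/161); the names are made up purely for illustration.

    # Functions are ordinary values: they can be passed around and stored,
    # much as a Turing machine's description <M> can be handed to another
    # machine as a plain string.
    def twice(f, x):
        """Apply a function, received as data, two times."""
        return f(f(x))

    add3 = lambda n: n + 3
    print(twice(add3, 10))   # prints 16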

To prove that $A_{TM}$ is recognizable, we'll describe how a universal Turing machine could work.

We will describe a universal Turing machine $U$. On input $\langle M,w \rangle$, where $M$ is a Turing machine and $w$ is a string, $U$ operates as follows:

  1. Simulate $M$ on input $w$.
  2. If $M$ enters the accept state, accept; if $M$ ever enters the reject state, reject.

Note that because $U$ simulates another Turing machine, $U$ will loop forever if $M$ never enters the accept or reject state. Of course, this is a rather unsatisfying description of a very important concept, so let's take a closer look at the implementation.

Our universal Turing machine uses three tapes:

  1. A tape containing the description of the Turing machine $M$ to be simulated.
  2. A simulation of the work tape of $M$.
  3. The current state of $M$.

From this, it should not be difficult to see how the machine works. The simulation of the work tape is straightforward, so it remains to see how the finite-state portion of $M$ is simulated. This is taken care of by scanning the description of $M$ on the first tape. Since the current state is kept track of on the third tape, the machine only needs to scan the transitions in the description of $M$ on the first tape to determine which transition to take.
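Here is a rough Python sketch of $U$'s simulation loop, assuming $\langle M \rangle$ has already been decoded into a transition table. The dictionary representation and the names below are choices made for this illustration, not part of the formal construction; the `tape` dictionary plays the role of the second tape and the `state` variable plays the role of the third.

    # A sketch of the universal machine U, assuming <M> has been decoded into
    # a transition table delta: (state, symbol) -> (state, symbol, move).
    def universal(delta, start, accept, reject, w, blank="_"):
        tape = dict(enumerate(w))      # simulated work tape of M
        state, head = start, 0         # simulated current state and head position
        while True:                    # may run forever, just as U would
            if state == accept:
                return True            # U accepts
            if state == reject:
                return False           # U rejects
            symbol = tape.get(head, blank)
            # "Scan the description of M" to find the matching transition;
            # if there is none, treat it as rejection in this sketch.
            if (state, symbol) not in delta:
                return False
            state, written, move = delta[(state, symbol)]
            tape[head] = written
            head += 1 if move == "R" else -1

    # Example: a machine that accepts strings starting with 0.
    delta = {("q0", "0"): ("qacc", "0", "R"),
             ("q0", "1"): ("qrej", "1", "R"),
             ("q0", "_"): ("qrej", "_", "R")}
    print(universal(delta, "q0", "qacc", "qrej", "01"))   # True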

If we take a moment to think about this, we have a few reasons to suspect that this language isn't decidable. Suppose that it is; then this gives us a way to decide any recognizable language, since given a recognizer $M$ for a language $L$, we could decide whether $w \in L$ by running the decider for $A_{TM}$ on $\langle M, w \rangle$. Of course, this on its own is not enough to tell us that the problem is undecidable, only that it smells like this is the case. After all, one possibility could be that recognizable and decidable languages are really the same thing. So we need something a bit stronger.

Diagonalization

Now, we move on to the technique that we'll use to show that $A_{TM}$ is undecidable. The crux of the idea has to do with the number of languages that exist compared to the number of Turing machines there are.

This will take us back to some basic set theory. Specifically, we'll be dealing with the size of infinite sets. Georg Cantor was interested in the question of how to compare the sizes of two infinite sets. For instance, if we had the set of natural numbers and the set of all even numbers, what can we say about their sizes? Cantor's solution was to declare that two sets have the same size if every element of one set can be paired off with exactly one element of the other.

Recall the following definitions about functions.

Let $A$ and $B$ be two sets and consider a function $f:A \to B$. We say $f$ is one-to-one or injective if $f(a) \neq f(b)$ whenever $a \neq b$. We say $f$ is onto or surjective if for every $b \in B$, there exists $a \in A$ such that $f(a) = b$. A function that is both one-to-one and onto is a correspondence or bijection.

We say a set $A$ is countable if either it is finite or has the same size as $\mathbb N$.
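For instance, the even natural numbers are countable: the map $n \mapsto 2n$ is a correspondence between $\mathbb N$ and the evens. The little Python loop below just prints the first few pairs of this correspondence; it is purely illustrative.

    # The map n -> 2n pairs each natural number with a distinct even number,
    # and every even number 2n gets hit, so the two sets have the same size.
    for n in range(6):
        print(n, "<->", 2 * n)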

Our intuitive understanding of infinity leads us to think that infinite sets are all the same size, since they all have infinitely many things in them. So we might expect that, if we try hard enough, we can always come up with a correspondence. As it turns out, this is not the case, and there are infinite sets that are larger than $\mathbb N$. Such a set is said to be uncountable.

The trick that Cantor uses, the diagonal argument, works similarly to the pumping lemma proofs we've seen, in that it is a proof by contradiction. Suppose that $\mathbb R$ were countable. Then there's a bijection $f : \mathbb N \to \mathbb R$ and we can list all the elements. We can then use this hypothetical list to construct a real number $x \in \mathbb R$ whose $n$th digit differs from the $n$th digit of the $n$th number on the list. By construction, $x$ differs from every number on the list, so it cannot show up in the list, which implies that $f$ could not have been a bijection after all.
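To see the construction concretely, here is a toy Python sketch: given (a made-up prefix of) the hypothetical list of decimal expansions, it builds a number whose $n$th digit differs from the $n$th digit of the $n$th number. The particular "change the digit" rule is just one convenient choice.

    # Toy diagonalization: listed[i] is the decimal expansion (after the point)
    # of the i-th listed real number.  Build x so that x's i-th digit differs
    # from the i-th digit of the i-th number, so x cannot equal any listed number.
    listed = ["1415926", "7182818", "0000000", "4999999", "3333333"]

    def flip(d):
        return "5" if d != "5" else "4"   # any rule that changes the digit works

    x_digits = "".join(flip(listed[i][i]) for i in range(len(listed)))
    print("0." + x_digits)   # differs from the i-th number in its i-th digit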

We'll use the same idea for many of the proofs in the next few classes. We'll start by using it to show that unrecognizable languages exist. But first, here are some theorems about sets of strings.

For every alphabet $\Sigma$, the set of strings $\Sigma^*$ is countable.

The proof of this is not difficult: just take $f(n) = w$, where $w$ is the $n$th string in length-lexicographic (shortlex) order, for some fixed ordering of $\Sigma$. In a sense, we're really just counting in base $|\Sigma|$.
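Here is a quick Python sketch of this correspondence for $\Sigma = \{0,1\}$; the helper name `nth_string` is made up for this example.

    # Enumerate {0,1}* in shortlex order: e, 0, 1, 00, 01, 10, 11, 000, ...
    # nth_string(n) is the string f(n) from the proof sketch above.
    def nth_string(n, alphabet="01"):
        k = len(alphabet)
        length, count = 0, 1          # count = k^length strings of this length
        while n >= count:             # find the length of the n-th string
            n -= count
            length += 1
            count *= k
        # write n in base k, padded to `length` digits
        s = ""
        for _ in range(length):
            s = alphabet[n % k] + s
            n //= k
        return s

    print([nth_string(i) for i in range(8)])
    # ['', '0', '1', '00', '01', '10', '11', '000']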

But what about languages? The set of languages turns out to be uncountable. We'll adapt Cantor's diagonal argument for this proof.

For every alphabet $\Sigma$, the set of languages, $2^{\Sigma^*}$, is uncountable.

Suppose that $2^{\Sigma^*}$ is countable. We will make use of the countability of $\Sigma^*$: composing a bijection from $\Sigma^*$ to $\mathbb N$ with a bijection from $\mathbb N$ to $2^{\Sigma^*}$, we obtain a bijection $f : \Sigma^* \to 2^{\Sigma^*}$. In other words, we map each string over $\Sigma$ to a language, or a subset of $\Sigma^*$.

Now, define the language $D$ by \[D = \{w \in \Sigma^* \mid w \not \in f(w)\}.\] Since $D \subseteq \Sigma^*$ is a language and $f$ is a bijection, there exists a string $x \in \Sigma^*$ such that $f(x) = D$. So we can ask ourselves whether $x \in D$. We have \[x \in D \iff x \in f(x),\] since $D = f(x)$. But by definition of $D$, we have \[x \in D \iff x \not \in f(x).\] This gives us \[x \in f(x) \iff x \not \in f(x),\] or, in other words, \[x \in D \iff x \not \in D,\] which is a contradiction. Therefore, $2^{\Sigma^*}$ is not countable.

Why is this called a diagonalization argument? Consider the following table, where we have strings running down, a list of languages running across, and each entry telling us, hypothetically, whether a particular string is in a particular language; this is something we could fill in if we had $f$. Notice that $D$ is constructed by flipping the diagonal: $x \in D$ if and only if $x \not\in f(x)$. The contradiction becomes clear when we look up the diagonal entry for the string $x$ with $f(x) = D$.

\[\begin{array}{c|cccccc}
 & f(\varepsilon) & f(0) & f(1) & f(00) & f(01) & \cdots \\ \hline
\varepsilon & \checkmark & & \checkmark & & \checkmark & \\
0 & \checkmark & & \checkmark & & & \\
1 & & \checkmark & \checkmark & & \checkmark & \\
00 & & & & & \checkmark & \\
01 & \checkmark & & & \checkmark & \checkmark & \\
\vdots & & & & & & \ddots
\end{array}\]
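To see the flip in code, here is a toy Python sketch that builds the piece of $D$ determined by the first few rows and columns of the (entirely hypothetical) table above; the real argument, of course, applies to the whole infinite table at once.

    # The hypothetical table above, restricted to the first few strings.
    # f maps each string to a language, given here by its members among `strings`.
    strings = ["", "0", "1", "00", "01"]
    f = {"":   {"", "0", "01"},
         "0":  {"1"},
         "1":  {"", "0", "1"},
         "00": {"01"},
         "01": {"", "1", "00", "01"}}

    # D contains exactly the strings w with w not in f(w): flip the diagonal.
    D = {w for w in strings if w not in f[w]}
    print(D == {"0", "00"})   # True: D disagrees with every f(w) at the string w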

This proof might seem a bit discomforting because of the way that we defined $D$. And you are not alone; many (admittedly unserious) people are so uncomfortable with diagonalization arguments that they dispute that $|\mathbb R| \gt |\mathbb N|$. But even if you are uncomfortable with how we defined $D$, it has to exist! There has to be a language containing exactly the words that we picked, and since it's a language, it has to show up somewhere in our bijection.

If you're familiar with the original diagonalization argument, another way to think about this is that the set of finite strings corresponds to the set of natural numbers. As we argued before, the strings over an alphabet of size $k$ can be thought of as encodings of numbers in base $k$. But what about languages?

There are a number of representations for real numbers, but the one we're interested in is the real number as an infinite sequence of digits. So if we imagine the real number $0.437$, there's really an infinite sequence of 0's following it. But this is the base-10 representation. If we go to base 2, then we have infinite binary sequences.

We can argue that languages, viewed as $\{0,1\}$-valued functions on strings, correspond to infinite binary sequences in the following way. Fix an alphabet $\Sigma$ of size $k$. We've said already that strings over $\Sigma$ can be read as base-$k$ representations of natural numbers, so there is a string corresponding to each $n$. We can define an infinite binary sequence for a language $L$ by choosing the $n$th digit to be 1 if the string for $n$ is in $L$ and 0 otherwise. This infinite binary sequence encodes our language.
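Here is a small Python sketch of this encoding for $\Sigma = \{0,1\}$, using the shortlex enumeration of strings from before; the example language (strings of even length) and the names are chosen arbitrarily for illustration.

    from itertools import count, product

    # Characteristic sequence of a language L over {0,1}: the n-th bit is 1
    # exactly when the n-th string (in shortlex order) belongs to L.
    def shortlex():
        for length in count(0):
            for tup in product("01", repeat=length):
                yield "".join(tup)

    L = lambda w: len(w) % 2 == 0      # an example language: even-length strings

    gen = shortlex()
    bits = [1 if L(next(gen)) else 0 for _ in range(10)]
    print(bits)   # [1, 0, 0, 1, 1, 1, 1, 0, 0, 0]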

We can now use this fact to show that there are languages that are not recognizable. In other words, there are languages that have no Turing machine that recognizes them.

Some languages are not Turing-recognizable.

First, we need to show that the set of Turing machines is countable. Recall that the set of all strings $\Sigma^*$ is countable. Then we observe that every Turing machine has a string encoding $\langle M \rangle$, for some fixed encoding. Now, not every string corresponds to a proper encoding of a Turing machine, but we can ignore or skip all of those. Thus, there are countably many Turing machines.

Now, suppose that every language is recognizable. Then each language has a Turing machine that recognizes it, so the mapping $f : \Sigma^* \to 2^{\Sigma^*}$ that assigns each encoding of a Turing machine to the language that machine recognizes would be onto. However, we have shown that the set of all languages is uncountable, and the diagonal argument above rules out any onto map from $\Sigma^*$ to $2^{\Sigma^*}$, not just a bijection; there are strictly more languages than there are Turing machines. This means there must exist some language that is not recognized by any Turing machine.

This might be a bit surprising, but it is also not quite satisfying. After all, who cares if there's some crazy problem out there that's not solvable? What if we only care about "real" problems?

Just to refresh our memory, $A_{TM}$ is defined as \[A_{TM} = \{ \langle M,w \rangle \mid \text{$M$ is a TM and $M$ accepts $w$} \}\] and we are ready to prove the following result.

$A_{TM}$ is undecidable.

We assume that $A_{TM}$ is decidable and derive a contradiction. Suppose we have a TM $H$ that decides $A_{TM}$. Then on $\langle M,w \rangle$, $H$ halts and accepts if $M$ accepts $w$, and it halts and rejects if $M$ does not accept $w$.

We construct a new TM $D$ that takes as input the description of a Turing machine and operates as follows:

  1. On input $\langle M \rangle$, where $M$ is a TM, run $H$ on input $\langle M, \langle M \rangle \rangle$.
  2. If $H$ accepts, then reject; if $H$ rejects, then accept.

Let's consider what $D$ does when it is given the encoding of a Turing machine. Essentially, we have \[L(D) = \{\langle M \rangle \in \Sigma^* \mid \langle M \rangle \not \in L(M)\}.\]
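This construction can be mirrored in ordinary code. Suppose, for contradiction, that someone handed us a Python function H(program_source, input_string) that decides $A_{TM}$ for programs; both the function and the names below are hypothetical, since the whole point of the proof is that no such H can exist.

    # Suppose, for contradiction, we were handed a decider
    #     H(program_source, input_string) -> True iff the program accepts the input.
    # (No such H exists; it is a purely hypothetical parameter here.)
    def make_D(H):
        def D(program_source):
            # Run H on <M, <M>>: does the program accept its own source?
            if H(program_source, program_source):
                return "reject"        # flip the answer
            else:
                return "accept"
        return D

    # Feeding D its own source code would force
    #     D accepts <D>  iff  H says D does not accept <D>,
    # which is exactly the contradiction in the proof.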

Again, we can view this as a table. Suppose we have an enumeration of Turing machines $M_1, M_2, \dots$. Then the encodings $\langle M_j \rangle$ run down the rows, the machines $M_i$ run across the columns, and the entry in row $\langle M_j \rangle$ and column $M_i$ records whether $M_i$ accepts $\langle M_j \rangle$. Now, let's consider the result of $D$ when run on a description of one of these machines: it is basically the opposite of whatever runs along the diagonal.

\[\begin{array}{c|cccccc}
 & M_1 & M_2 & M_3 & M_4 & M_5 & \cdots \\ \hline
\langle M_1 \rangle & \checkmark & & \checkmark & & \checkmark & \\
\langle M_2 \rangle & \checkmark & & \checkmark & & & \\
\langle M_3 \rangle & & \checkmark & \checkmark & & \checkmark & \\
\langle M_4 \rangle & & & & & \checkmark & \\
\langle M_5 \rangle & \checkmark & & & \checkmark & \checkmark & \\
\vdots & & & & & & \ddots
\end{array}\]

Now, let us consider the result of $D$ when run on the input string $\langle D \rangle$. We have \[\langle D \rangle \in L(D) \iff \langle D \rangle \not \in L(D).\] That is, $D$ must accept $\langle D \rangle$ if $D$ does not accept $\langle D \rangle$, and it must reject $\langle D \rangle$ if $D$ accepts $\langle D \rangle$, which is a contradiction. Thus, neither of the Turing machines $D$ and $H$ can exist, and $A_{TM}$ is undecidable.

This is already a pretty big deal. Essentially, this result says that there's no algorithm that, given an arbitrary Turing machine and an input, will tell us whether the machine accepts that input. Since $A_{TM}$ is recognizable, the best we can do is simulate the machine and hope that it halts.