CMSC 28000 — Lecture 21

Decidability

Recall that we have the two notions of decidable and recognizable languages. These form language classes.

The class of (Turing-)recognizable languages is the class of languages that can be recognized by a Turing machine. This class is also called the class of recursively enumerable languages.

The class of (Turing-)decidable languages is the class of languages that can be decided by a Turing machine. This class is also called the class of recursive languages.

As you might guess, the alternate terminology (which is still used) is due to equivalent notions in the world of recursive functions.

Now, we're finally able to return to the idea of decision problems that we started the class off with. Recall that a decision problem is just a function that maps an input to Yes or No. Then, we observe that everything that we do on real computers and Turing machines can be interpreted as a string of bits. We now understand that these are just words, which makes our decision problem a language membership problem: is this word in the language or not?

With this, we'll be revisiting some properties of regular and context-free languages. While we'll see that a lot of problems are decidable, there are some very surprising examples of problems which are undecidable. Many decidability properties that we'll be discussing were shown by Rabin and Scott in 1959 and Bar-Hillel, Perles, and Shamir in 1961.

First, it's worth convincing ourselves that it's possible to treat all inputs as strings. The argument is as follows: A DFA is just a list of stuff—states, transitions, and so on. We can convince ourselves in this way that it's perfectly fine to encode lists of things. So we have a way to encode a DFA. This argument leads us to encoding things like grammars, PDAs, even Turing machines. Anything that has a finite description is fair game. And why not? We implicitly do this every time we write a program anyway.

For an object $O$, we write $\langle O \rangle$ for the encoding of $O$ as a string over some given alphabet. Then $\langle x,y \rangle$ is the encoding of two objects $x$ and $y$ as a single string, such that we can distinguish $x$ and $y$.

Next, we'll lay out the problems we're interested in:

Decision problems on formal languages

Membership

The simplest problem we can ask about a language device and a word is whether the word belongs in the language. Again, this is the language membership problem: Given a device $A$ and a word $w$, is $w \in L(A)$?

We define the following language for the DFA acceptance problem: $$ A_{DFA} = \{\langle B, w \rangle \mid \text{$B$ is a DFA that accepts $w$}\}.$$

$A_{DFA}$ is a decidable language.

The idea of the proof is rather simple: we simply simulate $B$ on the input string $w$. How would a Turing machine do this? The machine would need to keep track of two things:

  1. The current state of $B$, and
  2. The current location in the word $w$.

Then the machine accepts the input if and only if $B$ is in an accepting state when the last symbol of $w$ is processed. Otherwise, it rejects.
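
To make this concrete, here's a minimal sketch of the simulation in Python, assuming the encoding $\langle B, w \rangle$ has already been decoded into a transition table, an initial state, and a set of accepting states (this dictionary representation is ours, for illustration; it is not the string encoding itself):

```python
def dfa_accepts(delta, start, accepting, w):
    """Simulate the DFA B on the word w."""
    state = start                      # (1) the current state of B
    for symbol in w:                   # (2) the current location in w
        state = delta[(state, symbol)]
    return state in accepting          # accept iff B ends in an accepting state

# Example: a DFA over {0, 1} that accepts strings with an odd number of 1s.
delta = {('even', '0'): 'even', ('even', '1'): 'odd',
         ('odd', '0'): 'odd', ('odd', '1'): 'even'}
print(dfa_accepts(delta, 'even', {'odd'}, '10110'))  # True
```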

We can ask the same question for NFAs: $$ A_{NFA} = \{\langle B, w \rangle \mid \text{$B$ is an NFA that accepts $w$}\}.$$

$A_{NFA}$ is a decidable language.

Here, we can take advantage of the theorem we just proved above. We can construct a Turing machine that does the following on the input $\langle B,w \rangle$, where $B$ is an NFA and $w$ is an input word:

  1. Convert $B$ into a DFA $B'$ via subset construction.
  2. Run the Turing machine from Theorem 21.3 on the input $\langle B', w \rangle$ to check if $w \in L(B')$.
  3. If the TM accepts, accept. Otherwise, reject.

This machine demonstrates two ideas. The first is that we can use other Turing machines as subroutines like a typical programmer would think of doing. The second is that we are able to convert NFAs to DFAs, so we don't need to consider DFAs and NFAs separately anymore.
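
In fact, a simulator doesn't even need to build $B'$ explicitly: it can perform the subset construction on the fly, tracking the set of states that $B$ could currently be in. Here's a sketch under the same illustrative representation as before, where `delta` now maps a (state, symbol) pair to a set of states ($\varepsilon$-transitions are omitted for brevity):

```python
def nfa_accepts(delta, start, accepting, w):
    """Simulate the NFA B on w by running the subset construction on the fly."""
    current = {start}                  # all states B could be in so far
    for symbol in w:
        current = {q for p in current
                     for q in delta.get((p, symbol), set())}
    return bool(current & accepting)   # accept iff some run ends in an accepting state
```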

Now, we'll turn our attention to context-free languages. As you might expect, things can get a bit more complicated, especially as grammars are fairly different objects from finite automata and we don't have many of the same nice properties of regular languages to rely on. In the case of a grammar, rather than acceptance, we have the problem of whether a grammar generates a particular string. Thus, we have the following language $$ A_{CFG} = \{\langle G, w \rangle \mid \text{$G$ is a CFG that generates $w$}\}.$$

$A_{CFG}$ is a decidable language.

This is obviously decidable; we even gave an efficient algorithm for it (CYK)! So we can pretend that we have a CYK Turing machine.

If you don't like that answer, we can take the more naïve approach and check every derivation with $2n-1$ steps; this works because in Chomsky normal form, every derivation of a string of length $n \geq 1$ takes exactly $2n-1$ steps. Such a Turing machine operates as follows:

  1. On input $\langle G,w \rangle$, where $G$ is a CFG and $w$ is a string, transform $G$ into an equivalent grammar $G'$ in Chomsky normal form.
  2. List all derivations with $2n-1$ steps, where $n = |w|$, unless $n = 0$, in which case list all derivations with 1 step.
  3. If any of these derivations generate $w$, accept; otherwise, reject.
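
For completeness, here's a compact sketch of the CYK algorithm mentioned above, assuming $G$ is already in Chomsky normal form and is represented by its unit rules $A \to a$ and binary rules $A \to BC$ (again, this representation is chosen just for illustration):

```python
def cyk(unit_rules, binary_rules, start, w):
    """Decide whether the CNF grammar generates the nonempty string w.

    unit_rules: list of pairs (A, a) for rules A -> a
    binary_rules: list of triples (A, B, C) for rules A -> BC
    """
    n = len(w)
    # table[i][j] holds the variables generating the substring of
    # length j + 1 starting at position i.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):
        table[i][0] = {A for (A, b) in unit_rules if b == a}
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # starting position
            for k in range(1, length):        # split point
                for (A, B, C) in binary_rules:
                    if B in table[i][k - 1] and C in table[i + k][length - k - 1]:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]
```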

Emptiness

Another common problem you can ask is whether your language device actually describes any strings at all. This is the emptiness problem: Given a device $A$, is $L(A) = \emptyset$?

We will consider the following language $$ E_{DFA} = \{ \langle A \rangle \mid \text{$A$ is a DFA and $L(A) = \emptyset$}\}. $$

$E_{DFA}$ is a decidable language.

To solve this problem, we simply treat it like a graph problem. A DFA accepts some word if and only if an accepting state is reachable from the initial state. Then we can use a similar idea to the graph connectedness algorithm from last week for this problem. The TM looks like this:

  1. Mark the initial state of $A$.
  2. Repeat until no new states get marked:
    Mark any state that has a transition coming into it from any state that is already marked.
  3. If no accepting state is marked, accept. Otherwise, reject.
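
Since the marking procedure is just graph reachability, a sketch is short; as before, the dictionary representation of the DFA is an assumption made for illustration:

```python
def dfa_is_empty(delta, start, accepting):
    """Decide whether L(A) is empty by marking reachable states."""
    marked = {start}                   # step 1: mark the initial state
    frontier = [start]
    while frontier:                    # step 2: mark until nothing new gets marked
        p = frontier.pop()
        for (q, symbol), r in delta.items():
            if q == p and r not in marked:
                marked.add(r)
                frontier.append(r)
    return not (marked & accepting)    # step 3: empty iff no accepting state marked
```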

We can also show that the emptiness problem for CFGs is also decidable. Again, it'll look a bit different from what we've done with DFAs. First, here's the language $$E_{CFG} = \{\langle G \rangle \mid \text{$G$ is a CFG and $L(G) = \emptyset$} \}.$$

$E_{CFG}$ is a decidable language.

In a way, we can apply the same kind of idea from the DFA case. Of course, a grammar isn't a graph, so it's not exactly the same. However, in essence, what we want to do is check whether some string of terminals can be derived from the start variable. To do this, we check each variable to see if it can generate a string of terminals. We begin by marking all terminals, and then we repeatedly mark any variable that has a rule whose right-hand side consists entirely of marked symbols. Once the algorithm is finished, we simply check whether the start variable is marked.

  1. On input $\langle G \rangle$ where $G$ is a CFG, mark all terminal symbols in $G$.
  2. Repeat until no new variables get marked:
    Mark any variable $A$ where $G$ has a rule $A \to U_1 U_2 \cdots U_k$ and each symbol $U_1, \dots, U_k$ has already been marked.
  3. If the start variable is not marked, accept; otherwise reject.
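
Here's a sketch of this procedure, representing each rule $A \to U_1 \cdots U_k$ as a pair of the variable and its right-hand side (an empty right-hand side stands for $A \to \varepsilon$; the representation is again just for illustration):

```python
def cfg_is_empty(rules, start, terminals):
    """Decide whether L(G) is empty by marking generating symbols."""
    marked = set(terminals)            # step 1: mark all terminals
    changed = True
    while changed:                     # step 2: mark until nothing new gets marked
        changed = False
        for (A, body) in rules:
            if A not in marked and all(U in marked for U in body):
                marked.add(A)          # some rule for A has an all-marked body
                changed = True
    return start not in marked         # step 3: empty iff start variable unmarked
```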

Containment

Another common problem arises when we have two language devices and we want to compare them somehow. The most basic variant asks whether one language is contained in the other. This is the containment problem: Given two devices $A$ and $B$, is $L(A) \subseteq L(B)$?

Consider the following language, $$ C_{DFA} = \{ \langle A,B \rangle \mid \text{$A$ and $B$ are DFAs and $L(A) \subseteq L(B)$} \}.$$ Of course, this is another decidable property, so we have the following theorem.

$C_{DFA}$ is a decidable language.

Again, we make use of the machine that we constructed in the previous theorem. First, let's consider what it means for $L(A)$ to be contained in $L(B)$. We want to check whether every word in $L(A)$ is also in $L(B)$. However, an equivalent way of checking this property is to check whether there are any words in $L(A)$ that are not in $L(B)$. So we construct a DFA $C$ such that $$ L(C) = L(A) \cap \overline{L(B)}$$ and check whether or not $L(C) = \emptyset$. Luckily, we've just shown how to check the emptiness of a DFA.

  1. On input $\langle A,B \rangle$, construct the DFA $C$ as described above.
  2. Run the TM from Theorem 21.6 on the input $\langle C \rangle$ to check if $L(C) = \emptyset$.
  3. If the machine accepts, accept. Otherwise, reject.
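
Here's a sketch of the whole construction, assuming both DFAs are complete and share the same alphabet; it builds the reachable part of the product automaton for $L(A) \cap \overline{L(B)}$ and reuses the `dfa_is_empty` sketch from above:

```python
def dfa_contained(delta_a, start_a, acc_a, delta_b, start_b, acc_b, alphabet):
    """Decide whether L(A) is a subset of L(B) via the product construction."""
    start_c = (start_a, start_b)
    delta_c, states, frontier = {}, {start_c}, [start_c]
    while frontier:
        (p, q) = frontier.pop()
        for a in alphabet:
            r = (delta_a[(p, a)], delta_b[(q, a)])
            delta_c[((p, q), a)] = r
            if r not in states:
                states.add(r)
                frontier.append(r)
    # C accepts exactly when A accepts and B does not.
    acc_c = {(p, q) for (p, q) in states if p in acc_a and q not in acc_b}
    # L(A) is contained in L(B) iff L(C) is empty.
    return dfa_is_empty(delta_c, start_c, acc_c)
```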

Solving containment lets us solve other similar problems. For instance, we can solve the equality problem: Given two devices $A$ and $B$, is $L(A) = L(B)$?

The language $$EQ_{DFA} = \{ \langle A,B \rangle \mid \text{$A$ and $B$ are DFAs and $L(A) = L(B)$} \}$$ is a decidable language.

Of course, we have the opportunity to make use of the machine that we constructed in the previous theorem. Recall that for two sets $S$ and $T$, $S = T$ if and only if $S \subseteq T$ and $T \subseteq S$. This fact combined with the containment machine we constructed gives us a rather simple algorithm for checking equality.

  1. On input $\langle A,B \rangle$, run the TM from the previous theorem on the input $\langle A,B \rangle$ to check if $L(A) \subseteq L(B)$ and run it again on $\langle B,A \rangle$ to check if $L(B) \subseteq L(A)$.
  2. If both machines accept, accept. Otherwise, reject.
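
In code, this is just two calls to the containment sketch above:

```python
def dfa_equal(delta_a, start_a, acc_a, delta_b, start_b, acc_b, alphabet):
    """Decide whether L(A) = L(B) by checking containment both ways."""
    return (dfa_contained(delta_a, start_a, acc_a, delta_b, start_b, acc_b, alphabet)
            and dfa_contained(delta_b, start_b, acc_b, delta_a, start_a, acc_a, alphabet))
```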

Then this allows us to solve the universality problem: Given a device $A$, is $L(A) = \Sigma^*$? Again, this is fairly straightforward based on the results we've already shown. One approach is to construct a simple DFA that recognizes $\Sigma^*$ and test whether $\Sigma^* \subseteq L(A)$. Another approach is to check whether $\overline{L(A)} = \emptyset$.

But what about context-free languages? Suppose we have two context-free grammars $G_1$ and $G_2$. Then $L(G_1) \subseteq L(G_2)$ if and only if $L(G_1) \cap \overline{L(G_2)} = \emptyset$, just as above. But there's a problem with this: context-free languages aren't closed under intersection and complement, so we're not guaranteed to end up with a context-free language after this process.

Maybe we can try something more clever? Unfortunately, even if we tried harder, we won't be able to solve this problem, as it turns out to be our first encounter with an undecidable problem. And since containment for CFGs is undecidable, this means that equality and universality are also going to be undecidable.

Well, that's the claim at least, but can we prove it?

First, as an aside, this tells us that some surprisingly fundamental problems about context-free languages are undecidable. This leads to an interesting question: Are there any known undecidable problems about finite automata? The answer is, there are very few examples, but yes!

A problem that is not decidable is said to be undecidable. The notion of undecidable problems poses a problem in practical terms (and philosophically as well), since it amounts to saying that there is no algorithm that can solve this problem.

For one thing, it is not clear that such a thing can even happen. After all, maybe these problems are just really difficult and we haven't figured out a way to solve them. This certainly seems like the case for something like context-free language equality. For example, the question of whether equality for deterministic CFLs is decidable was posed by Ginsburg and Greibach when they introduced DCFLs in 1966, and it remained open until 1997, when Géraud Sénizergues proved that it was decidable (this won him the Gödel Prize in 2002).

However, we will prove that, in fact, there are problems out there that have no algorithm that will solve them. We will do this by first showing that the following language is undecidable. $$A_{TM} = \{ \langle M,w \rangle \mid \text{$M$ is a Turing machine and $M$ accepts $w$} \}$$

This is the membership problem for Turing machines. This looks like the decision problems we've seen before: of course, there is a way to encode Turing machines as strings, by using the same methods we've used for encoding DFAs and CFGs.

It's important to note here that while $A_{TM}$ is not decidable, we can show that it is recognizable.

$A_{TM}$ is recognizable.

This is not actually completely obvious. What this says is that there is a Turing machine out there that can take an encoding of a Turing machine and some input, and simulate the encoded Turing machine on the desired input word. Such a Turing machine is called a universal Turing machine. There are a few interesting observations we can make.

We've studied a number of problems where we're essentially simulating other models of computation, but the idea of a universal Turing machine acts very much like our real general-purpose computers do. Each of our computers differs in various ways, but encodings, and the ability to decode them, allow us to simply write programs and have them work on every computer. Some people liken this to Turing essentially inventing software.

So there's a very clear analogy here to writing programs. Just as we can treat the programs we write as text files, we can treat the Turing machines as strings. But there is another analogy that many of you will be familiar with if you've taken CS 151 or 161: functions as data. In functional programming, we are used to the idea of treating functions as essentially another kind of value that can be passed around. The idea behind encoding Turing machines as strings is no different.

To prove that $A_{TM}$ is recognizable, we'll describe how a universal Turing machine could work.

We will describe a universal Turing machine $U$. On input $\langle M,w \rangle$, where $M$ is a Turing machine and $w$ is a string, $U$ operates as follows:

  1. Simulate $M$ on input $w$.
  2. If $M$ enters the accept state, accept; if $M$ ever enters the reject state, reject.

Note that because $U$ simulates another Turing machine, $U$ will loop forever if $M$ never enters the accept or reject state. Of course, this is a rather unsatisfying description of a very important concept, so let's take a closer look at the implementation.

Our universal Turing machine uses three tapes:

  1. A tape containing the description of the Turing machine $M$ to be simulated.
  2. A simulation of the work tape of $M$.
  3. The current state of $M$.

From this, it should not be difficult to see how the machine works. The simulation of the work tape is straightforward, so it remains to see how the finite-state portion of $M$ is simulated. This is taken care of by scanning the description of $M$ on the first tape. Since the current state is kept on the third tape, the machine only needs to scan the transitions in the description of $M$ on the first tape to determine which transition to take.
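
To make the implementation concrete, here's a minimal sketch of such a simulator in Python, assuming the description of $M$ has been decoded into a transition table that is defined on every configuration $M$ actually reaches; just like $U$, it loops forever if $M$ does:

```python
def simulate_tm(delta, start, accept, reject, w, blank='_'):
    """Simulate the TM M on the input w.

    delta maps (state, symbol) to (new_state, written_symbol, direction),
    where direction is 'L' or 'R'.
    """
    tape = dict(enumerate(w))          # tape 2: the simulated work tape of M
    state, head = start, 0             # tape 3: the current state of M
    while state not in (accept, reject):
        symbol = tape.get(head, blank)
        # Scan the description of M (tape 1) for the transition to take.
        state, written, direction = delta[(state, symbol)]
        tape[head] = written
        head += 1 if direction == 'R' else -1
    return state == accept
```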