CMSC 28000 — Lecture 20

Using closure properties

Just like with regular languages, closure properties are a useful tool for showing that certain languages are not context-free. A particularly useful one is intersection with regular languages, because this allows us to "filter" a language to contain only words of a certain form.

Let $L = \{ w \in \{a,b,c\}^* \mid |w|_a = |w|_b = |w|_c \}$. Then $L$ is not context-free.

Suppose $L$ were a context-free language. Since the class of context-free languages is closed under intersection with regular languages, the intersection of $L$ with the regular language $L(a^* b^* c^*)$ would also be context-free. However, $$L \cap L(a^* b^* c^*) = \{a^m b^m c^m \mid m \in \mathbb N \},$$ which is known to not be context-free. Therefore $L$ must not be context-free.

But unlike regular languages, context-free languages are not closed under certain useful operations. For instance, let's consider the intersection of two CFLs. Unfortunately, we can't use the cross product construction to build a PDA for the intersection of two CFLs. A similar construction would require keeping track of two different stacks and it is unclear how one would manage this with only one stack. The answer is that you can't, because context-free languages are not closed under intersection.

There exist context-free languages $L_1, L_2 \subseteq \Sigma^*$ such that $L_1 \cap L_2$ is not a context-free language.

Let $L_1 = \{a^i b^i c^j \mid i,j \geq 0\}$ and let $L_2 = \{a^i b^j c^j \mid i,j \geq 0\}$. We first show that $L_1$ and $L_2$ are both context-free. Consider, for $\sigma_1, \sigma_2 \in \Sigma$, the languages \begin{align*} L_{\sigma_1,\sigma_2} &= \{\sigma_1^m \sigma_2^m \mid m \geq 0 \}, \\ K_{\sigma_1} &= \{\sigma_1^i \mid i \geq 0\}. \end{align*} We have already seen that $L_{\sigma_1,\sigma_2}$ is context-free, and $K_{\sigma_1}$ is just $\sigma_1^*$, which is regular (and therefore context-free). Since $L_1 = L_{a,b} K_c$ and $L_2 = K_a L_{b,c}$, and context-free languages are closed under concatenation, both $L_1$ and $L_2$ are context-free. However, $$L_1 \cap L_2 = \{a^i b^i c^i \mid i \geq 0\},$$ and we have already shown that this language is not context-free.
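For concreteness, here is one possible pair of grammars witnessing that $L_1$ and $L_2$ are context-free (any grammars generating these languages would do): $$S_1 \to A C, \qquad A \to a A b \mid \varepsilon, \qquad C \to c C \mid \varepsilon$$ generates $L_1$, while $$S_2 \to D B, \qquad D \to a D \mid \varepsilon, \qquad B \to b B c \mid \varepsilon$$ generates $L_2$.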

Thus, the class of context-free languages is not closed under intersection. Note that this also implies that the context-free languages are not closed under complementation, via De Morgan's laws: since context-free languages are closed under union, if they were also closed under complementation, they would be closed under intersection, which we have just shown is not the case.
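Spelled out, the identity behind this argument is $$L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}},$$ so closure under union together with closure under complementation would force closure under intersection.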

Here, we'll give another proof, this time involving languages that we can easily verify are context-free and making use of closure properties.

There exists a context-free language $L$ such that $\overline L = \Sigma^* \setminus L$ is not a context-free language.

Let $L_1 = \{ a^i b^j c^k \mid i \neq j\} \cup \{a^i b^j c^k \mid j \neq k\}$ and let $L_2 = \overline{L(a^* b^* c^*)}$. It is clear that $L_2$ is regular. To see that $L_1$ is context-free, we observe that we can write $$\{a^i b^j c^k \mid i \neq j\} = \left(\{ a^i b^j \mid i < j\} \cup \{a^i b^j \mid i > j\} \right) \{c^k \mid k \geq 0\}$$ and similarly, we have $$\{a^i b^j c^k \mid j \neq k\} = \{a^i \mid i \geq 0\} \left(\{ b^j c^k \mid j < k\} \cup \{b^j c^k \mid j > k\} \right).$$

Then $L = L_1 \cup L_2$ is a context-free language. However, observe that by De Morgan's laws, we have $$\overline L = \overline{L_1 \cup L_2} = \overline{L_1} \cap \overline{L_2}.$$ First, we observe that $\overline{L_2} = L(a^* b^* c^*)$, so every word of $\overline L$ must be of this form. Next, recall that $L_1$ consists of the words of the form $a^i b^j c^k$ with $i \neq j$ or $j \neq k$. In other words, if a word of $\overline{L_1}$ is of the form $a^i b^j c^k$, then we must have $i = j$ and $j = k$. But this means that $$\overline L = \overline{L_1} \cap \overline{L_2} = \{a^m b^m c^m \mid m \geq 0\}.$$ Of course, we have already seen that this language is not context-free.

Computability

One observation you may have made is that while $\{a^n \mid n \geq 0\}$ is accepted by a DFA, the language $\{a^n b^n \mid n \geq 0\}$ is not. And while a PDA can accept $\{a^n b^n \mid n \geq 0\}$, it can't accept $\{a^n b^n c^n \mid n \geq 0\}$. A natural question arises: can we play this game indefinitely? This is one of the things to keep in mind as we begin exploring the limits of computability. Generally speaking, computability theory is about whether a particular problem can be solved at all. Depending on what you've assumed about computers so far, the fact that some problems cannot be solved may or may not be obvious to you, but it's another thing entirely to have it proved.

Computability theory is really the beginnings of the entire field of computer science, and to see why, we need to go all the way back to the early 1900s. One of the central ideas in mathematical logic at the time was the distinction between semantic truth and proof. The idea is that by separating the two, we can reduce proofs and proof systems to the mechanical manipulation of symbols. If we define a proof system so that it matches our notion of semantic truth and load it with the right axioms, then we could express a mathematical statement in, say, first-order logic, and an algorithm could mechanically construct a proof of the truth or falsehood of that statement for us.

One of the main drivers of this research was David Hilbert and in 1928, he proposed the Entscheidungsproblem, or decision problem, which asks for such an algorithm.

In order for this to happen, we need a proof system and a set of axioms satisfying some demanding properties: the axioms should be consistent, they should be strong enough to prove or refute every statement we care about (say, every statement about the integers), and it should be possible to enumerate the axioms and check proofs mechanically.

At this point, some of you will be expecting Kurt Gödel to show up and, well, here he is. Gödel proves a number of very important results dealing with all of this. First, he proves the Completeness Theorem in 1929, which says that first-order logic is complete: every statement that holds in all models of a set of axioms can be proved from those axioms. An equivalent way to put it is that if you have a bunch of axioms that don't contradict each other, then there is a model for them, which sounds great.

But then he goes on to prove the Incompleteness Theorems in 1931, which say that if you have a consistent set of axioms that can be enumerated by an algorithm and is strong enough to express basic arithmetic, then there is going to be a true statement about the integers that you can't prove, and furthermore, you can't prove that the set of axioms is consistent using the set of axioms itself. This means that your set of axioms, and hence your system, is incomplete, which is a problem if your goal is to generate proofs for every true statement about the integers.

This is just one of the developments that occurred due to this activity around trying to find a way to mechanically prove things. Gödel used what are called $\mu$-recursive functions to try to do this. Alonzo Church defined the $\lambda$-calculus for this purpose. Emil Post defined what are called Post systems, which are a kind of string rewriting system.

And in 1936, Alan Turing defined what we now call Turing machines.

Turing machines

First, we'll try to understand what a Turing machine is informally. Remember that we started out with the finite automaton, considered the simplest model of computation. This is just some finite set of states, or memory, and some transitions. We can do a lot of useful things with FAs but there are many languages that FAs aren't powerful enough to recognize, such as $a^n b^n$. However, we can enhance the power of the machine by adding a stack. This gives us the ability to recognize a larger class of languages.

We can think of a Turing machine in the same way. A Turing machine is just a finite automaton with the main restrictions we placed on it now lifted. That is, the tape that we place our input on, which was previously read-only, can now be written to, and the tape is infinitely long. Furthermore, we can move the tape head in either direction, rather than only operating on the top of a stack.

A Turing machine is a 7-tuple $M = (Q,\Sigma,\Gamma,\delta,q_i,q_A,q_R)$, where $Q$ is a finite set of states, $\Sigma$ is the input alphabet, $\Gamma \supseteq \Sigma$ is the tape alphabet, which contains the blank symbol $\Box \in \Gamma \setminus \Sigma$, $\delta : Q \times \Gamma \to Q \times \Gamma \times \{L,R\}$ is the transition function, $q_i \in Q$ is the initial state, and $q_A, q_R \in Q$ are the accepting and rejecting states, with $q_A \neq q_R$.

In our model, we have a one-sided, infinitely long tape. The machine starts its computation with the input written at the left end of the tape, followed by infinitely many blank symbols $\Box$ to the right.

In the course of a computation on a Turing machine, one of three outcomes will occur.

  1. The machine enters the accepting state and accepts and halts.
  2. The machine enters the rejecting state and rejects and halts.
  3. The machine never enters the accepting or rejecting state and does not halt.

For a Turing machine $M$ and an input word $w$, if the computation of $M$ on $w$ enters the accepting state, then we say that $M$ accepts $w$. Likewise, if the computation of $M$ on $w$ enters the rejecting state, then we say $M$ rejects $w$. We call the set of strings that are accepted by $M$ the language of $M$, denoted $L(M)$.

Notice that since we now have three possible outcomes, there is more than one notion of what it means for a Turing machine to accept a language.

A Turing machine $M$ recognizes a language $L$ if every word in $L$ is accepted by $M$ and for every input not in $L$, $M$ either rejects the word or does not halt. A language $L$ is recognizable if some Turing machine recognizes it.

A Turing machine $M$ decides a language $L$ if every word in $L$ is accepted by $M$ and every input not in $L$ is rejected by $M$. A language $L$ is decidable if some Turing machine decides it.
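To make the definition concrete, here is a minimal sketch, in Python, of how one might represent and simulate a machine fitting this definition. The names (run, BLANK, delta) and conventions here, including encoding $\delta$ as a dictionary, sending undefined transitions to the rejecting state, and the step cutoff, are choices made for this sketch rather than part of the formal definition; in particular, a machine in the third outcome simply never halts, so a simulator can only give up after some number of steps.

    BLANK = "_"  # stands in for the blank symbol written as a box in the notes

    def run(delta, q_i, q_A, q_R, w, max_steps=10_000):
        """Simulate a one-sided-tape Turing machine on the input word w.

        delta maps (state, symbol) to (state, symbol_to_write, move), where
        move is "L" or "R".  Returns "accept", "reject", or "no verdict" if
        the machine has not halted within max_steps steps (a practical
        cutoff for this sketch; the machine itself may simply never halt).
        """
        tape = list(w) if w else [BLANK]
        head, state = 0, q_i
        for _ in range(max_steps):
            if state == q_A:
                return "accept"
            if state == q_R:
                return "reject"
            # Treat undefined transitions as moves into the rejecting state.
            state, write, move = delta.get((state, tape[head]), (q_R, tape[head], "R"))
            tape[head] = write
            head = head + 1 if move == "R" else max(head - 1, 0)  # can't fall off the left end
            if head == len(tape):
                tape.append(BLANK)  # the tape extends with blanks to the right
        return "no verdict"

    # Example: a two-state machine deciding whether a word over {a} has even length.
    delta = {
        ("q0", "a"): ("q1", "a", "R"),
        ("q1", "a"): ("q0", "a", "R"),
        ("q0", BLANK): ("qA", BLANK, "R"),
        # ("q1", BLANK) is left undefined, so odd-length inputs fall into the reject state.
    }
    print(run(delta, "q0", "qA", "qR", "aaaa"))  # accept
    print(run(delta, "q0", "qA", "qR", "aaa"))   # reject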

The Church-Turing thesis

As a reminder, the Turing machine was introduced by Turing in 1936, while the finite automaton model was introduced by Kleene in the 1950s. It's obvious now that the two are related and it's very helpful to relate the two models, but Turing definitely didn't think of the machines he defined in the same way that we do now. In fact, near the end of his 1951 technical report, Kleene devotes a paragraph or two to discussing the possible similarities between his model and Turing's.

Instead, what Turing attempted to do was formalize how computers worked, by which we mean the 1930s conception of a computer: a person carrying out a calculation. In other words, he tried to formalize how human computers might work through a problem. We can view the finite-state control as a set of instructions we might give to a person, telling them what to do when they read a particular symbol. We allow this person to read and write on some paper, and we let them have as much paper as they want.

This gives us an arguably concrete notion of computation. One of the issues with a lot of the formalist work of the 30s was that there was a lot of it and it wasn't entirely clear that these were all really different ways of doing the same thing. We've seen hints of this already: there's no real reason we should've expected, say, certain automata and grammars to be equivalent, but it turned out they are.

So it wasn't obvious that Church's $\lambda$-calculus, Gödel's $\mu$-recursive functions, and Post's systems were all basically the same. The advantage of the Turing machine is its very clear analogy to human and mechanical computation. This concreteness made it easier to prove that these other systems were equivalent to Turing machines, and it led to the Church-Turing thesis.

Nowadays, we think of the Church-Turing thesis in the context of real computers and the notion of Turing-completeness of programming languages and other systems. However, the Church-Turing thesis is really a statement about the equivalence of those formalisms from the 30s. Since Turing machines are now accepted as the standard of computability, Turing-completeness is just the property of a particular system to be able to simulate a Turing machine, and implies that this system is capable of "real" computation.

Most people are introduced to the notion of Turing-completeness when thinking about programming languages and languages similar to programming languages. For instance, it's not surprising that your typical programming language is Turing-complete. Where things get more interesting is when dealing with those languages that we don't typically think of as "programming" languages, like $\TeX$ or C++ template metaprogramming. Of course, this has important theoretical uses, like showing that theoretical molecular or quantum models of computation can at least achieve the standard notion of computability. This also has less important theoretical uses, like showing that Magic: The Gathering or PowerPoint transitions also have the full power of computability.

The simplicity of the definition of the Turing machine makes it fairly robust. Just like with finite automata and pushdown automata, we can ask how modifying the machine changes its computational power. It turns out that very little we can add increases the computational power of a Turing machine: multiple tracks, multiple tapes with independent tape heads, doubly-infinite tapes, and even nondeterminism all leave the class of languages it can recognize unchanged.

The other important idea that the definition of the Turing machine gives us is a concrete way to think about resources. Our basic notions from complexity theory are derived here: time is just how many steps a Turing machine makes and space is just how many tape cells a Turing machine uses.

We'll define a Turing machine for the language $L = \{a^n b^n c^n \mid n \geq 0\}$.

The machine, when formally defined, looks like this:

Note that here, $q_7$ acts as the accepting state, and that the rejecting state is nowhere to be seen. It really should have been included explicitly, but since this language is known to be decidable, we can imagine that all of the undefined transitions go to the rejecting state. This is a convenient convention, but not one we can always rely on. I should probably replace this diagram with one that includes an explicit reject state.

Here, we rely on the idea that since our input alphabet is finite, we can just include a "marked" variant of each symbol in our tape alphabet. The more obvious thing that we see is that, even for a relatively simple language, specifying a Turing machine with this amount of detail gets cumbersome. This will be the only time we see a Turing machine completely defined. Instead, we will opt for algorithmic descriptions, and as we get further along, we will conveniently drop more and more details.

A less formal description of the operation of the machine is as follows (a sketch of this marking procedure in ordinary code appears after the list):

  1. Read an $a$ and mark it.
  2. Go right across the word until we find a $b$ and mark it.
  3. Go right across the word until we find a $c$ and mark it.
  4. Go left until we find a marked $a$. Then move right to the unmarked $a$.
  5. Repeat this process from step 1 until there are no unmarked $a$'s left.
  6. If there aren't enough $b$'s or $c$'s to mark, then the machine moves to the reject state and rejects the word.
  7. Otherwise, we have marked an equal number of $a$'s, $b$'s, and $c$'s and the tape head is on the first marked $b$. Go to the right, ensuring that we are only reading marked $b$'s.
  8. Now, we are on a marked $c$. Go to the right, ensuring that we are only reading marked $c$'s.
  9. If we encounter an unmarked symbol, the machine moves into the reject state. Otherwise, we should read a $\Box$ and accept.
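Below is a sketch of this marking procedure in ordinary Python, with a list standing in for the tape and uppercase letters standing in for the marked variants of the symbols. It is not a transcription of the machine diagrammed above (the states and transitions are not reproduced), and the helper names (is_abc, mark_next) are just choices for this sketch, but it follows the same steps.

    def is_abc(w):
        """Check whether w is of the form a^n b^n c^n using the marking
        strategy described above.  A Python list stands in for the tape,
        and uppercase letters stand in for the marked symbols."""
        tape = list(w)
        n = len(tape)

        def mark_next(symbol, i, skippable):
            # Move right from position i, skipping the symbols in `skippable`,
            # and mark the first occurrence of `symbol`.  Return its position,
            # or None if we run out of tape or hit an unexpected symbol first.
            while i < n and tape[i] in skippable:
                i += 1
            if i < n and tape[i] == symbol:
                tape[i] = symbol.upper()
                return i
            return None

        start = 0                                       # leftmost unmarked a
        while start < n and tape[start] == "a":
            tape[start] = "A"                           # step 1: mark an a
            j = mark_next("b", start + 1, "aB")         # step 2 (step 6 if it fails)
            if j is None:
                return False
            if mark_next("c", j + 1, "bC") is None:     # step 3 (step 6 if it fails)
                return False
            start += 1                                  # steps 4 and 5: next unmarked a

        # Steps 7 to 9: everything after the marked a's must be marked b's
        # followed by marked c's, with nothing unmarked left over.
        rest = tape[start:]
        return all(ch in "BC" for ch in rest) and rest == sorted(rest)

    print(is_abc("aabbcc"), is_abc("aabbc"), is_abc(""))  # True False True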