CMSC 28000 — Lecture 18

Now, we'll consider the other direction. We won't go through all the details this time.

Let $M$ be a pushdown automaton. Then there exists a context-free grammar $G$ such that $L(G) = N(M)$.

Let $M = (Q,\Sigma,\Gamma,\delta,q_0,Z_0)$ be a PDA that accepts by empty stack. We want to simulate the process of performing a computation on $M$ via a grammar $G$. Suppose we have an accepting computation on a string and consider one step of such a computation $$(q,au,X) \vdash (q',u,Y_1 \cdots Y_k)$$ where $q,q' \in Q$, $u \in \Sigma^*$, $a \in \Sigma \cup \{\varepsilon\}$, and $X, Y_1, \dots, Y_k \in \Gamma$.

In this computation, we follow the transition $(q',Y_1 \cdots Y_k) \in \delta(q,a,X)$. Because the PDA has to empty its stack in order to accept this string, the computation proceeds and at some point we reach a configuration $(q_1,u_1,Y_2 \cdots Y_k)$, where $Y_1$ has finally been removed from the stack. Afterwards, we continue the computation, removing each remaining $Y_i$ and consuming the rest of the input word until we end up in a configuration $(r,\varepsilon,\varepsilon)$ for some state $r \in Q$.

We will construct a CFG $G = (V,\Sigma,P,S)$ that generates $N(M)$. First, we define the set of variables $V$ by $$ V = \{S\} \cup \{A_{q,X,r} \mid q,r \in Q, X \in \Gamma\}.$$

Much like the DFA to regular expression proof, we try to solve the slightly more general problem of describing all strings between every pair of states in our machine. Here, we define a variable based on the pair of states $q,r \in Q$. But in the PDA, we also have another object driving the computation: the stack. So we include the item $X$ that is at the top of the stack to be popped.

This results in the variable $A_{q,X,r}$. We want that $A_{q,X,r} \Rightarrow^* w$ for a word $w$ if and only if $(q,w,X) \vdash^* (r,\varepsilon,\varepsilon)$. That is, $A_{q,X,r}$ generates all strings $w$ such that there is a computation on input $w$ from $q$ to $r$ with $X$ initially at the top of the stack.

We then need to define our set of productions appropriately. Consider a transition $(r_0,Y_1 \cdots Y_k) \in \delta(q,a,X)$ as above. There are two cases. If $k = 0$, so that the transition pops $X$ and pushes nothing, we add the production $$A_{q,X,r_0} \to a.$$ If $k \geq 1$, then for every choice of states $r_1, \dots, r_k \in Q$, we add the production $$A_{q,X,r_k} \to a A_{r_0,Y_1,r_1} A_{r_1,Y_2,r_2} \cdots A_{r_{k-1},Y_k,r_k}.$$

Be careful of the subscripts here: note that though we have a transition to the state $r_0$, the variable we are defining is $A_{q,X,r_k}$. Why do we define the variable this way?

Recall that we had the transition $(r_0, Y_1 \cdots Y_k) \in \delta(q,a,X)$. What we are trying to represent is the action of reading a symbol $a$, popping $X$ from the stack, pushing $Y_1 \cdots Y_k$ onto the stack, and moving to state $r_0$.

However, all of those things we pushed onto the stack need to get removed over the course of the computation. So they will all be involved in generating some part of our input string. But what would the state of this machine actually look like after, say, removing $Y_1$?

Recall that we've just performed the computation step \[(q,au,X) \vdash (r_0, u, Y_1 Y_2 \cdots Y_k).\] If we proceed from this configuration, we pop $Y_1$, but this does not necessarily leave us with a stack with $Y_2$ on the top. The transition on $Y_1$ could leave us with even more on the stack, like \[(r_0, u, Y_1 Y_2 \cdots Y_k) \vdash (r_0', u', Z_1 Z_2 \cdots Z_\ell Y_2 \cdots Y_k).\] But because we know we have to clear out the stack, we know that $Y_2$ must return to the top of the stack at some point. How much of the input string have we read at that point? What is the state that the PDA is in at that point? Those are things that we don't know, but have to guess: \[(r_0, u, Y_1 Y_2 \cdots Y_k) \vdash^* (r_1, u'', Y_2 \cdots Y_k).\]

In essence, we should think of the variable $A_{q,X,r}$ primarily as being about clearing $X$ off of the stack and the possible strings that get consumed by the PDA while doing so (and hence generated by our grammar). The states at which this process begins and ends are details that we fill in after the fact by considering all possible sequences of states $r_1, \dots, r_k$ under which we can pop $Y_1 \cdots Y_k$.

There is a question of whether we can actually do this thing of taking all the possible sequences of states, since it seems like there could be an infinite number of such sequences. However, note that each of these productions is based on a transition and the number of stack symbols that transition pushes onto the stack. Since this number is fixed, we only need to consider finitely many combinations: if there are $n$ states in the PDA and our transition pushes $k$ stack symbols onto the stack, that's only $n^k$ different productions that we need to add for that transition. Not all of the resulting variables will be useful, and those that are will be the ones that admit a valid derivation.
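To make the count concrete, here is a small made-up instance. Suppose $Q = \{p,q\}$ and we have the transition $(p, Y_1 Y_2) \in \delta(q,a,X)$, so $k = 2$. Then we add the $2^2 = 4$ productions \[ A_{q,X,p} \to a A_{p,Y_1,p} A_{p,Y_2,p} \mid a A_{p,Y_1,q} A_{q,Y_2,p} \qquad\text{and}\qquad A_{q,X,q} \to a A_{p,Y_1,p} A_{p,Y_2,q} \mid a A_{p,Y_1,q} A_{q,Y_2,q}, \] one for each guess of the intermediate state $r_1$ and the final state $r_2$.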

Since we want to generate all words on which the PDA $M$ reaches a configuration with an empty stack, we add, for every state $q \in Q$, the production $$S \to A_{q_0,Z_0,q},$$ where $q_0$ is the start state. From this, we have $(q_0,w,Z_0) \vdash^* (q,\varepsilon,\varepsilon)$ if and only if $S \Rightarrow A_{q_0,Z_0,q} \Rightarrow^* w$.
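To summarize the whole construction, here is a minimal sketch in Python. The data representation (a dictionary for $\delta$, tuples for the variables $A_{q,X,r}$, and the empty string standing in for $\varepsilon$) is my own choice for illustration and not part of the formal proof.

```python
from itertools import product

def pda_to_cfg(states, delta, start_state, start_symbol):
    """Build the productions of a CFG generating N(M) via the triple construction.

    delta maps (state, input symbol or "", stack symbol) to a set of
    pairs (next state, string of pushed stack symbols).  Variables are
    represented as tuples ("A", q, X, r); the start variable is "S".
    Returns a list of (head, body) pairs, where a body is a list of
    terminals and variables (an empty list stands for epsilon).
    """
    productions = []

    # Start productions: S -> A_{q0, Z0, q} for every state q.
    for q in states:
        productions.append(("S", [("A", start_state, start_symbol, q)]))

    for (q, a, X), moves in delta.items():
        for (r0, pushed) in moves:
            k = len(pushed)
            if k == 0:
                # Case 1: the transition pops X and pushes nothing,
                # giving A_{q,X,r0} -> a.
                productions.append((("A", q, X, r0), [a] if a else []))
            else:
                # Case 2: guess the states r1, ..., rk at which each pushed
                # symbol returns to the top of the stack, giving
                # A_{q,X,rk} -> a A_{r0,Y1,r1} ... A_{r(k-1),Yk,rk}.
                for guess in product(states, repeat=k):
                    rs = (r0,) + guess
                    body = ([a] if a else []) + [
                        ("A", rs[i], pushed[i], rs[i + 1]) for i in range(k)
                    ]
                    productions.append((("A", q, X, rs[k]), body))
    return productions
```

As noted above, many of the produced variables will be useless; they simply never derive a terminal string and can be removed with the usual procedure for eliminating useless symbols.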

A side note on determinism

At this point, we might want to play around with our model in the same way that we did with our finite automaton. One observation that we've made so far is that we've begun with a nondeterministic machine model for the PDA. An obvious question is what happens when we try to restrict the machine to be deterministic. The next question is how to define this.

A deterministic pushdown automaton (DPDA) is a PDA $M = (Q,\Sigma,\Gamma,\delta,q_0,Z_0,F)$ with the additional conditions:

1. for every $q \in Q$, $a \in \Sigma \cup \{\varepsilon\}$, and $X \in \Gamma$, the set $\delta(q,a,X)$ contains at most one element, and
2. for every $q \in Q$ and $X \in \Gamma$, if $\delta(q,\varepsilon,X) \neq \emptyset$, then $\delta(q,a,X) = \emptyset$ for every $a \in \Sigma$.

The class of languages accepted by DPDAs by final state is called the class of deterministic context-free languages (DCFLs).

It's clear from this definition that a PDA can recognize any language accepted by a DPDA. But note that a DPDA is not simply a PDA with the codomain of the transition function restricted to $Q \times \Gamma^*$—we are allowed both to have undefined transitions (i.e. transitions that end up in $\emptyset$) as well as $\varepsilon$-transitions.

We should parse out what these restrictions imply. The first condition restricts the choices of a transition for any particular state, input symbol, and stack symbol. In this case, there is no ambiguity, since the transition is uniquely determined by the state, input symbol, and stack symbol.

The second condition concerns $\varepsilon$-transitions. We allow $\varepsilon$-transitions because it is possible that we want to take a transition based only on the symbol on the top of the stack. But if a state and stack symbol have an $\varepsilon$-transition, we disallow any other transitions from that state on that stack symbol. Without this condition, we could sneak in some nondeterminism by making a choice between consuming an input symbol or not.
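As a quick way to check the two conditions, here is a small sketch using the same hypothetical dictionary representation of $\delta$ as in the earlier code.

```python
def is_deterministic(delta, input_alphabet):
    """Check the two DPDA conditions on a transition table.

    delta maps (state, input symbol or "", stack symbol) to a set of
    (next state, pushed string) pairs, as before.
    """
    # Condition 1: at most one move for each state, symbol, and stack symbol.
    if any(len(moves) > 1 for moves in delta.values()):
        return False
    # Condition 2: an epsilon-move on (q, X) rules out moves on every
    # real input symbol with the same state and stack symbol.
    for (q, a, X), moves in delta.items():
        if a == "" and moves:
            if any(delta.get((q, b, X)) for b in input_alphabet):
                return False
    return True
```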

Here is a PDA.

You might recognize it. This is the PDA for $\{a^n b^n \mid n \geq 0\}$. However, it is not a DPDA, since both $\delta(q_0,\varepsilon,Z_0) \neq \emptyset$ and $\delta(q_0,a,Z_0) \neq \emptyset$, violating the second condition. Luckily, this is easy to fix.

Now, we have a DPDA. A natural question to ask, just as with NFAs, is whether there is a way to determinize every PDA into a DPDA.

The DPDA model is due to Seymour Ginsburg and Sheila Greibach in 1966. This puts the development of the DPDA a few years after the standard PDA model. Unlike with finite automata, there are some significant differences between DCFLs and CFLs.

Here is one quick distinction. Recall our PDA construction that computed a leftmost derivation for any given CFG $G$ as part of our proof that PDAs and CFGs are equivalent. If we could construct an equivalent DPDA, this would suggest a deterministic way of computing a leftmost derivation for any word under any CFG. But the existence of inherently ambiguous languages suggests otherwise.

One important property that Ginsburg and Greibach show is that DCFLs are closed under complementation. We've hinted that this is not the case for context-free languages in general, so this is another property that tells us the classes of DCFLs and CFLs are different.

So what is the point of DCFLs if they're not as powerful as CFLs? As with many things in computer science, we can talk about tradeoffs. Recall that parsing with a context-free grammar can be done in cubic time via CYK, and not significantly faster, due to the lower bound relating CFG parsing to matrix multiplication. Along with CYK, we've also seen an example of using a PDA to parse strings according to a grammar, when we showed that CFGs were equivalent to PDAs. In that case, we used a top-down method, starting with the start symbol $S$ and nondeterministically choosing productions to construct a leftmost derivation.
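As a reminder of what cubic-time parsing looks like, here is a minimal sketch of CYK recognition in Python, assuming the grammar has already been put into Chomsky normal form; the representation of the grammar (sets of unary and binary productions) is my own choice for illustration.

```python
def cyk(word, unary, binary, start):
    """CYK recognition for a CFG in Chomsky normal form.

    unary:  set of pairs (A, a) for productions A -> a
    binary: set of triples (A, B, C) for productions A -> B C
    Runs in time O(n^3 |G|) on a word of length n.
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as used here has no epsilon-productions
    # table[i][l] holds the variables that generate word[i : i + l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(word):
        for (A, a) in unary:
            if a == c:
                table[i][1].add(A)
    for length in range(2, n + 1):           # substring length
        for i in range(n - length + 1):      # starting position
            for split in range(1, length):   # where to split the substring
                for (A, B, C) in binary:
                    if B in table[i][split] and C in table[i + split][length - split]:
                        table[i][length].add(A)
    return start in table[0][n]
```

For example, converting $S \to aSb \mid ab$ into CNF as $S \to AT \mid AB$, $T \to SB$, $A \to a$, $B \to b$, the call `cyk("aabb", {("A","a"), ("B","b")}, {("S","A","T"), ("S","A","B"), ("T","S","B")}, "S")` returns `True`.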

This is where DCFLs come in. DCFLs are a class of languages that have associated grammars that can be parsed in linear time. These are the LR(k) grammars, due to Donald Knuth in 1965.

The name here gives away our strategy. In $\mathrm{LR}(k)$, L stands for a left-to-right scan of the input, R stands for constructing a rightmost derivation (in reverse), and $k$ is the number of characters we're allowed to "look ahead" in order to decide what to do. Grammars with more lookahead use the same ideas and can be parsed in a similar way.
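To see concretely what a rightmost derivation in reverse looks like, take again the grammar $S \to aSb \mid ab$ (chosen just for illustration). On the input $aabb$, a shift-reduce parser proceeds roughly as follows: \[ \begin{array}{lll} \text{stack} & \text{remaining input} & \text{action} \\ \varepsilon & aabb & \text{shift } a \\ a & abb & \text{shift } a \\ aa & bb & \text{shift } b \\ aab & b & \text{reduce } ab \to S \\ aS & b & \text{shift } b \\ aSb & \varepsilon & \text{reduce } aSb \to S \\ S & \varepsilon & \text{accept} \end{array} \] Reading the reductions from last to first recovers the rightmost derivation $S \Rightarrow aSb \Rightarrow aabb$.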

Knuth gives the following pair of results which links $\mathrm{LR}(k)$ grammars and DCFLs.

If $G$ is an $\mathrm{LR}(k)$ grammar, then there is a deterministic pushdown automaton that accepts $L(G)$. If $L$ is a deterministic context-free language, then there is an $\mathrm{LR}(1)$ grammar that generates $L$.

Practically speaking, most programming languages are really languages that are $\mathrm{LR}(1)$ or close to it. YACC generates parsers for $\mathrm{LALR}(1)$ grammars, a class that sits somewhere in between $\mathrm{LR}(0)$ and $\mathrm{LR}(1)$.