CMSC 28000 — Lecture 16

Equivalence of CFGs and PDAs

Just as regular expressions and finite automata happened to have a correspondence, so do context-free grammars and pushdown automata. For proofs in both directions, we'll take advantage of the fact that PDAs that accept via final state are equivalent to those that accept by empty stack. We will show that CFGs can be converted to PDAs that accept via empty stack and vice-versa. Since we already showed that these are equivalent with PDAs that accept via final state, we effectively show that these are equivalent to CFGs too.

Let $G$ be a context-free grammar. Then there exists a pushdown automaton $M$ such that $N(M) = L(G)$.

Let $G = (V,\Sigma,P,S)$ be a CFG. We will create a PDA $M = (Q,\Sigma,\Gamma, \delta, q_0, Z_0)$ which accepts by empty stack defined as follows:

and we define the following transitions

Here is what the machine looks like.

This machine essentially acts as a parser for $G$, albeit a nondeterministic one. And unlike many other machines we've seen before, the computation in this machine is driven primarily by the stack:

This process essentially performs a leftmost derivation of our input word. It's important to note that we can elide many of the tricky details of how we can ensure the correct production rule is chosen when we see $A$ through the magic of nondeterminism.

To formally prove this, we need to show that we have a derivation $S \Rightarrow^* w$ over $G$ if and only if there is a computation $(q,w,S) \vdash^* (q,\varepsilon,\varepsilon)$ in $M$. The details follow.

We will first show that $L(G) \subseteq N(M)$. Let $w \in L(G)$. There exists a leftmost derivation for $w$ $$S = \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_n = w,$$ where $\gamma_i \in (V \cup \Sigma)^*$ for $1 \leq i \leq n$. Since this is a leftmost derivation, we can write $\gamma_i = x_i \alpha_i$, where $x_i \in \Sigma^*$ and $\alpha_i \in V(V \cup \Sigma)^*$ for $1 \leq i \leq n-1$. In other words, $\alpha_i$ begins with the leftmost variable of $\gamma_i$. Furthermore, $x_i$ must be a prefix of $w$ and for each $x_i$, we can write $w = x_i y_i$ for some $y_i \in \Sigma^*$.

We will show by induction on $i$, that $(q,w,S) \vdash^* (q,y_i,\alpha_i)$ on the PDA $M$ for all $1 \leq i \leq n$.

For our base case, we take $i = 1$. We have $x_1 = \varepsilon$, $y_1 = w$, and $\alpha_1 = S$. Thus, we have $(q,w,S) \vdash^* (q,y_i,\alpha_i)$ trivially.

For our inductive hypothesis, we assume that $(q,w,S) \vdash^* (q,y_j,\alpha_j)$ for all $j < i$. We can apply this to get $(q,w,S) \vdash^* (q,y_{i-1},\alpha_{i-1})$, where $x_{i-1} y_{i-1}$, $\alpha_{i-1} = A_{i-1} \eta_{i-1}$ for some $\eta_{i-1} \in (V \cup \Sigma)^*$, and $\gamma_{i-1} = x_{i-1} \alpha_{i-1}$.

Since we are considering the leftmost derivation, there must exist a production $A_{i-1} \to \beta_{i-1}$ in $G$ for some $\beta_{i-1} \in (V \cup \Sigma)^*$. By our definition of $M$, there is a transition with $(q,\beta_{i-1}) \in \delta(q,\varepsilon,A_{i-1})$. Therefore, we have $(q,y_{i-1},\alpha_{i-1}) \vdash (q,y_{i-1},\beta_{i-1}\eta_{i-1})$.

Now, write $y_{i-1} = y_{i-1}' y_i$ and $\beta_{i-1} \eta_{i-1} = y_{i-1}' \alpha_i$, where $y_{i-1}' \in \Sigma^*$. We can read $y_{i-1}'$ via transitions $\delta(q,a,a) = \{(q,\varepsilon)\}$ for all $a \in \Sigma$ to give us $(q,w,S) \vdash^* (q,y_i,\alpha_i)$. Thus, we have shown that for each $i$, we have $(q,w,S) \vdash^* (q,y_i,\alpha_i)$. Then taking $i = n$, we have $$(q,w,S) \vdash^* (q,y_n,\alpha_n) = (q,\varepsilon,\varepsilon)$$ and thus $w \in N(M)$ as desired.

Next, we will show that $N(M) \subseteq L(G)$. Let $w \in N(M)$. We will show that for any variable $A \in V$, if $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$, then $A \Rightarrow^* w$. We will show this by induction on $n$, then length of an accepting computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$.

For our base case, we have $n = 1$, since $A \neq \varepsilon$. Since $A$ is at the top of the stack, we must choose a transition of the form $\delta(q,\varepsilon,A)$. However, this means that $w = \varepsilon$ and $(q,\varepsilon) \in \delta(q,\varepsilon,A)$ and this implies that there is a rule $A \to \varepsilon$ in $G$. Thus, we have $A \Rightarrow^* w$.

Now, suppose that $n > 1$ and for our inductive hypothesis, we assume that for all variables $A \in V$, if the length of a computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$ of $M$ is fewer than $n$ steps, then $A \Rightarrow^* w$. Consider a computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$ of length $n$.

Again, we must first take a transition $\delta(q,\varepsilon,A)$ since $A$ is on the top of the stack. This time, we take the transition corresponding to a rule of the form $A \to \alpha_1 \cdots \alpha_k$, where $\alpha_1, \dots, \alpha_k \in (V \cup \Sigma)$, and we have $(q,w,A) \vdash (q,w,\alpha_1 \cdots \alpha_k)$. From here, consider the following computation, $$(q,w,A) \vdash (q,w,\alpha_1 \cdots \alpha_k) \vdash^* (q,w',\alpha_2 \cdots \alpha_k)$$ where $w'$ is a suffix of $w$ with $w = w_1 w'$.

In other words, we have $(q,w_1 w',\alpha_1 \cdots \alpha_k) \vdash^* (q,w',\alpha_2 \cdots \alpha_k)$. Now, we make use of the following observation: the computation $$(q,w_1,\alpha_1 \cdots \alpha_k) \vdash^* (q,\varepsilon,\alpha_2 \cdots \alpha_k)$$ is also a valid computation in $M$. The reasoning is simple: the computation clearly does not depend on $w'$, since none of $w'$ was read. If our input word were simply $w_1$, we would have exactly the above computation. However, this leads us to another similarly useful observation we can make: the computation $$(q,w_1,\alpha_1) \vdash^* (q,\varepsilon,\varepsilon)$$ is also a valid computation in $M$. The reasoning is similar: if we began our computation of $w_1$ with only $\alpha_1$ on the stack, we would arrive at the same configuration since $\alpha_2 \cdots \alpha_k$ were inconsequential to the computation.

Now, we observe that we have one of two possibilities. If $\alpha_1 \in \Sigma$, then $(q,w_1,\alpha_1) \vdash (q,\varepsilon,\varepsilon)$ implies that $w_1 = \alpha_1$ and we have $\alpha_1 \Rightarrow^* w_1$ trivially. Otherwise, we have $\alpha_1 \in V$ and since $(q,w_1,\alpha_1) \vdash (q,\varepsilon,\varepsilon)$ is a computation of length less than $n$, we have $\alpha_1 \Rightarrow^* w_1$ by the inductive hypothesis.

We can now return to the computation $$(q,w,A) \vdash (q,w,\alpha_1 \cdots \alpha_k) \vdash^* (q,w',\alpha_2 \cdots \alpha_k)$$ and observe that we have $$(q,w,A) \vdash (q,w_i w',\alpha_i \cdots \alpha_k) \vdash^* (q,w',\alpha_{i-1} \cdots \alpha_k)$$ for every $1 \leq i \leq k$ and we can apply a similar argument to obtain derivations $\alpha_i \Rightarrow^* w_i$ for $w = w_1 \cdots w_k$. This gives us our derivation, $$A \Rightarrow \alpha_1 \cdots \alpha_k \Rightarrow^* w_1 \alpha_2 \cdots \alpha_k \Rightarrow^* w_1 w_2 \alpha_3 \cdots \alpha_k \Rightarrow^* w_1 \cdots w_k = w$$ and thus $A \Rightarrow^* w$. Taking $A = S$, we have $S \Rightarrow^* w$ and $w \in L(G)$ as desired.

Consider the grammar for palindromes, $$S \rightarrow aSa \mid bSb \mid a \mid b \mid \varepsilon,$$ and consider a PDA $P$ for this grammar, which we obtain by following the construction of Theorem 16.1. Then a computation of $P$ on the word $aababaa$ looks like \begin{align*} (q,aababaa,S) & \vdash (q,aababaa,aSa) \\ & \vdash (q,ababaa,Sa) \\ & \vdash (q,ababaa,aSaa) \\ & \vdash (q,babaa,Saa) \\ & \vdash (q,babaa,bSbaa) \\ & \vdash (q,abaa,Sbaa) \\ & \vdash (q,abaa,abaa) \\ & \vdash (q,baa,baa) \\ & \vdash (q,aa,aa) \\ & \vdash (q,a,a) \\ & \vdash (q,\varepsilon, \varepsilon) \end{align*}

Now, we'll consider the other direction. We won't go through all the details this time.

Let $M$ be a pushdown automaton. Then there exists a context-free grammar $G$ such that $L(G) = N(M)$.

Let $M = (Q,\Sigma,\Gamma,\delta,q_0,Z_0)$ be a PDA that accepts by empty stack. We want to simulate the process of performing a computation on $M$ via a grammar $G$. Consider the first step of such a computation $$(q,au,X) \vdash (q',u,Y_1 \cdots Y_k)$$ where $q,q' \in Q$, $u \in \Sigma^*$, $a \in \Sigma \cup \{\varepsilon\}$, and $X, Y_1, \dots, Y_K \in \Gamma$.

In this computation, we follow the transition $(q',Y_1 \cdots Y_k) \in \delta(q,a,X)$. Note that $a \in \Sigma \cup \{\varepsilon\}$. Then the computation proceeds and at some point we end up in a configuration $(q_1,u_1,Y_2 \cdots Y_k)$. We continue the computation, removing each $Y_i$ and consuming the input word until we end up in a configuration $(r,\varepsilon,\varepsilon)$ for some state $r \in Q$.

We will construct a CFG $G = (V,\Gamma,P,S)$ that generates $N(M)$. First, we define the set of variables $V$ by $$ V = \{S\} \cup \{A_{q,X,r} \mid q,r \in Q, X \in \Gamma\}.$$ For clarity, we will write $[q,X,r]$ for $A_{q,X,r}$ in the rest of the proof.

The idea is that from $[q,X,r]$, one can construct a derivation $[q,X,r] \Rightarrow^* w$ for a word $w$ if and only if $(q,w,X) \vdash^* (r,\varepsilon,\varepsilon)$. In a sense, the variable $[q,X,r]$ represents the computation from $q$ to $r$ on input $w$ with $X$ on the stack. We then need to define our set of productions appropriately.

Consider a transition $(r,Y_1 \cdots Y_k) \in \delta(q,a,X)$ as above. If what we have is $(r,\varepsilon)$, then we add the production $$[q,X,r] \to a.$$ This represents the action of popping $X$ off the stack, reading a symbol $a$, and moving to state $r$.

If $k \geq 1$, then we add productions of the form $$[q,X,r_k] \to a[r,Y_1,r_1] [r_1,Y_2,r_2] \cdots [r_{k-1},Y_k,r_k]$$ for all possible $r_1, \dots, r_k \in Q$. Be careful of the subscripts here: note that though we have a transition to the state $r$, the variable on the left hand side of the production is for the state $r_k$—the state $r$ is in the first variable of the right hand side. This represents the action of reading a symbol $a$, popping $X$ from the stack, pushing $Y_1 \cdots Y_k$ on to the stack, and considering all possible sequences of states $r_1, \dots, r_k$ such that we can pop $Y_1 \cdots Y_k$, while moving to state $r$.

There is a question of whether we can actually do this thing of taking all the possible sequences of states, since it seems like there could be an infinite number of such sequences. However, note that each of these production is based on a transition and the number of stack symbols that transition pushes onto the stack. Since this number is fixed, we only need to consider all of these combinations. Not all of these variables will be useful, and those that are will be the ones that admit a valid derivation.

Since we want to generate all words such that the PDA $M$ reaches a configuration with an empty stack, we add transitions for all states $q \in Q$ and the start state $q_0$ $$S \to [q_0,Z_0,q].$$ From this, we have $(q_0,w,Z_0) \vdash^* (q,\varepsilon,\varepsilon)$ if and only if $S \Rightarrow [q_0,Z_0,q] \Rightarrow^* w$.

We can use this result to prove things about context-free languages that were maybe too cumbersome to do with grammars. For instance, it was an exercise to show that CFLs are closed under union, concatenation, and star and this can be accomplished fairly easily with grammars. But what about, say intersection? Here, we'll show that CFLs are closed under intersection with a regular language. This might seem strange at first, but it turns out to be quite handy.

Let $L \subseteq \Sigma$ be a context-free language and let $R \subseteq \Sigma^*$ be a regular language. Then $L \cap R$ is a context-free language.

Let $A = (Q_A,\Sigma,\Gamma,\delta_A,s_A,Z_0,F_A)$ be a PDA that accepts $L$ and let $B = (Q_B, \Sigma, \delta_B, s_B, F_B)$ be a DFA that recognizes $R$. We will construct a PDA $C = (Q_C, \Sigma, \Gamma, \delta_C, s_C, Z_0, F_C)$ that recognizes $L \cap R$. We have

The PDA $C$ works by simultaneously moving through a computation of $A$ and $B$. When a transition is made, $C$ will keep track of the current state of $A$ via the first component of each state of $C$ and by using the stack as normal. The current state of $B$ is kept track of as the second component of the states of $C$. In addition, when $A$ makes $\varepsilon$-transitions, the state of $B$ does not change.

It is not too difficult to see that in this way, $C$ can move simultaneously through $A$ and $B$. Then, the set of accepting states of $C$ is defined to be pairs of states in $F_A$ and $F_B$. In other words, a word accepted by $C$ must have a computation that ends in a state $\langle p,q \rangle$ with $p \in F_A$ and $q \in F_B$. In other words, there is a computation for $w$ on $A$ that ends in a final state and there is a computation for $w$ on $B$ that ends in a final state. Thus, $C$ accepts $w$ if and only if $w$ is accepted by both $A$ and $B$ and this is only the case if and only if $w \in L \cap R$. Thus, $C$ is a PDA that accepts $L \cap R$ and $L \cap R$ is a context-free language.

But how about intersection of two context-free languages? That poses a bit more of a challenge. For one thing, our product construction approach will fail—there's no way to "share" a stack. For now, we're stuck and will have to keep this in the back of our minds.