CMSC 28000 — Lecture 17

Acceptance by empty stack

When we defined acceptance, we were careful to say that we were defining acceptance by final state. This phrasing suggests that final state acceptance is not the only model of acceptance for pushdown automata.

Let $P = (Q,\Sigma,\Gamma,\delta,q_0,Z_0,F)$ be a PDA. Then we can define the language of the machine by $$ N(P) = \{w \in \Sigma^* \mid (q_0, w, Z_0) \vdash^* (q,\varepsilon,\varepsilon), q \in Q \}.$$

In other words, we accept a word if the stack is empty after the entire word has been read. Under this model, final states are superfluous, so PDAs using this acceptance model can be described by a 6-tuple $(Q,\Sigma,\Gamma,\delta,q_0,Z_0)$ instead of a 7-tuple.
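As a quick illustration of the definition (this example is not from the notes themselves), consider a PDA $P$ with states $q_0, q_1$, stack alphabet $\{A, Z_0\}$, and transitions $$\delta(q_0,a,Z_0) = \{(q_0,AZ_0)\}, \quad \delta(q_0,a,A) = \{(q_0,AA)\}, \quad \delta(q_0,b,A) = \{(q_1,\varepsilon)\}, \quad \delta(q_1,b,A) = \{(q_1,\varepsilon)\}, \quad \delta(q_1,\varepsilon,Z_0) = \{(q_1,\varepsilon)\}.$$ Reading $a^n b^n$ pushes $n$ copies of $A$, pops one $A$ per $b$, and finally pops $Z_0$, so $N(P) = \{a^n b^n \mid n \geq 1\}$.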

As it turns out, the two models of acceptance are equivalent. That is, given a PDA defined under one model of acceptance, it's possible to construct a PDA under the other model of acceptance which accepts the same language. We'll sketch out the constructions.

Let $A$ be a PDA that accepts by final state. Then there exists a PDA $A'$ which accepts $L(A)$ by empty stack.

Our new machine $A'$ needs to satisfy two properties. First of all, whenever $A$ reaches a final state on a word $w$ (that is, $A$ accepts $w$), we want to empty the stack. Secondly, we don't want the stack to ever be empty before the entire input is read.

For the first property, we create a new state $q'$ and add transitions $(q',\varepsilon) \in \delta(q,\varepsilon,\gamma)$ for every final state $q$ of $A$ and every $\gamma \in \Gamma$. The only transitions out of $q'$ are $\delta(q',\varepsilon,\gamma) = \{(q',\varepsilon)\}$ for all $\gamma \in \Gamma$, so once the machine enters $q'$, all it can do is pop the stack. Also note that any computation that enters $q'$ with part of the input still unread will crash without accepting, since no transition out of $q'$ reads input.

For the second property, to prevent the stack from being empty during the course of the computation, we add a new symbol $X_0$ to the bottom of the stack. We do this by adding a new initial state with a transition to the initial state of $A$ whose only function is to place $X_0$ below $Z_0$ on the stack. Since $X_0$ is not in the stack alphabet of $A$, no transition of $A$ can remove it, and so the only way the stack can be emptied is by visiting $q'$ (which we also allow to pop $X_0$).
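To make the construction concrete, here is a minimal Python sketch of the transformation. The dictionary-based PDA encoding, the function name, and the fresh names `p0`, `qd`, and `X0` are all assumptions of this sketch, not part of the lecture; strings to be pushed are encoded as tuples of stack symbols, leftmost symbol on top.

```python
def final_state_to_empty_stack(A):
    """Build a PDA A2 with N(A2) = L(A) from a PDA A accepting by final
    state.  A PDA is encoded as a dict with keys 'states', 'stack_alphabet',
    'delta', 'start', 'start_stack', 'finals'; 'delta' maps a triple
    (state, input symbol or '' for epsilon, stack symbol) to a set of
    (next state, tuple of symbols to push) pairs."""
    p0, qd, X0 = 'p0', 'qd', 'X0'   # fresh start state, drain state, marker
    gamma = A['stack_alphabet'] | {X0}
    delta = {k: set(v) for k, v in A['delta'].items()}
    # New start state: place A's initial symbol on top of the marker X0.
    delta[(p0, '', X0)] = {(A['start'], (A['start_stack'], X0))}
    for g in gamma:
        # From each final state, silently pop and move to the drain state...
        for q in A['finals']:
            delta.setdefault((q, '', g), set()).add((qd, ()))
        # ...where the only possible moves pop the stack, one symbol at a time.
        delta.setdefault((qd, '', g), set()).add((qd, ()))
    return {'states': A['states'] | {p0, qd}, 'stack_alphabet': gamma,
            'delta': delta, 'start': p0, 'start_stack': X0, 'finals': set()}
```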

Let $B$ be a PDA that accepts by empty stack. Then there exists a PDA $B'$ which accepts $L(B)$ by final state.

To construct our new PDA $B'$, we need to ensure that the machine enters a final state whenever the machine would have emptied its stack and accepted. To do this, we add a new state $q'$ to $B'$ which is its sole accepting state.

However, we also need a means of detecting that the stack is empty. To do this, we again add a new symbol $X_0$ to the stack alphabet and add a new initial state that adds $X_0$ to the stack below $Z_0$.

With this, we can add transitions $\delta(q,\varepsilon,X_0) = \{(q',\varepsilon)\}$ from every state $q$ of $B$. Since there are no transitions involving the stack symbol $X_0$ in $B$, it is clear that the machine only takes the transition to $q'$ whenever $B$ would have emptied its stack.
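Here is the corresponding sketch for the other direction, under the same hypothetical encoding as above.

```python
def empty_stack_to_final_state(B):
    """Build a PDA B2 with L(B2) = N(B) from a PDA B accepting by empty
    stack, using the same encoding as in the previous sketch."""
    p0, qf, X0 = 'p0', 'qf', 'X0'   # fresh start state, final state, marker
    delta = {k: set(v) for k, v in B['delta'].items()}
    # New start state: place B's initial symbol on top of the marker X0.
    delta[(p0, '', X0)] = {(B['start'], (B['start_stack'], X0))}
    # Exposing X0 means B would have emptied its stack, so accept.
    # (B has no transitions on X0, so direct assignment is safe.)
    for q in B['states']:
        delta[(q, '', X0)] = {(qf, ())}
    return {'states': B['states'] | {p0, qf},
            'stack_alphabet': B['stack_alphabet'] | {X0},
            'delta': delta, 'start': p0, 'start_stack': X0, 'finals': {qf}}
```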

Equivalence of CFGs and PDAs

Just as regular expressions and finite automata correspond to one another, so do context-free grammars and pushdown automata. For proofs in both directions, we'll take advantage of the fact that PDAs that accept by final state are equivalent to those that accept by empty stack. We will show that CFGs can be converted to PDAs that accept by empty stack and vice versa. Since we already showed that empty-stack acceptance is equivalent to final-state acceptance, this shows that CFGs are equivalent to PDAs under either acceptance model.

Let $G$ be a context-free grammar. Then there exists a pushdown automaton $M$ such that $N(M) = L(G)$.

Let $G = (V,\Sigma,P,S)$ be a CFG. We will create a PDA $M = (\{q\},\Sigma,\Gamma,\delta,q,S)$ which accepts by empty stack, defined as follows: $M$ has a single state $q$, which is its initial state; its stack alphabet is $\Gamma = V \cup \Sigma$; and its initial stack symbol is the start variable $S$. We define the following transitions:

  1. $\delta(q,\varepsilon,A) = \{(q,\beta) \mid A \to \beta \in P\}$ for each variable $A \in V$, and
  2. $\delta(q,a,a) = \{(q,\varepsilon)\}$ for each terminal $a \in \Sigma$.
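As a concrete rendering of this construction, here is a short Python sketch using the same hypothetical PDA encoding as above; the grammar encoding as a 4-tuple with production bodies given as tuples of symbols is likewise an assumption of this sketch.

```python
def grammar_to_pda(G):
    """Build the one-state PDA M with N(M) = L(G) described above.
    G is encoded as (V, T, P, S), where P is a set of pairs (A, body)
    and each body is a tuple of symbols (the empty tuple for epsilon)."""
    V, T, P, S = G
    delta = {}
    # Rule 1: a variable on top of the stack is replaced, via an
    # epsilon-move, by the body of one of its productions.
    for A, body in P:
        delta.setdefault(('q', '', A), set()).add(('q', body))
    # Rule 2: a terminal on top of the stack is popped by matching it
    # against the next input symbol.
    for a in T:
        delta[('q', a, a)] = {('q', ())}
    return {'states': {'q'}, 'stack_alphabet': V | T, 'delta': delta,
            'start': 'q', 'start_stack': S, 'finals': set()}
```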

This machine essentially acts as a parser for $G$, albeit a nondeterministic one. And unlike many other machines we've seen before, the computation in this machine is driven primarily by the stack:

  1. If the symbol at the top of the stack is a variable $A$, the machine nondeterministically chooses a production $A \to \beta$ and replaces $A$ with $\beta$ on the stack, consuming no input.
  2. If the symbol at the top of the stack is a terminal $a$ that matches the next unread input symbol, the machine pops $a$ off the stack and consumes that input symbol.

This process essentially performs a leftmost derivation of our input word. It's important to note that, through the magic of nondeterminism, we can elide many of the tricky details of how to ensure the correct production rule is chosen when a variable $A$ is at the top of the stack.
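The "magic of nondeterminism" can be made concrete by brute force. The sketch below (again hypothetical, with a step budget to keep the search finite on small examples) explores all computations of a PDA and reports whether any of them empties the stack on the full input.

```python
from collections import deque

def accepts_by_empty_stack(M, w, max_steps=100_000):
    """Breadth-first search over configurations (state, unread input, stack)
    of the PDA M, checking whether some computation reaches (q, '', ()).
    The step budget makes this a toy for small examples, not a general
    decision procedure."""
    start = (M['start'], w, (M['start_stack'],))
    seen, queue, steps = {start}, deque([start]), 0
    while queue and steps < max_steps:
        q, u, stack = queue.popleft()
        steps += 1
        if u == '' and stack == ():
            return True          # all input read, stack empty: accept
        if stack == ():
            continue             # empty stack but input left over: dead end
        top, rest = stack[0], stack[1:]
        # Epsilon-moves leave the input alone; reading moves consume u[0].
        options = [('', u)] + ([(u[0], u[1:])] if u else [])
        for sym, u2 in options:
            for p, push in M['delta'].get((q, sym, top), ()):
                nxt = (p, u2, push + rest)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```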

To formally prove this, we need to show that we have a derivation $S \Rightarrow^* w$ over $G$ if and only if there is a computation $(q,w,S) \vdash^* (q,\varepsilon,\varepsilon)$ in $M$. The details follow.

We will first show that $L(G) \subseteq N(M)$. Let $w \in L(G)$. There exists a leftmost derivation for $w$ $$S = \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_n = w,$$ where $\gamma_i \in (V \cup \Sigma)^*$ for $1 \leq i \leq n$. Since this is a leftmost derivation, we can write $\gamma_i = x_i \alpha_i$, where $x_i \in \Sigma^*$ and $\alpha_i \in V(V \cup \Sigma)^*$ for $1 \leq i \leq n-1$. In other words, $\alpha_i$ begins with the leftmost variable of $\gamma_i$. For $i = n$, we have $\gamma_n = w$, so we take $x_n = w$ and $\alpha_n = \varepsilon$. Furthermore, each $x_i$ must be a prefix of $w$, and we can write $w = x_i y_i$ for some $y_i \in \Sigma^*$ (so that $y_n = \varepsilon$).

We will show by induction on $i$ that if $S \Rightarrow^* \gamma_i = x_i \alpha_i$, then $(q,w,S) \vdash^* (q,y_i,\alpha_i)$ on the PDA $M$, for all $1 \leq i \leq n$.

For our base case, we take $i = 1$. We have $x_1 = \varepsilon$, $y_1 = w$, and $\alpha_1 = S$. Thus, we have $(q,w,S) \vdash^* (q,y_1,\alpha_1)$ trivially, in zero steps.

For our inductive hypothesis, we assume that $(q,w,S) \vdash^* (q,y_j,\alpha_j)$ for all $j < i$. So suppose we have $S \Rightarrow^* \gamma_{i-1} = x_{i-1} \alpha_{i-1}$. By the inductive hypothesis, there exists a computation $(q,w,S) \vdash^* (q,y_{i-1},\alpha_{i-1})$, where we can write $\alpha_{i-1} = A_{i-1} \eta_{i-1}$ for some variable $A_{i-1} \in V$ and some $\eta_{i-1} \in (V \cup \Sigma)^*$.

At this point, there is a variable $A_{i-1}$ at the top of the stack. Since we want a sentential form $x_i \alpha_i$, where the first symbol of $\alpha_i$ is a variable, this means we need to do the following:

  1. Pop $A_{i-1}$ off the stack.
  2. Replace $A_{i-1}$ by applying some production.
  3. Consume any terminals at the top of the stack until a variable is at the top (or, when $i = n$, until the stack is empty).

So we pop $A_{i-1}$ off the stack. Since we have a leftmost derivation of $w$, there must exist a production $A_{i-1} \to \beta_{i-1}$ in $G$ for some $\beta_{i-1} \in (V \cup \Sigma)^*$. By our definition of $M$, there is a transition with $(q,\beta_{i-1}) \in \delta(q,\varepsilon,A_{i-1})$. Therefore, we have $(q,y_{i-1},\alpha_{i-1}) \vdash (q,y_{i-1},\beta_{i-1}\eta_{i-1})$. That is, the machine pops $A_{i-1}$ off the stack and applies the rule $A_{i-1} \to \beta_{i-1}$ by putting $\beta_{i-1}$ on the stack.

At this point, $\beta_{i-1}$ is on the top of the stack. However, $\beta_{i-1}$ is an arbitrary sentential form made up of terminals and non-terminals. To consume terminals from the top of the stack, they must match the next symbols of the input string, in which case we apply the rule $\delta(q,a,a) = \{(q,\varepsilon)\}$ for $a \in \Sigma$.

Let $z_{i-1} \in \Sigma^*$ denote the maximal prefix of terminal symbols at the top of the stack, so that $\beta_{i-1} \eta_{i-1} = z_{i-1} \alpha_i$ and $y_{i-1} = z_{i-1} y_i$.

Reading $z_{i-1}$ then gives us $$(q,y_{i-1},\alpha_{i-1}) = (q,y_{i-1}, A_{i-1} \eta_{i-1}) \vdash (q, y_{i-1}, \beta_{i-1} \eta_{i-1}) = (q, z_{i-1} y_i, z_{i-1} \alpha_i) \vdash^* (q,y_i,\alpha_i).$$ Thus, we have shown that for each $i$, we have $(q,w,S) \vdash^* (q,y_i,\alpha_i)$. Then taking $i = n$, we have $$(q,w,S) \vdash^* (q,y_n,\alpha_n) = (q,\varepsilon,\varepsilon)$$ and thus $w \in N(M)$ as desired.

Next, we will show that $N(M) \subseteq L(G)$. Let $w \in N(M)$. We will show that for any variable $A \in V$, if $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$, then $A \Rightarrow^* w$. We will show this by induction on $n$, the length of an accepting computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$.

For our base case, we have $n = 1$, since the stack is nonempty at the start of the computation. Since $A$ is at the top of the stack, we must choose a transition of the form $\delta(q,\varepsilon,A)$. Because this is the only step of the computation, we must have $w = \varepsilon$ and $(q,\varepsilon) \in \delta(q,\varepsilon,A)$, which implies that there is a rule $A \to \varepsilon$ in $G$. Thus, we have $A \Rightarrow^* w$.

Now, suppose that $n > 1$ and for our inductive hypothesis, we assume that for all variables $A \in V$, if the length of a computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$ of $M$ is fewer than $n$ steps, then $A \Rightarrow^* w$. Consider a computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$ of length $n$.

Again, we must first take a transition $\delta(q,\varepsilon,A)$ since $A$ is on the top of the stack. This time, we take the transition corresponding to a rule of the form $A \to Y_1 \cdots Y_k$, where $Y_1, \dots, Y_k \in (V \cup \Sigma)$, and we have $(q,w,A) \vdash (q,w,Y_1 \cdots Y_k)$. From here, consider the following computation, $$(q,w,A) \vdash (q,w,Y_1 \cdots Y_k) \vdash^* (q,x_1,Y_2 \cdots Y_k)$$ where $x_1$ is the input remaining at the first point in the computation where $Y_2$ becomes the top symbol of the stack, so that $x_1$ is a suffix of $w$ with $w = w_1 x_1$.

In other words, we have $(q,w_1 x_1,Y_1 \cdots Y_k) \vdash^* (q,x_1,Y_2 \cdots Y_k)$. Now, we make use of the following observation: the computation $$(q,w_1,Y_1 \cdots Y_k) \vdash^* (q,\varepsilon,Y_2 \cdots Y_k)$$ is also a valid computation in $M$. The reasoning is simple: the computation clearly does not depend on $x_1$, since none of $x_1$ was read. If our input word were simply $w_1$, we would have exactly the above computation. However, this leads us to another similarly useful observation we can make: the computation $$(q,w_1,Y_1) \vdash^* (q,\varepsilon,\varepsilon)$$ is also a valid computation in $M$. The reasoning is similar: if we began our computation of $w_1$ with only $Y_1$ on the stack, we would arrive at the same configuration since $Y_2 \cdots Y_k$ were inconsequential to the computation.

Now, we observe that we have one of two possibilities for $Y_1$. If $Y_1$ is a terminal, then the computation $(q,w_1,Y_1) \vdash^* (q,\varepsilon,\varepsilon)$ consists of a single step that reads $Y_1$, so $w_1 = Y_1$ and trivially $Y_1 \Rightarrow^* w_1$. If $Y_1$ is a variable, then this computation takes fewer than $n$ steps, and so by our inductive hypothesis, $Y_1 \Rightarrow^* w_1$.

We can now return to the computation $$(q,w,A) \vdash (q,w,Y_1 \cdots Y_k) \vdash^* (q,x_1,Y_2 \cdots Y_k)$$ and observe that, setting $x_0 = w$ and $x_k = \varepsilon$, we have $$(q,x_{i-1},Y_i \cdots Y_k) = (q,w_i x_i,Y_i \cdots Y_k) \vdash^* (q,x_i,Y_{i+1} \cdots Y_k)$$ for every $1 \leq i \leq k$, where $w = w_1 w_2 \cdots w_k$. Applying the argument above to each $i$, we obtain derivations $Y_i \Rightarrow^* w_i$. This gives us our derivation, $$A \Rightarrow Y_1 \cdots Y_k \Rightarrow^* w_1 Y_2 \cdots Y_k \Rightarrow^* w_1 w_2 Y_3 \cdots Y_k \Rightarrow^* w_1 \cdots w_k = w$$ and thus $A \Rightarrow^* w$. Taking $A = S$, we have $S \Rightarrow^* w$ and $w \in L(G)$ as desired.

Consider the grammar for palindromes, $$S \rightarrow aSa \mid bSb \mid a \mid b \mid \varepsilon,$$ and consider a PDA $P$ for this grammar, which we obtain by following the construction of Theorem 17.4. Then a computation of $P$ on the word $aababaa$ looks like \begin{align*} (q,aababaa,S) & \vdash (q,aababaa,aSa) \\ & \vdash (q,ababaa,Sa) \\ & \vdash (q,ababaa,aSaa) \\ & \vdash (q,babaa,Saa) \\ & \vdash (q,babaa,bSbaa) \\ & \vdash (q,abaa,Sbaa) \\ & \vdash (q,abaa,abaa) \\ & \vdash (q,baa,baa) \\ & \vdash (q,aa,aa) \\ & \vdash (q,a,a) \\ & \vdash (q,\varepsilon, \varepsilon) \end{align*}
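Under the hypothetical encoding used in the sketches above, this example can be checked mechanically:

```python
# The palindrome grammar above: S -> aSa | bSb | a | b | epsilon.
V, T = {'S'}, {'a', 'b'}
P = {('S', ('a', 'S', 'a')), ('S', ('b', 'S', 'b')),
     ('S', ('a',)), ('S', ('b',)), ('S', ())}
M = grammar_to_pda((V, T, P, 'S'))
print(accepts_by_empty_stack(M, 'aababaa'))  # True:  a palindrome
print(accepts_by_empty_stack(M, 'aab'))      # False: not a palindrome
```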