CMSC 28000 — Lecture 17

Acceptance by empty stack

When we defined acceptance, we were careful to say that we were defining acceptance by final state. The qualifier suggests, correctly, that this is not the only model of acceptance for pushdown automata.

Let $P$ be a PDA. Then we can define the language of the machine by $$ N(P) = \{w \in \Sigma^* \mid (q_0, w, Z_0) \vdash^* (q,\varepsilon,\varepsilon), q \in Q \}.$$

In other words, we accept a word if the stack is empty after reading it. Obviously, under this model, final states are superfluous, so PDAs using this acceptance model can be described by a 6-tuple instead.
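To make the definition concrete, here is a small Python sketch of a PDA accepting $\{a^n b^n \mid n \geq 0\}$ by empty stack, together with a breadth-first search over its configurations. The dictionary encoding of $\delta$ and all names are my own, not from the notes; `""` stands for $\varepsilon$.

```python
from collections import deque

# A PDA accepting {a^n b^n : n >= 0} by empty stack.  delta maps
# (state, input symbol or "" for epsilon, stack top) to a set of
# (next state, string pushed in place of the top); the first pushed
# character becomes the new top.
delta = {
    ("q0", "a", "Z"): {("q0", "AZ")},   # first a: push a counter above the bottom marker
    ("q0", "a", "A"): {("q0", "AA")},   # further a's: one counter each
    ("q0", "b", "A"): {("q1", "")},     # first b: start popping counters
    ("q1", "b", "A"): {("q1", "")},
    ("q1", "", "Z"): {("q1", "")},      # all b's matched: pop the bottom marker
    ("q0", "", "Z"): {("q0", "")},      # accept the empty word
}

def in_N(w, q0="q0", Z0="Z"):
    """Return True iff w is in N(P): some computation reads all of w
    and ends with an empty stack."""
    seen, queue = set(), deque([(q0, w, Z0)])
    while queue:
        cfg = queue.popleft()
        if cfg in seen:
            continue
        seen.add(cfg)
        state, rem, stack = cfg
        if rem == "" and stack == "":
            return True  # configuration (q, epsilon, epsilon)
        if stack == "":
            continue  # stack emptied before the input was exhausted
        top, rest = stack[0], stack[1:]
        for sym in ({""} | ({rem[0]} if rem else set())):
            for nxt, push in delta.get((state, sym, top), ()):
                queue.append((nxt, rem[1:] if sym else rem, push + rest))
    return False

print(in_N("aaabbb"))  # True
print(in_N("aab"))     # False
```

Since this machine only pushes when it consumes input, the set of reachable configurations is finite and the search always terminates.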

As it turns out, the two models of acceptance are equivalent. That is, given a PDA defined under one model of acceptance, it's possible to construct a PDA under the other model of acceptance which accepts the same language. We'll sketch out the constructions.

Let $A$ be a PDA that accepts by final state. Then there exists a PDA $A'$ which accepts $L(A)$ by empty stack.

Our new machine $A'$ needs to satisfy two properties. First of all, whenever $A$ reaches a final state on a word $w$ (that is, $A$ accepts $w$), we want to empty the stack. Secondly, we don't want the stack to ever be empty before the entire input is read.

For the first property, we create a new state $q'$ and add transitions $(q',\varepsilon) \in \delta(q,\varepsilon,\gamma)$ for every final state $q$ of $A$ and every $\gamma \in \Gamma$. The only transitions out of $q'$ are of the form $(q',\varepsilon) \in \delta(q',\varepsilon,\gamma)$ for all $\gamma \in \Gamma$, so once in $q'$, the machine simply pops the stack until it is empty. Also note that any computation that enters $q'$ with part of the input still unread cannot accept, since no input is read from $q'$ onward.

For the second property, to prevent the stack from being empty during the course of the computation, we add a new symbol $X_0$ to the bottom of the stack. We do this by adding a new initial state with a transition to the initial state of $A$ whose only function is to place $X_0$ below $Z_0$ on the stack. Since $X_0$ is not in the stack alphabet of $A$, there is no transition that can remove it, and the only way the stack can be empty is after visiting $q'$.
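This construction can be sketched as a transformation on transition tables. In the Python sketch below, the table encoding and the fresh names `p0` (new initial state), `qe` (the stack-draining state $q'$), and `X` (playing the role of $X_0$) are assumptions of mine:

```python
def to_empty_stack(delta, finals, gamma, q0, Z0):
    """Turn a final-state PDA into one accepting the same language by
    empty stack.  delta maps (state, input symbol or "" for epsilon,
    stack top) to a set of (next state, string pushed in place of the
    top), with the leftmost pushed character becoming the new top.  The
    fresh names "p0", "qe", and "X" must not clash with the machine's
    own names."""
    p0, qe, X = "p0", "qe", "X"
    new_delta = {k: set(v) for k, v in delta.items()}
    # New initial state: its only job is to place X below Z0, then hand
    # control to the old initial state.
    new_delta[(p0, "", X)] = {(q0, Z0 + X)}
    for g in list(gamma) + [X]:
        # From every final state, an epsilon-move into qe, whatever the top is.
        for q in finals:
            new_delta.setdefault((q, "", g), set()).add((qe, ""))
        # In qe, silently pop everything, including the marker X.
        new_delta.setdefault((qe, "", g), set()).add((qe, ""))
    return new_delta, p0, X

# A final-state PDA for {a^n b^n : n >= 1}, with sole final state "qf".
delta = {
    ("q0", "a", "Z"): {("q0", "AZ")},
    ("q0", "a", "A"): {("q0", "AA")},
    ("q0", "b", "A"): {("q1", "")},
    ("q1", "b", "A"): {("q1", "")},
    ("q1", "", "Z"): {("qf", "Z")},
}
new_delta, p0, X = to_empty_stack(delta, {"qf"}, ["Z", "A"], "q0", "Z")
```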

Let $B$ be a PDA that accepts by empty stack. Then there exists a PDA $B'$ which accepts $L(B)$ by final state.

To construct our new PDA $B'$, we need to ensure that the machine enters a final state whenever the machine would have emptied its stack and accepted. To do this, we add a new state $q'$ to $B'$ which is its sole accepting state.

However, we also need a means of detecting that the stack is empty. To do this, we again add a new symbol $X_0$ to the stack alphabet and add a new initial state that adds $X_0$ to the stack below $Z_0$.

With this, we can add transitions $(q',\varepsilon) \in \delta(q,\varepsilon,X_0)$ from every state $q$ of $B$. Since there are no transitions involving the stack symbol $X_0$ in $B$, the machine reaches $q'$ exactly when $B$ would have emptied its stack.
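The reverse transformation is even shorter. In the Python sketch below, the fresh names `p0`, `qf`, and the marker `X` (our stand-in for $X_0$) are again my own assumptions:

```python
def to_final_state(delta, states, q0, Z0):
    """Turn an empty-stack PDA into one accepting the same language by
    final state, using the same (state, symbol, top) -> {(state, push)}
    encoding as before.  The fresh names "p0", "qf", and "X" must not
    clash with the machine's own names."""
    p0, qf, X = "p0", "qf", "X"
    new_delta = {k: set(v) for k, v in delta.items()}
    # New initial state: place the fresh marker X below Z0.
    new_delta[(p0, "", X)] = {(q0, Z0 + X)}
    # Seeing X on top means the original machine just emptied its stack,
    # so every state gets an epsilon-move into the sole final state qf.
    for q in states:
        new_delta.setdefault((q, "", X), set()).add((qf, ""))
    return new_delta, p0, X, {qf}

# The empty-stack PDA for {a^n b^n : n >= 0} from before.
delta = {
    ("q0", "a", "Z"): {("q0", "AZ")},
    ("q0", "a", "A"): {("q0", "AA")},
    ("q0", "b", "A"): {("q1", "")},
    ("q1", "b", "A"): {("q1", "")},
    ("q1", "", "Z"): {("q1", "")},
    ("q0", "", "Z"): {("q0", "")},
}
new_delta, p0, X, finals = to_final_state(delta, ["q0", "q1"], "q0", "Z")
```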

Equivalence of CFGs and PDAs

Just as regular expressions and finite automata have a correspondence, so do context-free grammars and pushdown automata. For proofs in both directions, we'll take advantage of the fact that PDAs that accept by final state are equivalent to those that accept by empty stack. We will show that CFGs can be converted to PDAs that accept by empty stack and vice versa. Since we already showed that empty-stack PDAs are equivalent to final-state PDAs, this establishes that both models are equivalent to CFGs.

Let $G$ be a context-free grammar. Then there exists a pushdown automaton $M$ such that $N(M) = L(G)$.

Let $G = (V,\Sigma,P,S)$ be a CFG. We will create a PDA $M = (Q,\Sigma,\Gamma, \delta, q_0, Z_0)$ which accepts by empty stack, defined as follows: $M$ has a single state $q$, so $Q = \{q\}$ and $q_0 = q$; the stack alphabet is $\Gamma = V \cup \Sigma$; and the initial stack symbol is $Z_0 = S$.

We define the following transitions: for each production $A \to \beta$ in $P$ with $\beta \in (V \cup \Sigma)^*$, a transition $(q,\beta) \in \delta(q,\varepsilon,A)$; and for each terminal $a \in \Sigma$, a transition $\delta(q,a,a) = \{(q,\varepsilon)\}$.

Since $M$ has a single state, the machine is just that one state, with a self-loop for each of the transitions above.
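The construction is entirely mechanical, as the following Python sketch shows. The dictionary encoding of $\delta$ is an assumption of mine, with `""` standing for $\varepsilon$:

```python
def cfg_to_pda(productions, sigma, start):
    """Build the one-state PDA that accepts L(G) by empty stack.

    productions maps each variable to the list of right-hand sides of
    its productions, with "" standing for epsilon; sigma is the terminal
    alphabet.  Returns (delta, initial state, initial stack symbol)."""
    q = "q"
    delta = {}
    # Expansion moves: for each production A -> beta, pop A and push beta.
    for A, rhss in productions.items():
        delta[(q, "", A)] = {(q, beta) for beta in rhss}
    # Matching moves: pop a terminal exactly when it matches the next
    # input symbol.
    for a in sigma:
        delta[(q, a, a)] = {(q, "")}
    return delta, q, start

# The palindrome grammar S -> aSa | bSb | a | b | epsilon.
delta, q0, Z0 = cfg_to_pda({"S": ["aSa", "bSb", "a", "b", ""]}, "ab", "S")
```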

This machine essentially acts as a parser for $G$, albeit a nondeterministic one. And unlike many other machines we've seen before, the computation in this machine is driven primarily by the stack: if a variable $A$ is on top of the stack, the machine nondeterministically pops $A$ and pushes the right-hand side $\beta$ of some production $A \to \beta$; if a terminal is on top of the stack, the machine pops it, provided that it matches the next unread input symbol.

This process essentially performs a leftmost derivation of our input word. It's important to note that we can elide many of the tricky details of how the correct production rule is chosen when a variable $A$ is on top of the stack, through the magic of nondeterminism.

To formally prove this, we need to show that we have a derivation $S \Rightarrow^* w$ over $G$ if and only if there is a computation $(q,w,S) \vdash^* (q,\varepsilon,\varepsilon)$ in $M$. The details follow.

We will first show that $L(G) \subseteq N(M)$. Let $w \in L(G)$. There exists a leftmost derivation for $w$ $$S = \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_n = w,$$ where $\gamma_i \in (V \cup \Sigma)^*$ for $1 \leq i \leq n$. Since this is a leftmost derivation, we can write $\gamma_i = x_i \alpha_i$, where $x_i \in \Sigma^*$ and $\alpha_i \in V(V \cup \Sigma)^*$ for $1 \leq i \leq n-1$. In other words, $\alpha_i$ begins with the leftmost variable of $\gamma_i$. Furthermore, $x_i$ must be a prefix of $w$ and for each $x_i$, we can write $w = x_i y_i$ for some $y_i \in \Sigma^*$. For $i = n$, we set $x_n = w$, $\alpha_n = \varepsilon$, and $y_n = \varepsilon$, so that these definitions make sense for all $1 \leq i \leq n$.

We will show by induction on $i$, that $(q,w,S) \vdash^* (q,y_i,\alpha_i)$ on the PDA $M$ for all $1 \leq i \leq n$.

For our base case, we take $i = 1$. We have $x_1 = \varepsilon$, $y_1 = w$, and $\alpha_1 = S$. Thus, $(q,w,S) \vdash^* (q,y_1,\alpha_1)$ holds trivially, in zero steps.

For our inductive hypothesis, we assume that $(q,w,S) \vdash^* (q,y_j,\alpha_j)$ for all $j < i$. We can apply this to get $(q,w,S) \vdash^* (q,y_{i-1},\alpha_{i-1})$, where $w = x_{i-1} y_{i-1}$, $\alpha_{i-1} = A_{i-1} \eta_{i-1}$ for some variable $A_{i-1} \in V$ and $\eta_{i-1} \in (V \cup \Sigma)^*$, and $\gamma_{i-1} = x_{i-1} \alpha_{i-1}$.

Since we are considering a leftmost derivation, the step $\gamma_{i-1} \Rightarrow \gamma_i$ applies a production $A_{i-1} \to \beta_{i-1}$ of $G$, for some $\beta_{i-1} \in (V \cup \Sigma)^*$, to the leftmost variable $A_{i-1}$, so $\gamma_i = x_{i-1} \beta_{i-1} \eta_{i-1}$. By our definition of $M$, there is a transition with $(q,\beta_{i-1}) \in \delta(q,\varepsilon,A_{i-1})$. Therefore, we have $(q,y_{i-1},\alpha_{i-1}) \vdash (q,y_{i-1},\beta_{i-1}\eta_{i-1})$.

Now, write $y_{i-1} = y_{i-1}' y_i$ and $\beta_{i-1} \eta_{i-1} = y_{i-1}' \alpha_i$, where $y_{i-1}' \in \Sigma^*$. We can read $y_{i-1}'$ via transitions $\delta(q,a,a) = \{(q,\varepsilon)\}$ for all $a \in \Sigma$ to give us $(q,w,S) \vdash^* (q,y_i,\alpha_i)$. Thus, we have shown that for each $i$, we have $(q,w,S) \vdash^* (q,y_i,\alpha_i)$. Then taking $i = n$, we have $$(q,w,S) \vdash^* (q,y_n,\alpha_n) = (q,\varepsilon,\varepsilon)$$ and thus $w \in N(M)$ as desired.

Next, we will show that $N(M) \subseteq L(G)$. Let $w \in N(M)$. We will show that for any variable $A \in V$, if $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$, then $A \Rightarrow^* w$. We will show this by induction on $n$, the length of an accepting computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$.

For our base case, we take $n = 1$; note that $n = 0$ is impossible, since the stack initially holds $A$ and must end empty. Since the variable $A$ is at the top of the stack, we must choose a transition of the form $\delta(q,\varepsilon,A)$. For a single step to end in $(q,\varepsilon,\varepsilon)$, we need $w = \varepsilon$ and $(q,\varepsilon) \in \delta(q,\varepsilon,A)$, and this implies that there is a rule $A \to \varepsilon$ in $G$. Thus, we have $A \Rightarrow^* w$.

Now, suppose that $n > 1$ and for our inductive hypothesis, we assume that for all variables $A \in V$, if the length of a computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$ of $M$ is fewer than $n$ steps, then $A \Rightarrow^* w$. Consider a computation $(q,w,A) \vdash^* (q,\varepsilon,\varepsilon)$ of length $n$.

Again, we must first take a transition $\delta(q,\varepsilon,A)$ since $A$ is on the top of the stack. This time, we take the transition corresponding to a rule of the form $A \to \alpha_1 \cdots \alpha_k$, where $\alpha_1, \dots, \alpha_k \in (V \cup \Sigma)$, and we have $(q,w,A) \vdash (q,w,\alpha_1 \cdots \alpha_k)$. From here, consider the following computation, $$(q,w,A) \vdash (q,w,\alpha_1 \cdots \alpha_k) \vdash^* (q,w',\alpha_2 \cdots \alpha_k)$$ where $w'$ is the suffix of $w$ that remains unread at the first moment the stack shrinks to $\alpha_2 \cdots \alpha_k$, and we write $w = w_1 w'$.

In other words, we have $(q,w_1 w',\alpha_1 \cdots \alpha_k) \vdash^* (q,w',\alpha_2 \cdots \alpha_k)$. Now, we make use of the following observation: the computation $$(q,w_1,\alpha_1 \cdots \alpha_k) \vdash^* (q,\varepsilon,\alpha_2 \cdots \alpha_k)$$ is also a valid computation in $M$. The reasoning is simple: the computation clearly does not depend on $w'$, since none of $w'$ was read. If our input word were simply $w_1$, we would have exactly the above computation. However, this leads us to another similarly useful observation we can make: the computation $$(q,w_1,\alpha_1) \vdash^* (q,\varepsilon,\varepsilon)$$ is also a valid computation in $M$. The reasoning is similar: if we began our computation of $w_1$ with only $\alpha_1$ on the stack, we would arrive at the same configuration since $\alpha_2 \cdots \alpha_k$ were inconsequential to the computation.

Now, we observe that we have one of two possibilities. If $\alpha_1 \in \Sigma$, then $(q,w_1,\alpha_1) \vdash (q,\varepsilon,\varepsilon)$ implies that $w_1 = \alpha_1$ and we have $\alpha_1 \Rightarrow^* w_1$ trivially. Otherwise, we have $\alpha_1 \in V$, and since $(q,w_1,\alpha_1) \vdash^* (q,\varepsilon,\varepsilon)$ is a computation of length less than $n$, we have $\alpha_1 \Rightarrow^* w_1$ by the inductive hypothesis.

We can now return to the computation $$(q,w,A) \vdash (q,w,\alpha_1 \cdots \alpha_k) \vdash^* (q,w',\alpha_2 \cdots \alpha_k)$$ and observe that, more generally, we have $$(q, w_i w_{i+1} \cdots w_k, \alpha_i \cdots \alpha_k) \vdash^* (q, w_{i+1} \cdots w_k, \alpha_{i+1} \cdots \alpha_k)$$ for every $1 \leq i \leq k$, where $w = w_1 \cdots w_k$, and we can apply the same argument to each $i$ to obtain derivations $\alpha_i \Rightarrow^* w_i$. This gives us our derivation, $$A \Rightarrow \alpha_1 \cdots \alpha_k \Rightarrow^* w_1 \alpha_2 \cdots \alpha_k \Rightarrow^* w_1 w_2 \alpha_3 \cdots \alpha_k \Rightarrow^* w_1 \cdots w_k = w$$ and thus $A \Rightarrow^* w$. Taking $A = S$, we have $S \Rightarrow^* w$ and $w \in L(G)$ as desired.

Consider the grammar for palindromes, $$S \rightarrow aSa \mid bSb \mid a \mid b \mid \varepsilon,$$ and consider a PDA $P$ for this grammar, which we obtain by following the construction of Theorem 16.4. Then a computation of $P$ on the word $aababaa$ looks like \begin{align*} (q,aababaa,S) & \vdash (q,aababaa,aSa) \\ & \vdash (q,ababaa,Sa) \\ & \vdash (q,ababaa,aSaa) \\ & \vdash (q,babaa,Saa) \\ & \vdash (q,babaa,bSbaa) \\ & \vdash (q,abaa,Sbaa) \\ & \vdash (q,abaa,abaa) \\ & \vdash (q,baa,baa) \\ & \vdash (q,aa,aa) \\ & \vdash (q,a,a) \\ & \vdash (q,\varepsilon, \varepsilon) \end{align*}
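A computation like the one above can be found mechanically by searching the PDA's configurations. The Python sketch below (the search and its pruning rule are my own, tailored to this particular grammar) runs the grammar's one-state PDA nondeterministically:

```python
from collections import deque

# Productions of the palindrome grammar; "" encodes epsilon.
productions = {"S": ["aSa", "bSb", "a", "b", ""]}
variables = set(productions)

def accepts_by_empty_stack(w, start="S"):
    """Nondeterministic search over configurations (remaining input, stack).

    The stack is a string whose first character is the top.  Configurations
    whose stack holds more terminals than there is input left are pruned,
    which keeps the search finite for this particular grammar."""
    seen = set()
    queue = deque([(w, start)])
    while queue:
        remaining, stack = queue.popleft()
        if (remaining, stack) in seen:
            continue
        seen.add((remaining, stack))
        if remaining == "" and stack == "":
            return True  # reached (q, epsilon, epsilon): accept
        if stack == "":
            continue  # stack emptied too early; this branch dies
        top, rest = stack[0], stack[1:]
        if top in variables:
            # Expansion move: pop the variable, push a right-hand side.
            for beta in productions[top]:
                new_stack = beta + rest
                terminals = sum(c not in variables for c in new_stack)
                if terminals <= len(remaining):
                    queue.append((remaining, new_stack))
        elif remaining and remaining[0] == top:
            # Matching move: pop a terminal that agrees with the input.
            queue.append((remaining[1:], rest))
    return False

print(accepts_by_empty_stack("aababaa"))  # True: a palindrome
print(accepts_by_empty_stack("aab"))      # False: not a palindrome
```

One successful branch of this search on `aababaa` visits exactly the configurations in the trace above.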