CMSC 28000 — Lecture 17

Deterministic Context-Free Languages

At this point, we might want to play around with our model in the same way that we did with finite automata. One observation is that we began with a nondeterministic machine model for the PDA. An obvious question is what happens when we restrict the machine to be deterministic. The next question is how to define this.

A deterministic pushdown automaton (DPDA) is a PDA $M = (Q,\Sigma,\Gamma,\delta,q_0,Z_0,F)$ with the additional conditions:

1. for every $q \in Q$, $a \in \Sigma \cup \{\varepsilon\}$, and $X \in \Gamma$, the set $\delta(q,a,X)$ contains at most one element, and
2. for every $q \in Q$ and $X \in \Gamma$, if $\delta(q,\varepsilon,X) \neq \emptyset$, then $\delta(q,a,X) = \emptyset$ for every $a \in \Sigma$.

The class of languages accepted by DPDAs by final state is called the class of deterministic context-free languages.

It's clear from this definition that a PDA can recognize any language accepted by a DPDA. But note that a DPDA is not simply a PDA with the codomain of the transition function restricted to $Q \times \Gamma^*$—we are allowed both to have undefined transitions (i.e. transitions that end up in $\emptyset$) as well as $\varepsilon$-transitions.

We should parse out what these restrictions imply. The first condition restricts the machine to at most one choice of transition for any particular state, input symbol, and stack symbol. In this case, there is no ambiguity: the next move, if there is one, is uniquely determined by the state, input symbol, and stack symbol.

The second condition concerns $\varepsilon$-transitions. We still allow $\varepsilon$-transitions because we may want to take a transition based only on the symbol at the top of the stack. In this case, if there is an $\varepsilon$-transition for some state and stack symbol, we disallow any other transitions for that same state and stack symbol. Without this condition, we could sneak in some nondeterminism by making a choice between consuming an input symbol or not.

Here is a PDA.

You might recognize it. This is the PDA for $\{a^n b^n \mid n \geq 0\}$. However, it is not a DPDA, since both $\delta(q_0,\varepsilon,Z_0) \neq \emptyset$ and $\delta(q_0,a,Z_0) \neq \emptyset$, violating the second condition. Luckily, this is easy to fix.
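As a sketch (the states and stack symbols in the figure may be named differently; the names $q_0, \ldots, q_3$ and $A$ here are just illustrative), one deterministic set of transitions for $\{a^n b^n \mid n \geq 0\}$ is
\begin{align*}
\delta(q_0, a, Z_0) &= \{(q_1, AZ_0)\}, & \delta(q_1, a, A) &= \{(q_1, AA)\},\\
\delta(q_1, b, A) &= \{(q_2, \varepsilon)\}, & \delta(q_2, b, A) &= \{(q_2, \varepsilon)\},\\
\delta(q_2, \varepsilon, Z_0) &= \{(q_3, Z_0)\},
\end{align*}
with $F = \{q_0, q_3\}$. Every state and stack symbol has at most one applicable transition, and the only $\varepsilon$-transition, from $q_2$ on $Z_0$, has no competing transitions on input symbols.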

Now, we have a DPDA. A natural question to ask, just as with NFAs, is whether there is a way to determinize every PDA into a DPDA.

The DPDA model is due to Seymour Ginsburg and Sheila Greibach in 1966. This puts the development of the DPDA a few years after the standard PDA model. We'll see that, unlike with finite automata, there are some significant differences between DCFLs and CFLs.

Here is one quick distinction. Recall our PDA construction that computed a leftmost derivation for any given CFG $G$ as part of our proof that PDAs and CFGs are equivalent. If we could construct an equivalent DPDA, this would suggest a deterministic way of computing a leftmost derivation for any word under any CFG. But the existence of inherently ambiguous languages suggests otherwise.

One important property of DCFLs that Ginsburg and Greibach showed is that DCFLs are closed under complementation. We'll see shortly that this is not the case for context-free languages in general, which gives another way to see that the classes of DCFLs and CFLs are different.

Intuitively, since we're now working with deterministic machines, we can imagine that we might attempt a trick similar to what we did with DFAs: just swap the final and non-final states. However, various details of the PDA model (incomplete transition functions, $\varepsilon$-transitions, the stack) complicate this.

The first issue occurs when a word is rejected without being read entirely. Just like with NFAs, not every transition in a DPDA needs to be defined; some transitions may go to $\emptyset$. This causes problems if we try to simply "flip" the final and non-final states, because a word $w$ whose computation reaches an undefined transition still won't get accepted, and even worse, neither will any word with $w$ as a prefix. A similar situation occurs if the stack gets emptied during the computation on $w$: the machine crashes.

The second issue concerns $\varepsilon$-transitions. In one case, we end up with something similar to the above: the machine enters an infinite loop of $\varepsilon$-transitions and never finishes reading the input word. In the other case, the machine finishes reading its input but continues to make $\varepsilon$-transitions, passing through both final and non-final states. Here, simply swapping the final and non-final states is not enough, since the computation still passes through both kinds of states after the swap, and so the word would be accepted by both machines.

Luckily, we can solve both of these problems. The following lemma shows how to force a DPDA to read its entire input.

Let $M$ be a DPDA. There exists an equivalent DPDA $M'$ that scans the entire input for every input word.

To prevent crashes, we want to ensure that the machine always has a move available. To do this, we add a sink/dead state $q_d$ and add transitions to this state wherever no transition is defined, taking care not to add a transition on an input symbol where an $\varepsilon$-transition already exists (which would violate the determinism conditions). The machine then consumes the rest of the input in this state.

We also need to prevent crashes that occur when the machine attempts to proceed on an empty stack. In this case, we add a new bottom-of-stack symbol $X_0$, which sits below $Z_0$. If the machine ever exposes $X_0$ at the top of the stack, then we can move to the dead state $q_d$ and consume the rest of the input.
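As a rough sketch of these crash-prevention transitions (the new start state $q_0'$ and the precise form of the transitions are illustrative; there are other ways to set this up), for all $q \in Q$, $a \in \Sigma$, and $X \in \Gamma$ we could take
\begin{align*}
\delta'(q_0', \varepsilon, X_0) &= \{(q_0, Z_0 X_0)\} && \text{push the old bottom marker above $X_0$,}\\
\delta'(q, a, X) &= \{(q_d, X)\} && \text{whenever } \delta(q,a,X) = \delta(q,\varepsilon,X) = \emptyset,\\
\delta'(q, a, X_0) &= \{(q_d, X_0)\} && \text{the original machine would have emptied its stack,}\\
\delta'(q_d, a, X) &= \{(q_d, X)\} && \text{consume the rest of the input in the dead state,}
\end{align*}
along with $\delta'(q_d, a, X_0) = \{(q_d, X_0)\}$; all other transitions of $M$ are kept as they are. Note that we only send $(q,a,X)$ to the dead state when $M$ has no $\varepsilon$-transition there, so the determinism conditions are preserved.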

Now, we need to fix the problem of infinite loops of $\varepsilon$-transitions. To do this, we need to find all such loops. Suppose the machine enters such a loop when it is in state $q$ with $X$ on the top of the stack. If the loop contains a final state, we add a new final state $q_f$ and redirect the $\varepsilon$-transition on $(q, X)$ to it; otherwise, we redirect it to $q_d$. Since any remaining input would never have been read once the loop was entered, this doesn't change the language.

 

Now that we have a DPDA that is forced to read all of its input, we can use it to construct a DPDA that recognizes the complement.

Let $L$ be a deterministic context-free language. Then $\overline L$ is also a deterministic context-free language.

From the above lemma, we can assume that we have a DPDA $M = (Q,\Sigma,\Gamma,\delta,q_0,Z_0,F)$ for $L$ that always reads its entire input. We want to construct a DPDA $M' = (Q',\Sigma,\Gamma,\delta',q_0',Z_0,F')$ for $\overline L$. Because of the second infinite loop scenario, where we have finished reading the input but the machine passes through both final and non-final states on its remaining $\varepsilon$-transitions, we cannot simply flip the states. Instead, we need to keep track of whether a final state has been visited in this scenario.

To do this, we augment our states with an indicator, so our set of states is $Q \times \{Y,N,A\}$. The indicator records whether the machine has entered a final state since it last read an input symbol from $\Sigma$ (i.e., not $\varepsilon$). If it has, then the indicator is $Y$; if it hasn't, then it is $N$. If the indicator is $N$ and the machine is about to read the next input symbol, an extra $\varepsilon$-transition first sets the indicator to $A$, indicating that the machine is waiting for the next non-$\varepsilon$ input symbol.

We can formally define $\delta'$ as follows, for $q \in Q$, $a \in \Sigma$, and $X \in \Gamma$. If $\delta(q,\varepsilon,X) = \{(p,\gamma)\}$, then \[\delta'(\langle q,Y \rangle,\varepsilon,X) = \{(\langle p,Y \rangle,\gamma)\} \quad \text{and} \quad \delta'(\langle q,N \rangle,\varepsilon,X) = \{(\langle p,k \rangle,\gamma)\},\] where $k = Y$ if $p \in F$ and $k = N$ otherwise. If $\delta(q,\varepsilon,X) = \emptyset$, then \[\delta'(\langle q,N \rangle,\varepsilon,X) = \{(\langle q,A \rangle,X)\}.\] Finally, if $\delta(q,a,X) = \{(p,\gamma)\}$, then \[\delta'(\langle q,Y \rangle,a,X) = \delta'(\langle q,A \rangle,a,X) = \{(\langle p,k \rangle,\gamma)\},\] where again $k = Y$ if $p \in F$ and $k = N$ otherwise.

We set the initial state to be \[q_0' = \begin{cases} \langle q_0,Y \rangle & \text{if $q_0 \in F$},\\ \langle q_0,N \rangle & \text{if $q_0 \not \in F$}, \end{cases}\] and the set of accepting states is $F' = \{\langle q, A \rangle \mid q \in Q\}$.
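For instance (a small illustrative sketch, with hypothetical states $q, p, s \in Q$ and $X$ on top of the stack), if $M$ has an $\varepsilon$-move from $(q,X)$ into a final state $p$, and then reads a symbol $a$ into a non-final state $s$, the indicator evolves as \[\langle q, N\rangle \xrightarrow{\;\varepsilon\;} \langle p, Y\rangle \xrightarrow{\;a\;} \langle s, N\rangle.\] If instead $M$ has no $\varepsilon$-move available from $(q, X)$, then $M'$ first takes the extra $\varepsilon$-move into the accepting state $\langle q, A \rangle$ before reading $a$: \[\langle q, N\rangle \xrightarrow{\;\varepsilon\;} \langle q, A\rangle \xrightarrow{\;a\;} \langle s, N\rangle.\]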

To see that $L(M') = \overline{L(M)}$, suppose that $w = a_1 a_2 \cdots a_n \in L(M)$. Then $M$ enters a final state at some point after reading $a_n$. In $M'$, this corresponds to a state with the indicator set to $Y$. The indicator only becomes $A$ from $N$, and it can only return to $N$ by reading another input symbol; since there are no input symbols left to read, $M'$ is never in an accepting state after reading all of $w$, and so $w$ is not accepted by $M'$.

Now, suppose $w = a_1 a_2 \cdots a_n \notin L(M)$. We know that $M'$ reads $w$ entirely, and $M$ never enters a final state after reading $a_n$, so the indicator remains $N$. After $M$ has used up all of its $\varepsilon$-transitions, $M'$ flips the indicator to $A$ to force reading another input symbol. However, there is no input left, so $M'$ finishes in an accepting state and accepts $w$.

We can make use of this to show that there are context-free languages that are not deterministic.

The language $L = \{w \in \Sigma^* \mid w \neq xx \text{ for any } x \in \Sigma^*\}$, over an alphabet $\Sigma$ with at least two symbols, is not a deterministic context-free language.

The language $L$ consists of words that are not squares. This language turns out to be context-free (a fun exercise for you). But is it deterministic? If it were, then its complement, \[\overline L = \{xx \mid x \in \Sigma^* \},\] would also be a DCFL, and in particular a CFL. However, one can show that $\overline L$ is not even context-free (how? we'll see next class). So $L$ could not have been a DCFL.

Notice that this also happens to be a proof that the context-free languages are not closed under complement!

Linear-Time Parsing

So what is the point of DCFLs if they're not as powerful as CFLs? As with many things in computer science, we can talk about tradeoffs. Recall that parsing with a context-free grammar can be done in cubic time via CYK, and it is unlikely that we can do significantly better in general, since faster parsing would yield faster Boolean matrix multiplication. Along with CYK, we've also seen an example of using a PDA to parse a grammar, when we showed that CFGs were equivalent to PDAs. In that case, we used a top-down method, starting with the start symbol $S$ and nondeterministically choosing productions to construct a leftmost derivation.

This is where DCFLs come in. DCFLs are a class of languages that have associated grammars that can be parsed in linear time. These are the LR(k) grammars, due to Donald Knuth in 1965.

The name here gives away our strategy. In $\mathrm{LR}(k)$, L stands for left-to-right scan, R stands for rightmost derivation, and $k$ is the number of characters we're allowed to "look ahead" in order to decide what to do. We'll describe the basics for $\mathrm{LR}(0)$ grammars. Grammars with more lookahead use the same ideas and can be parsed in a similar way.

Recall that our nondeterministic PDA parser attempted to guess a leftmost derivation for an input word, starting from $S$. Here, we'll instead compute a rightmost derivation $S \Rightarrow^* w$, but backwards, starting from $w$. In a sense, this method resembles our approach with CYK, which began with the terminals and worked its way back up to the start variable. The main difference here is that we want to attempt this with only one pass over the input word.

For instance, suppose we have the following grammar, \begin{align*} S &\to Ac \\ A &\to AB \mid B \\ B &\to aAb \mid ab \end{align*} and we are given a word $aabbabc$. For convenience, we can keep track of our position with a bullet $\bullet aabbabc$. At each step, we will make one of the following choices:

1. Shift: read the next input symbol of the word and push it onto the stack; in our notation, this moves the bullet one symbol to the right.
2. Reduce: if the symbols immediately to the left of the bullet (i.e., on top of the stack) form the right-hand side $\beta$ of some production $A \to \beta$, replace them with $A$.

For example, suppose we have read and shifted a few symbols onto the stack, as in $aab \bullet babc$. Notice that we can view the contents to the left of the bullet as the contents of the stack, with the rightmost symbol at the top. At this point, we can choose to reduce what's on the stack according to the rule $B \to ab$. In this case, we can view our partially processed word as $aB \bullet babc$.
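To see the whole process at once, here is a successful run on $aabbabc$ (the symbol $\vdash$ is just informal notation for a single shift or reduce step; how the parser knows when to reduce is exactly the question addressed below):
\begin{align*}
\bullet\, aabbabc &\vdash a \bullet abbabc \vdash aa \bullet bbabc \vdash aab \bullet babc && \text{(shift, shift, shift)}\\
&\vdash aB \bullet babc \vdash aA \bullet babc && \text{(reduce $B \to ab$, reduce $A \to B$)}\\
&\vdash aAb \bullet abc \vdash B \bullet abc \vdash A \bullet abc && \text{(shift, reduce $B \to aAb$, reduce $A \to B$)}\\
&\vdash Aa \bullet bc \vdash Aab \bullet c \vdash AB \bullet c \vdash A \bullet c && \text{(shift, shift, reduce $B \to ab$, reduce $A \to AB$)}\\
&\vdash Ac\, \bullet \vdash S\, \bullet && \text{(shift, reduce $S \to Ac$)}
\end{align*}
As we'll see shortly, the reduce steps, read in reverse order, trace out a rightmost derivation of $aabbabc$.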

There is a broad class of parsers based on this approach, called shift-reduce parsers. However, there is a hitch with what we've described so far: it's still not clear how to choose which step to take when we have a choice. First, we observe that if we take out the bullet, we have a sentential form, $aBbabc$. In fact, this is a right sentential form.

A right sentential form is a sentential form that appears in a rightmost derivation. A handle is a subword of a right sentential form that could have been introduced by an application of a production rule. That is, if we have a rightmost derivation $S \Rightarrow^* \alpha A x \Rightarrow \alpha \beta x$ for $\alpha, \beta \in (V \cup \Sigma)^*$, $A \in V$, and $x \in \Sigma^*$, and $A \to \beta \in P$, then $\beta$ is a handle for $\alpha \beta x$.

Note that a right sentential form is always going to look like a string $\alpha x$, where $\alpha \in (V \cup \Sigma)^*$ and $x \in \Sigma^*$ because we always expand the rightmost variables first. Then essentially what we are trying to do is replace handles with the left hand sides of their corresponding productions.
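For our example grammar, a rightmost derivation of $aabbabc$ is \[S \Rightarrow Ac \Rightarrow ABc \Rightarrow Aabc \Rightarrow Babc \Rightarrow aAbabc \Rightarrow aBbabc \Rightarrow aabbabc,\] and in each right sentential form the right-hand side introduced by the most recent step is a handle: for instance, $B$ is a handle of $aBbabc$ (from $A \to B$), and $aAb$ is a handle of $aAbabc$ (from $B \to aAb$).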

In our example above, we replaced the handle $ab$ in $aab \bullet babc$ via the rule $B \to ab$ to get $aB \bullet babc$. If we repeat this process on the right sentential forms, we'll build our rightmost derivation backwards, and end with the start variable $S$. The difficulty is figuring out when we have a handle. To do so, we need to consider viable prefixes.

A viable prefix of a right sentential form $\alpha \beta x$ is a prefix that ends at or before the end of a handle $\beta$.

We use viable prefixes to determine when we might have a handle. If we imagine the state of the DPDA, we always have a right sentential form: a prefix of this right sentential form is on the stack, while the suffix is the unread portion of the input. The stack will then contain a viable prefix until we think we have seen a handle. Tracking this is done by augmenting our grammar with more dots.

An item is a production rule with a bullet somewhere in its right-hand side. That is, if $A \to \alpha \beta$ is a production with $\alpha, \beta \in (V \cup \Sigma)^*$, then $A \to \alpha \bullet \beta$ is an item. We say an item $A \to \alpha \bullet \beta$ is valid for a viable prefix $\gamma$ if there is a rightmost derivation $S \Rightarrow^* \delta A x \Rightarrow \delta \alpha \beta x$ with $\delta\alpha = \gamma$. An item is complete if the bullet is at the right end of the production, as in $A \to \alpha \bullet$.

The bullets in the items play the same role as the bullet in our earlier notation: they indicate how much we've seen, but now relative to a possible handle and the production we could use to reduce the contents of our stack. If an item is valid for a viable prefix, then it is a candidate for our reduction step.

In our grammar, we have the items \begin{align*} S &\to \bullet Ac & S &\to A \bullet c & S &\to Ac \bullet \\ A &\to \bullet AB & A &\to A \bullet B & A &\to AB \bullet \\ A &\to \bullet B & A &\to B \bullet \\ B &\to \bullet aAb & B &\to a \bullet Ab & B &\to aA \bullet b & B &\to aAb \bullet \\ B &\to \bullet ab& B &\to a \bullet b& B &\to ab \bullet \end{align*} Then $a$, $aa$, and $aab$ are viable prefixes (though not the only ones). The item $B \to ab \bullet$ is valid for $aab$, and since it is complete, we were able to use it in a reduce step. But both $B \to a \bullet b$ and $B \to a \bullet Ab$ are valid for $aa$.

So how do we figure out which items are valid for a given viable prefix? It turns out that the set of viable prefixes of any context-free grammar is a regular language. Not only that, but there is a DFA for this language whose states are labelled so that if a viable prefix $\gamma$ is read by the DFA, then it reaches a state labelled by exactly the items valid for $\gamma$.

Let $G$ be a context-free grammar. Then there exists a DFA that accepts the set of viable prefixes of $G$ such that for every viable prefix $\gamma$, the state $\delta(q_0,\gamma)$ is labelled with an item $A \to \alpha \bullet \beta$ if and only if that item is valid for $\gamma$.

This DFA is called the Knuth DFA. The idea here is that if we are reading a prefix and we know we have an item $A \to \alpha \bullet X B \beta$ that's valid for it, we can figure out which items are valid as we keep reading the prefix. For instance, if we see $X$ next, then we know that $A \to \alpha X \bullet B \beta$ is a candidate, as well as $B \to \bullet \rho$ for each production $B \to \rho$, since we could be in the sentential form where $B$ was replaced with $\rho$.
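To make this concrete, here is a small Python sketch of this construction (this is not the course's reference implementation; the names PRODUCTIONS, closure, goto, and knuth_dfa, and the representation of items as triples, are just illustrative choices). A state is a set of items, closure adds the $B \to \bullet\rho$ items described above, and goto moves the dot past a symbol.

# A small sketch of building the Knuth DFA for the example grammar: each state
# is a set of items, closure() adds the B -> .rho items for variables after the
# dot, and goto() moves the dot past a symbol. Uppercase letters are variables.

PRODUCTIONS = [
    ("S", "Ac"),
    ("A", "AB"), ("A", "B"),
    ("B", "aAb"), ("B", "ab"),
]

def closure(items):
    """Add B -> .rho for every item whose dot sits in front of a variable B."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot].isupper():
                for (l2, r2) in PRODUCTIONS:
                    if l2 == rhs[dot] and (l2, r2, 0) not in items:
                        items.add((l2, r2, 0))
                        changed = True
    return frozenset(items)

def goto(state, symbol):
    """Move the dot past `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None

def knuth_dfa(start_var="S"):
    """Return the states (item sets) and transitions of the Knuth DFA."""
    start = closure({(l, r, 0) for (l, r) in PRODUCTIONS if l == start_var})
    states, transitions, worklist = [start], {}, [start]
    symbols = {ch for (_, rhs) in PRODUCTIONS for ch in rhs}
    while worklist:
        state = worklist.pop()
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt is None:
                continue
            if nxt not in states:
                states.append(nxt)
                worklist.append(nxt)
            transitions[(state, sym)] = nxt
    return states, transitions

if __name__ == "__main__":
    states, transitions = knuth_dfa()
    for i, st in enumerate(states):
        print(f"state {i}:", sorted(f"{l} -> {r[:d]}.{r[d:]}" for (l, r, d) in st))

Printing the states lets you check by hand which ones contain a complete item alongside other items; for this grammar, every complete item sits in a state by itself, which is the property discussed below.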

Here is the Knuth DFA for our grammar. Note that there is a sink state that isn't shown (otherwise, the DFA would be incomplete) but that every other state is accepting. So if the input word doesn't get thrown into the sink state, it's a viable prefix.

The key here is that we want complete items, or those which we'll be using to perform a reduce step, to be in a state on their own. In other words, there's no chance that our parser will need to make a choice between two different reduce actions. In this way, we're guaranteed exactly one rightmost derivation and we're guaranteed that the parse is deterministic.

This leads us to the following definition for our grammar.

A grammar $G = (V, \Sigma, P, S)$ is $\mathrm{LR}(0)$ if

1. the start variable $S$ does not appear on the right-hand side of any production, and
2. for every viable prefix $\gamma$, whenever a complete item $A \to \alpha \, \bullet$ is valid for $\gamma$, then no other item is valid for $\gamma$.

Now, recall our hypothetical DPDA. We had a setup where we could view $aB \bullet babc$ as the state of our machine; as a configuration, it would look like $(q, babc, Ba)$ for a state $q$. With the additional knowledge that we've just gained, the stack will store not just a viable prefix but also states of the Knuth DFA between the symbols of the viable prefix, essentially tracing out a computation of the DFA.

So our DPDA on an input word $w = a_1 \cdots a_n$ would have a configuration \[(q, a_t a_{t+1} \cdots a_n, q_k X_k q_{k-1} X_{k-1} \cdots q_1 X_1 q_0)\] such that $X_1 \cdots X_k a_t \cdots a_n$ is our current right sentential form and $q_j = \delta(q_{j-1},X_j)$ in the Knuth DFA. Note that the symbol on the top of our stack is always a state of the Knuth DFA. Whether we perform a shift or a reduce depends on whether that state contains a complete item.

Here, that state is $q_k$. If $q_k$ does not contain a complete item, then we perform a shift, and we would have the configuration \[(q, a_{t+1} \cdots a_n, q_t a_t q_k X_k q_{k-1} X_{k-1} \cdots q_1 X_1 q_0)\] with $q_t = \delta(q_k,a_t)$. However, suppose that $q_k$ does contain a complete item $A \to \alpha \bullet$. Then we perform a reduce: if $\alpha = X_{\ell+1} \cdots X_k$ and $p = \delta(q_\ell, A)$, we get \[(q, a_t \cdots a_n, p A q_\ell X_\ell q_{\ell-1} \cdots q_1 X_1 q_0).\]
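On our running example, the first few configurations look as follows (writing $r_0, r_1, \dots$ for the Knuth DFA states reached along the way; these names are illustrative, and some of them coincide in the actual DFA):
\begin{align*}
(q,\; aabbabc,\; r_0) &\vdash (q,\; abbabc,\; r_1\, a\, r_0) && r_1 = \delta(r_0,a) \text{ (shift)}\\
&\vdash (q,\; bbabc,\; r_2\, a\, r_1\, a\, r_0) && r_2 = \delta(r_1,a) \text{ (shift)}\\
&\vdash (q,\; babc,\; r_3\, b\, r_2\, a\, r_1\, a\, r_0) && r_3 = \delta(r_2,b) \text{ (shift)}\\
&\vdash (q,\; babc,\; r_4\, B\, r_1\, a\, r_0) && r_4 = \delta(r_1,B) \text{ (reduce $B \to ab$)}
\end{align*}
The state $r_3$ contains the complete item $B \to ab\, \bullet$, which is what triggers the reduce.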

Finally, we know that we have a word in our language if, after reading the entire input, we reach a state that contains a complete item $S \to \alpha \bullet$; at that point we can empty the stack and accept.

Knuth gives the following pair of results which links $\mathrm{LR}(k)$ grammars and DCFLs.

If $G$ is an $\mathrm{LR}(k)$ grammar, then there is a deterministic pushdown automaton that accepts $L(G)$. If $L$ is a deterministic context-free language, then there is an $\mathrm{LR}(1)$ grammar that generates $L$.

Practically speaking, most programming languages are really languages that are $\mathrm{LR}(1)$ or so. YACC generates parsers for $\mathrm{LALR}$ grammars, a class that sits somewhere in between $\mathrm{LR}(0)$ and $\mathrm{LR}(1)$.