CMSC 28000 — Lecture 11

DFA Minimization

The proof of the Myhill-Nerode theorem defines a DFA $A_{\equiv_L}$ for a regular language $L$. Recall that the Myhill-Nerode equivalence classes group together strings that are indistinguishable with respect to $L$: any two strings from different classes are distinguished by some suffix, and so they must lead any DFA for $L$ to different states. This means that a DFA recognizing $L$ needs at least as many states as $L$ has Myhill-Nerode equivalence classes. But these equivalence classes correspond exactly to the states of $A_{\equiv_L}$, which means that any DFA that recognizes $L$ must have at least as many states as $A_{\equiv_L}$.

For example, recall that we showed that we can use the product construction to obtain a DFA that recognizes the intersection of two regular languages. If the two languages have DFAs of size $m$ and $n$, then the product construction gives a DFA with up to $mn$ states. But do we really need that many? We can show that there exist languages for which we do need all of those states.

For all $m,n \geq 1$, there exist regular languages $L_m$ and $L_n$, recognized by DFAs with $m$ and $n$ states respectively, such that any DFA that recognizes $L_m \cap L_n$ must have at least $mn$ states.

Let $\Sigma = \{0,1\}$ and consider the languages \begin{align*} L_m &= \{w \in \Sigma^* \mid |w|_0 \equiv 0 \pmod m\}, \\ L_n &= \{w \in \Sigma^* \mid |w|_1 \equiv 0 \pmod n\}. \end{align*} Then $L_m$ can be recognized by a DFA with $m$ states and $L_n$ can be recognized by a DFA with $n$ states (you can try to find the DFAs yourselves as an easy exercise), and $$L_m \cap L_n = \{w \in \Sigma^* \mid |w|_0 \equiv 0 \pmod m, |w|_1 \equiv 0 \pmod n\}.$$ So we need to consider the equivalence classes of $\equiv_{L_m \cap L_n}$.
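To make the constructions concrete, here is a quick Python sketch of the two counting DFAs and their product; the tuple-and-dictionary encoding of a DFA and the names `mod_count_dfa` and `product_dfa` are ad hoc choices for illustration, not anything canonical.

```python
# A DFA is encoded as (states, alphabet, delta, start, accepting),
# where delta is a dict mapping (state, symbol) -> state.

def mod_count_dfa(symbol, modulus):
    """DFA over {0,1} accepting words whose number of `symbol`s is congruent to 0 mod `modulus`."""
    states = set(range(modulus))
    delta = {}
    for q in states:
        for a in "01":
            # Only the counted symbol advances the counter; the other symbol is ignored.
            delta[(q, a)] = (q + 1) % modulus if a == symbol else q
    return states, set("01"), delta, 0, {0}

def product_dfa(A, B):
    """Product construction: the resulting DFA accepts the intersection of the two languages."""
    (QA, Sigma, dA, sA, FA) = A
    (QB, _, dB, sB, FB) = B
    states = {(p, q) for p in QA for q in QB}
    delta = {((p, q), a): (dA[(p, a)], dB[(q, a)]) for (p, q) in states for a in Sigma}
    accepting = {(p, q) for p in FA for q in FB}
    return states, Sigma, delta, (sA, sB), accepting

# L_3 and L_2: words with |w|_0 = 0 (mod 3) intersected with words with |w|_1 = 0 (mod 2);
# the product DFA has 3 * 2 = 6 states.
P = product_dfa(mod_count_dfa("0", 3), mod_count_dfa("1", 2))
print(len(P[0]))  # 6
```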

We will take the strings $0^i 1^j$ with $0 \leq i \lt m$ and $0 \leq j \lt n$ as our representatives for the equivalence classes. To show that these are all distinct, we consider two words $x = 0^i 1^j$ and $y = 0^k 1^\ell$ with $0 \leq i, k \lt m$, $0 \leq j, \ell \lt n$, and $x \neq y$. This means that $i \neq k$ or $j \neq \ell$ (or both). We suppose $i \neq k$; the $j \neq \ell$ case follows by a similar argument.

We choose the word $z = 0^{m-i} 1^{n-j}$. Then $xz \in L_m \cap L_n$ but $yz \not \in L_m \cap L_n$. To see this, note that the number of $0$s in $xz$ is $i + (m - i) = m \equiv 0 \pmod m$ and the number of $1$s in $xz$ is $j + (n - j) = n \equiv 0 \pmod n$. On the other hand, the number of $0$s in $yz$ is $k + m - i$; since $0 \leq i, k \lt m$ and $k \neq i$, this quantity lies strictly between $0$ and $2m$ and is not equal to $m$, so $k + m - i \not\equiv 0 \pmod m$. So $z$ distinguishes $x$ and $y$.
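For concreteness (this particular instance is just for illustration), take $m = 3$ and $n = 2$, with $x = 01$ and $y = 0^2 1$ (so $i = 1$, $j = 1$, $k = 2$, $\ell = 1$). Then $z = 0^{3-1} 1^{2-1} = 0^2 1$, and indeed $|xz|_0 = 1 + 2 = 3 \equiv 0 \pmod 3$ and $|xz|_1 = 1 + 1 = 2 \equiv 0 \pmod 2$, so $xz \in L_3 \cap L_2$, while $|yz|_0 = 2 + 2 = 4 \not\equiv 0 \pmod 3$, so $yz \not\in L_3 \cap L_2$.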

This means that the words $0^i 1^j$ are pairwise distinguishable, so there are at least $mn$ Myhill-Nerode equivalence classes ($i$ can range from $0$ to $m-1$ and $j$ can range from $0$ to $n-1$).

Since the product construction gives a DFA with at most $mn$ states, together with this lower bound we've also just shown that a minimal DFA for $L_m \cap L_n$ has exactly $mn$ states.

This leads to the notion of a minimal DFA for a language: a DFA with the fewest states among all DFAs recognizing that language. It's clear from the above that any minimal DFA has exactly as many states as there are Myhill-Nerode equivalence classes for its language, but we can actually say something stronger than that: the Myhill-Nerode DFA is the only minimal DFA for its language.

Let $L$ be a regular language. The DFA $A_{\equiv_L}$ is the unique minimal DFA for $L$ up to isomorphism.

Let $A = (Q,\Sigma,\delta,q_0,F)$ be a minimal DFA for $L$. Recall that the equivalence classes of the relation $\sim_A$ correspond to states of $A$. By Lemma 10.11, $\sim_A$ refines $\equiv_L$ and therefore, the index of $\sim_A$ must be at least the index of $\equiv_L$. Since we assumed $A$ was minimal, this means that $A$ has exactly as many states as $A_{\equiv_L}$.

Now, we need to show that there is a correspondence between states of $A$ and states of $A_{\equiv_L}$. Let $q \in Q$. Then $q$ must be reachable from the initial state $q_0$, since otherwise we could remove it, contradicting minimality. This means there exists a word $w \in \Sigma^*$ such that $\delta(q_0,w) = q$, and we let $q$ correspond to the state $[w]$ of $A_{\equiv_L}$. This correspondence is well defined: if $\delta(q_0,w) = \delta(q_0,w') = q$, then $w \sim_A w'$, and since $\sim_A$ refines $\equiv_L$, we get $[w] = [w']$. The correspondence is surjective and both machines have the same number of states, so it is a bijection, and it is straightforward to check that it preserves the initial state, the transitions, and the final states, so the two machines are isomorphic.

Before getting to how to actually compute this, it is worth noting that the existence and uniqueness of the minimal DFA gives us a way to test whether two DFAs, NFAs, or regular expressions describe the same language: convert each object into its minimal DFA; the two objects describe the same language if and only if the resulting minimal DFAs are isomorphic.
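As a sketch of this test in code (the tuple-and-dictionary encoding and the name `isomorphic` are just illustrative choices), assume we already have the two minimal DFAs in hand; checking that they are the same machine up to a renaming of states is then a single traversal from the pair of start states.

```python
from collections import deque

def isomorphic(A, B):
    """Check whether two *minimal* DFAs over the same alphabet are isomorphic,
    i.e. identical up to a renaming of states."""
    (QA, SigmaA, dA, sA, FA) = A
    (QB, SigmaB, dB, sB, FB) = B
    if SigmaA != SigmaB or len(QA) != len(QB):
        return False
    pairing = {sA: sB}            # the only possible renaming maps start to start
    queue = deque([(sA, sB)])
    while queue:
        p, q = queue.popleft()
        if (p in FA) != (q in FB):       # acceptance must agree
            return False
        for a in SigmaA:
            p2, q2 = dA[(p, a)], dB[(q, a)]
            if p2 in pairing:
                if pairing[p2] != q2:    # inconsistent renaming
                    return False
            else:
                pairing[p2] = q2
                queue.append((p2, q2))
    # In a minimal DFA every state is reachable, so the renaming must cover
    # all states and be injective.
    return len(pairing) == len(QA) and len(set(pairing.values())) == len(QB)
```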

This brings us to the question of how to compute the minimal DFA. Theorem 11.1 gives us the basic idea: combining states that are indistinguishable.

Consider a DFA $A = (Q,\Sigma,\delta,q_0,F)$ and two states $p,q \in Q$. We say that $p$ and $q$ are indistinguishable if for all $w \in \Sigma^*$, $\delta(p,w) \in F \iff \delta(q,w) \in F$.

In this case, $p$ and $q$ must correspond to the same Myhill-Nerode equivalence class and we can replace both states with a single state. This leads to the following simple algorithm.

  1. Initialize a table for (unordered) pairs $\{p,q\}$.
  2. For all pairs $\{p,q\}$ with $p \in F$ and $q \not \in F$, mark the corresponding entry.
  3. Repeat until there are no more changes: For each unmarked $\{p,q\}$, for each symbol $a \in \Sigma$, if the pair $\{\delta(p,a), \delta(q,a)\}$ has been marked, mark $\{p,q\}$.
  4. The unmarked pairs that remain are exactly the pairs of equivalent states (see the code sketch following this list).
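Here is how the marking procedure might look in Python; the tuple-and-dictionary encoding of the DFA and the name `mark_distinguishable` are illustrative choices, and the code assumes the DFA is complete (every state has a transition on every symbol).

```python
from itertools import combinations

def mark_distinguishable(states, alphabet, delta, accepting):
    """Table-filling algorithm: return the set of (unordered) pairs of
    distinguishable states; every pair left unmarked is equivalent."""
    marked = set()
    # Step 2: a final and a non-final state are distinguished by the empty word.
    for p, q in combinations(states, 2):
        if (p in accepting) != (q in accepting):
            marked.add(frozenset((p, q)))
    # Step 3: repeat until a full pass makes no more changes.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked
```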

The automaton that we get out of this is called the quotient automaton, because if you think about our automaton as an algebraic object (which it is), then we're essentially performing a quotient construction on it. In general, we could take the quotient with respect to any equivalence relation, but the one that matters the most in our case is $\equiv_L$, the Myhill-Nerode relation for the language of the machine, and so we can refer to this automaton as $A/\equiv_L$ if we want.
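Continuing the sketch from above (same illustrative encoding), once the marking has been computed we can collapse each state to its equivalence class to build the quotient automaton.

```python
def quotient_dfa(states, alphabet, delta, start, accepting, marked):
    """Merge states that were not marked as distinguishable; the classes of
    equivalent states become the states of the quotient automaton."""
    def cls(q):
        # The equivalence class of q: all states not distinguishable from q.
        return frozenset(p for p in states
                         if p == q or frozenset((p, q)) not in marked)
    classes = {q: cls(q) for q in states}
    new_states = set(classes.values())
    # The transition function is well defined on classes: equivalent states
    # move to equivalent states on every symbol.
    new_delta = {(classes[q], a): classes[delta[(q, a)]]
                 for q in states for a in alphabet}
    new_accepting = {classes[q] for q in accepting}
    return new_states, alphabet, new_delta, classes[start], new_accepting
```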

Let's try an example. We have the following DFA.

First, we set up our table and consider the first iteration of our algorithm. Note that since pairs are unordered, we only fill half the table; to distinguish unmarked cells from unused cells, we write - for unmarked pairs. For demonstration purposes, each marked pair is labelled with the round of the iteration in which it was marked.

$$\begin{array}{c|ccccc}
 & q_1 & q_2 & q_3 & q_4 & q_5 \\ \hline
q_0 & 1 & 1 & - & - & 1 \\
q_1 & & - & 1 & 1 & - \\
q_2 & & & 1 & 1 & - \\
q_3 & & & & - & 1 \\
q_4 & & & & & 1
\end{array}$$

Next, we consider each pair of unmarked states and determine whether each pair reaches a marked pair. We have that $(q_1,q_5)$ and $(q_2,q_5)$ get marked in this phase, because $(\delta(q_1,a), \delta(q_5,a)) = (q_3,q_5)$ and $(\delta(q_2,a), \delta(q_5,a)) = (q_4,q_5)$, which have been marked distinguishable.

The reasoning here is simple: suppose a pair of states were indistinguishable. Then reading any word from the two states should again lead to a pair of indistinguishable states; otherwise, a word distinguishing the successor states, prepended with the word we read to get there, would distinguish the original pair. But clearly this isn't the case here, since reading $a$ brings us to a pair of distinguishable states. It's not too hard to see that we can extend this argument inductively.

$$\begin{array}{c|ccccc}
 & q_1 & q_2 & q_3 & q_4 & q_5 \\ \hline
q_0 & 1 & 1 & - & - & 1 \\
q_1 & & - & 1 & 1 & 2 \\
q_2 & & & 1 & 1 & 2 \\
q_3 & & & & - & 1 \\
q_4 & & & & & 1
\end{array}$$

We do another round of this, taking into account the new pairs of distinguishable states.

$$\begin{array}{c|ccccc}
 & q_1 & q_2 & q_3 & q_4 & q_5 \\ \hline
q_0 & 1 & 1 & 3 & 3 & 1 \\
q_1 & & - & 1 & 1 & 2 \\
q_2 & & & 1 & 1 & 2 \\
q_3 & & & & - & 1 \\
q_4 & & & & & 1
\end{array}$$

At this point, we're finished: a further pass marks no new pairs, and we're left with two pairs of indistinguishable states, $\{q_1,q_2\}$ and $\{q_3,q_4\}$. We can merge each of these pairs safely to get the following DFA.

This algorithm has something like $O(|\Sigma| n^3)$ worst-case time complexity, where $n$ is the number of states of the DFA. If we are slightly more clever, we can get this down to $O(|\Sigma| n^2)$ by maintaining, for each pair $\{p',q'\}$, a list of the pairs $\{p,q\}$ with $p' = \delta(p,a)$ and $q' = \delta(q,a)$ for some $a \in \Sigma$. Then, whenever a pair gets marked, we immediately mark every pair on its list (and, recursively, the pairs on their lists), so once the lists are built, each pair is processed only once.
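A sketch of this faster variant, in the same illustrative encoding as before; building the predecessor lists costs $O(|\Sigma| n^2)$, after which each pair is enqueued and marked at most once.

```python
from itertools import combinations
from collections import deque

def mark_distinguishable_fast(states, alphabet, delta, accepting):
    """Worklist version of the marking algorithm: for each pair {p', q'} we keep
    the list of pairs {p, q} that reach it on some symbol, and when {p', q'} is
    marked we mark its whole list."""
    # Predecessor lists: preds[{p',q'}] holds the pairs {p,q} that map to {p',q'}.
    preds = {frozenset(pq): [] for pq in combinations(states, 2)}
    for p, q in combinations(states, 2):
        for a in alphabet:
            succ = frozenset((delta[(p, a)], delta[(q, a)]))
            if len(succ) == 2:
                preds[succ].append(frozenset((p, q)))
    # Seed with the pairs distinguished by the empty word.
    queue = deque(frozenset((p, q)) for p, q in combinations(states, 2)
                  if (p in accepting) != (q in accepting))
    marked = set(queue)
    while queue:
        pair = queue.popleft()
        for pred in preds[pair]:
            if pred not in marked:
                marked.add(pred)
                queue.append(pred)
    return marked
```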

Since this is a polynomial-time algorithm, we can consider DFA minimization to be "efficient", and so we can take for granted that we can efficiently determine whether two DFAs recognize the same language. For NFAs and regular expressions, things are a bit harder, because the determinization process can lead to exponential blowups in the number of states.

However, there are algorithms that arguably do better. The algorithm with the best worst-case time complexity is Hopcroft's partition refinement algorithm, which runs in $O(n \log n)$ time for a fixed alphabet. The other candidate for a better algorithm is one due to Brzozowski: while it has exponential worst-case running time, it seems to perform as well as Hopcroft's algorithm in practice in many cases, and it is surprisingly simple to describe.

The DFA $\mathcal D((\mathcal D(A^R))^R)$ is a minimal DFA equivalent to $A$.

This gives us the algorithm: take your DFA $A$, reverse it, then determinize the result, reverse the determinized machine, then determinize the result of that. But why does this work?

First, we need to define the reversal of an automaton. Since you were asked to show how to obtain a regular expression for the reversal of a language described by a regular expression, you already know that the regular languages are closed under reversal. A very easy (almost too easy) construction works for finite automata.

Given a DFA $A = (Q,\Sigma,\delta,q_0,F)$, we define the NFA $A^R = (Q,\Sigma,\delta^R,F,\{q_0\})$ to be the NFA that is obtained when we reverse the arrows of $A$ and swap the initial and final states. The transition function is defined by $$\delta^R(q,a) = \{p \in Q \mid \delta(p,a) = q\}.$$
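In code, this construction might look like the following (same illustrative encoding as before; the resulting NFA is given by a set of initial states and a transition map into sets of states).

```python
def reverse_dfa(states, alphabet, delta, start, accepting):
    """Reverse a DFA: flip every transition and swap initial and final states.
    The result is an NFA with a set of initial states."""
    delta_r = {(q, a): set() for q in states for a in alphabet}
    for (p, a), q in delta.items():
        delta_r[(q, a)].add(p)   # p --a--> q in A becomes q --a--> p in A^R
    # The initial states of A^R are the old accepting states, and the only
    # accepting state of A^R is the old initial state.
    return states, alphabet, delta_r, set(accepting), {start}
```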

Recall that for an NFA $N$, the DFA $\mathcal D(N)$ is the DFA obtained by applying the subset construction to $N$.
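A sketch of the subset construction in the same encoding, handling a set of initial states (as produced by the reversal above); only subsets reachable from the initial subset are constructed.

```python
def determinize(states, alphabet, delta_n, initial, accepting):
    """Subset construction for an NFA given by a set of initial states and a
    transition map into sets of states."""
    start = frozenset(initial)
    d_states, d_delta = {start}, {}
    stack = [start]
    while stack:
        X = stack.pop()
        for a in alphabet:
            # The successor of subset X on symbol a.
            Y = frozenset(q for p in X for q in delta_n[(p, a)])
            d_delta[(X, a)] = Y
            if Y not in d_states:
                d_states.add(Y)
                stack.append(Y)
    # A subset is accepting if it contains some accepting state of the NFA.
    d_accepting = {X for X in d_states if X & set(accepting)}
    return d_states, alphabet, d_delta, start, d_accepting
```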

Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA accepting $L$ and suppose that every state of $A$ is reachable. Then $\mathcal D(A^R)$ is a minimal DFA for $L^R$.

To show that the DFA $\mathcal D(A^R)$ is minimal, we show that no two of its states are equivalent (every state of $\mathcal D(A^R)$ is reachable by construction, so this suffices). Let $\mathcal D(A^R) = (Q',\Sigma,\delta',q_0',F')$ and consider two states $X,Y \in Q'$. Remember that the states in $Q'$ are subsets of $Q$, the state set of $A$.

Suppose $X$ and $Y$ are equivalent. First, we'll show that $X \subseteq Y$. Let $q \in X$. Since every state of $A$ is reachable, we know that there exists a word $w \in \Sigma^*$ such that $\delta(q_0,w) = q$. Then $q_0 \in \delta^R(q,w^R)$ in $A^R$ and so $\delta'(X,w^R) \in F'$ in $\mathcal D(A^R)$.

Since $X$ and $Y$ are equivalent, we have that $\delta'(Y,w^R) \in F'$ in $\mathcal D(A^R)$. Then there exists a $p \in Y$ such that $q_0 \in \delta^R(p,w^R)$ in $A^R$, which means that $\delta(q_0,w) = p$ in $A$. But $A$ is a DFA, so we must have $p = q$ and therefore $q \in Y$.

A symmetric argument will show that $Y \subseteq X$ and therefore, $X = Y$.

Applying this lemma twice gives us Brzozowski's algorithm: assuming every state of $A$ is reachable (we can always remove unreachable states first), applying the lemma to $A$ shows that $\mathcal D(A^R)$ is a minimal DFA for $L^R$; every state of $\mathcal D(A^R)$ is reachable by construction, so applying the lemma again to $\mathcal D(A^R)$ shows that $\mathcal D((\mathcal D(A^R))^R)$ is a minimal DFA for $(L^R)^R = L$.
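In terms of the sketches above, the whole algorithm is then just a composition of the hypothetical `reverse_dfa` and `determinize` functions.

```python
def brzozowski_minimize(dfa):
    """Brzozowski's algorithm: reverse, determinize, reverse again, determinize again."""
    return determinize(*reverse_dfa(*determinize(*reverse_dfa(*dfa))))
```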

The complexity of this algorithm has been studied over the last few decades, both experimentally and through average-case analysis. Part of the challenge of doing such analysis is that it's difficult to generate random DFAs and NFAs with a reasonable distribution and that behave in a reasonable way. For instance, just randomly generating DFAs with $n$ states by randomly generating transitions between them will result in a lot of disconnected DFAs.