CMSC 28000 — Lecture 12

In order to prove the Myhill–Nerode theorem, we'll first prove a lemma relating the equivalence relation $\sim_A$ for a DFA $A$ with the Myhill–Nerode relation for the language.

Lemma 12.1. Let $L \subseteq \Sigma^*$, and let $\approx$ be any right-invariant equivalence relation on $\Sigma^*$ such that $L$ is the union of some of the equivalence classes of $\approx$. Then $\approx$ refines $\equiv_L$.

Suppose $x \approx y$. Since $\approx$ is right-invariant, $xu \approx yu$ for all $u \in \Sigma^*$, so $xu$ and $yu$ always lie in the same equivalence class of $\approx$. Since $L$ is a union of equivalence classes of $\approx$, it follows that $xu \in L$ if and only if $yu \in L$ for every $u$, which is exactly the statement that $x \equiv_L y$.
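As a quick sanity check of the lemma, here's a rough Python sketch (not from the lecture; the example DFA and helper names are made up). It takes a concrete DFA $A$, uses the relation $\sim_A$ as the right-invariant relation $\approx$ (two strings are related when they drive $A$ from the start state to the same state), and verifies, for all strings up to a small length bound, that related strings are $\equiv_{L(A)}$-equivalent on all tested extensions.

```python
from itertools import product

# A small example DFA A over {a, b} accepting strings with an even number of a's,
# so L = L(A) = { w : |w|_a is even }.  States: 0 (even, accepting) and 1 (odd).
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}
start, accepting = 0, {0}

def run(x, q=start):
    """State reached from q on input x (the extended transition function)."""
    for c in x:
        q = delta[(q, c)]
    return q

def in_L(x):
    return run(x) in accepting

def words(max_len, alphabet='ab'):
    """All strings over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            yield ''.join(w)

# x ~_A y  iff  x and y drive A to the same state;
# x ==_L y iff  xu in L <=> yu in L for every u (tested here only up to a bound).
def equiv_L_bounded(x, y, bound=4):
    return all(in_L(x + u) == in_L(y + u) for u in words(bound))

for x in words(3):
    for y in words(3):
        if run(x) == run(y):                 # x ~_A y ...
            assert equiv_L_bounded(x, y)     # ... implies x ==_L y (refinement)
print("~_A refines ==_L on all tested strings")
```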

If $A$ is a DFA recognizing $L$, then $\sim_A$ is a right-invariant equivalence relation and $L$ is a union of equivalence classes of $\sim_A$, so the lemma applies: $\sim_A$ refines $\equiv_L$. We'll use this fact in the proof of the Myhill–Nerode theorem, which we restate here in terms of three conditions rather than just two.

Let $L \subseteq \Sigma^*$ be a language. The following are equivalent:

  1. $L$ is regular.
  2. $L$ is the union of some equivalence classes of a right-invariant equivalence relation of finite index.
  3. The Myhill–Nerode relation $\equiv_L$ is of finite index (i.e. $\equiv_L$ has finitely many equivalence classes).

$(1) \implies (2)$: If $L$ is regular, then there exists a DFA $A$ that recognizes $L$. Then $L$ is the union of some of the equivalence classes of $\sim_A$ (those corresponding to accepting states), and $\sim_A$ is a right-invariant equivalence relation of finite index, since it has at most as many classes as $A$ has states.

$(2) \implies (3)$: Suppose $L$ is the union of some equivalence classes of a right-invariant equivalence relation $\approx$ of finite index. By Lemma 12.1, $\approx$ refines $\equiv_L$, so the index of $\equiv_L$ is at most the index of $\approx$. Since $\approx$ has finite index, $\equiv_L$ must also have finite index.

$(3) \implies (1)$: We define a DFA $A_{\equiv_L} = (Q,\Sigma,\delta,q_0,F)$ that recognizes $L$, with \begin{align*} Q &= \{[x] \mid x \in \Sigma^*\}, \\ \delta([x],a) &= [xa] \text{ for all } x \in \Sigma^* \text{ and } a \in \Sigma, \\ q_0 &= [\varepsilon], \\ F &= \{[x] \mid x \in L\}. \end{align*}

Here, the states of our DFA are exactly the equivalence classes of $\equiv_L$. Since $\equiv_L$ has finite index, this means the machine we defined has a finite number of states.

To show that $\delta$ is well-defined for our equivalence classes, suppose that $x \equiv_L y$. Since $\equiv_L$ is right-invariant, we have $xa \equiv_L ya$ for $a \in \Sigma$. Then, $$\delta([x],a) = [xa] = [ya] = \delta([y],a).$$

Then to show that $L(A_{\equiv_L}) = L$, we see that for $w \in \Sigma^*$, \begin{align*} w \in L(A_{\equiv_L}) & \iff \delta([\varepsilon],w) \in F \\ &\iff [w] \in F \\ &\iff w \in L, \end{align*} where the middle equivalence uses the fact that $\delta([\varepsilon],w) = [w]$, which follows by induction on $|w|$, and the final equivalence holds because $[w] \in F$ exactly when $w \in L$ (note that this is well defined, since $x \equiv_L y$ implies $x \in L \iff y \in L$ by taking $u = \varepsilon$).
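The construction in this proof is easy to carry out concretely whenever we can name the equivalence classes. As a small illustration (my own sketch, not part of the proof), take $L = \{w \in \{0,1\}^* \mid |w|_0 \equiv 0 \pmod 3\}$; here $x \equiv_L y$ exactly when $|x|_0 \equiv |y|_0 \pmod 3$, so there are three classes, and building $A_{\equiv_L}$ exactly as in the proof gives a three-state DFA:

```python
# Sketch: the DFA A_{==_L} from the proof, for the concrete language
# L = { w in {0,1}* : |w|_0 is divisible by 3 }.
# Since x ==_L y exactly when |x|_0 and |y|_0 agree mod 3, we can use the
# residues {0, 1, 2} as names for the equivalence classes [x].

CLASSES = [0, 1, 2]   # Q: one state per class of ==_L

def cls(x):
    """The class [x]: the number of 0s in x, modulo 3."""
    return x.count('0') % 3

def delta(c, a):
    """delta([x], a) = [xa]; well defined because ==_L is right-invariant."""
    return (c + 1) % 3 if a == '0' else c

q0 = cls('')          # the class [epsilon]
F = {0}               # F = { [x] : x in L }: the single class of words in L

def accepts(w):
    q = q0
    for a in w:
        q = delta(q, a)
    return q in F

assert accepts('') and accepts('0100') and not accepts('01')
```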

 

The Myhill–Nerode theorem is so named because it was proved independently by John Myhill and Anil Nerode in the late 1950s. Myhill was a professor at SUNY Buffalo. Nerode is currently a professor at Cornell, where he's been a faculty member for 60 years. More importantly, he proved this theorem while he was a PhD student here.

There are some significant consequences of the Myhill–Nerode theorem. The first is that it gives us another way to determine whether a language is regular or not.

Let $L = \{a^n b^n \mid n \in \mathbb N\}$. We claim that the equivalence classes of $\equiv_L$ are $$\{[\varepsilon], [a], [aa], \dots\} \cup \{[b],[ab],[aab],\dots\}.$$ First, we need to show that these classes are pairwise distinct. To show that two classes $[u]$ and $[v]$ are different, we exhibit a word $w$ such that $uw \in L$ and $vw \not\in L$. For example, for $i \neq j$, the classes $[a^i]$ and $[a^j]$ are distinguished by $w = b^i$, since $a^i b^i \in L$ but $a^j b^i \not\in L$; the remaining pairs can be handled similarly.

This shows that there are at least as many equivalence classes as we've described, but to fully characterize $\equiv_L$, we want to make sure that we're not missing any other classes. Consider the equivalence class of an arbitrary word $w$. There are two possibilities: either $w$ is a prefix of some word in $L$, or it is not. In the first case, $w = a^i b^j$ with $j \leq i$, and $w$ belongs to one of the classes listed above ($[a^i]$ if $j = 0$, and $[a^{i-j+1}b]$ if $j \geq 1$, since in either case $w$ and the representative have exactly the same extensions landing in $L$). In the second case, no extension of $w$ belongs to $L$, so $w$ belongs to the same class as $b$.

It's clear from this that $\equiv_L$ has infinitely many equivalence classes and so $L$ can't be regular. However, if we could build an infinite automaton, the Myhill–Nerode equivalence classes would tell us how to do it (we will not do this).
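To make the distinguishing-word test concrete, here is a short Python sketch (not part of the lecture) that verifies, for a bounded range of $i$ and $j$, that the representatives listed above really do lie in distinct classes of $\equiv_L$:

```python
# Sketch: checking distinguishability for L = { a^n b^n : n >= 0 }.

def in_L(w):
    """Membership test for L."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == 'a' * n + 'b' * n

def distinguishes(z, x, y):
    """z separates x and y iff exactly one of xz, yz lies in L."""
    return in_L(x + z) != in_L(y + z)

# For i != j, the suffix b^i separates a^i from a^j ...
for i in range(6):
    for j in range(6):
        if i != j:
            assert distinguishes('b' * i, 'a' * i, 'a' * j)

# ... and b^(i-1) separates a^i b from a^j b (for i, j >= 1).
for i in range(1, 6):
    for j in range(1, 6):
        if i != j:
            assert distinguishes('b' * (i - 1), 'a' * i + 'b', 'a' * j + 'b')

print("all tested representatives lie in distinct classes of ==_L")
```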

The minimal DFA

The second important consequence comes from the proof of the Myhill–Nerode theorem itself, which defines a DFA $A_{\equiv_L}$ for a regular language $L$. Recall that words in different Myhill–Nerode equivalence classes are distinguishable with respect to $L$: if $x \not\equiv_L y$, there is a word $u$ with exactly one of $xu, yu$ in $L$. Any DFA recognizing $L$ must therefore send $x$ and $y$ to different states, since otherwise $xu$ and $yu$ would reach the same state and be both accepted or both rejected. Hence a DFA recognizing $L$ needs at least as many states as $\equiv_L$ has equivalence classes. But these equivalence classes are exactly the states of $A_{\equiv_L}$, which means that any DFA that recognizes $L$ must have at least as many states as $A_{\equiv_L}$.

For example, recall that we showed that we can use the product construction to obtain a DFA that recognizes the intersection of two regular languages. If the two languages have DFAs of size $m$ and $n$, then the product construction gives a DFA with up to $mn$ states. But do we really need that many? We can show that there exist languages for which we do need all of those states.

For all $m,n \geq 1$, there exist regular languages $L_m$ and $L_n$, recognized by DFAs with $m$ and $n$ states respectively, such that any DFA that recognizes $L_m \cap L_n$ must have at least $mn$ states.

Let $\Sigma = \{0,1\}$ and consider the languages \begin{align*} L_m &= \{w \in \Sigma^* \mid |w|_0 \equiv 0 \pmod m\}, \\ L_n &= \{w \in \Sigma^* \mid |w|_1 \equiv 0 \pmod n\}. \end{align*} Then $L_m$ can be recognized by a DFA with $m$ states and $L_n$ can be recognized by a DFA with $n$ states (you can try to find the DFAs yourselves as an easy exercise). Then $$L_m \cap L_n = \{w \in \Sigma^* \mid |w|_0 \equiv 0 \pmod m \wedge |w|_1 \equiv 0 \pmod n\}.$$ So we need to consider the equivalence classes of $\equiv_{L_m \cap L_n}$.

We will consider strings $0^i 1^j$ with $0 \leq i \lt m$ and $0 \leq j \lt n$ to be our representatives for the equivalence classes. To show that these are all distinct, we consider two words $x = 0^i 1^j$ and $y = 0^k 1^\ell$ with $0 \leq i, k \lt m$, $0 \leq j, \ell \lt n$, and $x \neq y$. This means that $i \neq k$ or $j \neq \ell$ (or both). We suppose $i \neq k$; the $j \neq \ell$ case follows by a similar argument.

We choose the word $z = 0^{m-i} 1^{n-j}$. Then $xz \in L_m \cap L_n$ but $yz \not\in L_m \cap L_n$. To see this, note that the number of $0$s in $xz$ is $i + (m - i) \equiv 0 \pmod m$ and the number of $1$s in $xz$ is $j + (n - j) \equiv 0 \pmod n$, so $xz \in L_m \cap L_n$. But the number of $0$s in $yz$ is $k + (m - i) \not\equiv 0 \pmod m$, since $k \neq i$ and $0 \leq i, k \lt m$. So $z$ distinguishes $x$ and $y$.

This means that there are at least $mn$ distinct Myhill–Nerode equivalence classes, one for each pair $(i,j)$ with $0 \leq i \leq m-1$ and $0 \leq j \leq n-1$.

Since we know that the product construction gives a DFA with at most $mn$ states, together with this result, we've also just shown that the minimal DFA for $L_m \cap L_n$ has exactly $mn$ states.
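As a sanity check of this argument, here's a small Python sketch (my own; the helper names are made up) that builds the product DFA for particular small values of $m$ and $n$, confirms that every one of its $mn$ states is reachable via some $0^i 1^j$, and confirms that the suffix $z = 0^{m-i}1^{n-j}$ distinguishes each pair of representatives, just as in the proof:

```python
m, n = 3, 4   # small example values; the argument works for any m, n >= 1

def in_intersection(w):
    """Membership in the intersection: |w|_0 = 0 (mod m) and |w|_1 = 0 (mod n)."""
    return w.count('0') % m == 0 and w.count('1') % n == 0

# Product DFA: a state is the pair (#0s mod m, #1s mod n) of the input read so far.
def step(state, a):
    i, j = state
    return ((i + 1) % m, j) if a == '0' else (i, (j + 1) % n)

def run(w, state=(0, 0)):
    for a in w:
        state = step(state, a)
    return state

# Every one of the mn states (i, j) is reachable, e.g. by the word 0^i 1^j.
reps = {(i, j): '0' * i + '1' * j for i in range(m) for j in range(n)}
assert all(run(w) == s for s, w in reps.items())

# The suffix z = 0^(m-i) 1^(n-j) completes x = 0^i 1^j to a word in the
# intersection but fails for every other representative y (this single choice
# of z happens to cover both cases of the proof at once).
for (i, j), x in reps.items():
    for (k, ell), y in reps.items():
        if (i, j) != (k, ell):
            z = '0' * (m - i) + '1' * (n - j)
            assert in_intersection(x + z) and not in_intersection(y + z)

print(f"all {m * n} states of the product DFA are pairwise distinguishable")
```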

This leads to the notion of a minimal DFA for a language: a DFA with the fewest states among all DFAs recognizing that language. It's clear from the above that any minimal DFA has exactly as many states as there are Myhill–Nerode equivalence classes for its language, but we can actually say something stronger: the Myhill–Nerode DFA $A_{\equiv_L}$ is the only minimal DFA for its language.

Let $L$ be a regular language. The DFA $A_{\equiv_L}$ is the unique minimal DFA for $L$ up to isomorphism.

Let $A = (Q,\Sigma,\delta,q_0,F)$ be a minimal DFA for $L$. Recall that the equivalence classes of the relation $\sim_A$ correspond to the reachable states of $A$. By Lemma 12.1, $\sim_A$ refines $\equiv_L$, and therefore the index of $\sim_A$ is at least the index of $\equiv_L$. In other words, $A$ has at least as many states as $A_{\equiv_L}$; and since $A_{\equiv_L}$ itself recognizes $L$ and $A$ is minimal, $A$ has exactly as many states as $A_{\equiv_L}$.

Now we need to show that there is a correspondence between the states of $A$ and the states of $A_{\equiv_L}$. Let $q \in Q$. Then $q$ must be reachable from the initial state $q_0$, since otherwise we could remove it and obtain a smaller DFA for $L$, contradicting minimality. This means there exists a word $w \in \Sigma^*$ such that $\delta(q_0,w) = q$, and we let $q$ correspond to the equivalence class $[w]$. This map is well defined: if $\delta(q_0,w) = \delta(q_0,w') = q$, then $w \sim_A w'$ and hence $w \equiv_L w'$, so $[w] = [w']$. One can then check that this map is a bijection that carries $q_0$ to $[\varepsilon]$, accepting states to accepting states, and transitions to transitions, so $A$ and $A_{\equiv_L}$ are isomorphic.
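The correspondence used in this proof can be turned into an algorithm for checking whether two DFAs are isomorphic: pair the start states, propagate the pairing along transitions, and fail if the pairing ever becomes inconsistent. Here's a sketch (my own code, assuming every state of both DFAs is reachable):

```python
from collections import deque

def isomorphic(dfa1, dfa2, alphabet):
    """
    Check whether two DFAs (with every state reachable) are isomorphic.
    Each DFA is a triple (start, accepting_set, delta) where delta maps
    (state, symbol) -> state.  States reached by the same input word are
    paired up, mirroring the map q -> [w] from the proof; the DFAs are
    isomorphic iff this pairing is consistent and injective.
    """
    s1, F1, d1 = dfa1
    s2, F2, d2 = dfa2
    pairing = {s1: s2}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in F1) != (q in F2):        # paired states must agree on acceptance
            return False
        for a in alphabet:
            p2, q2 = d1[(p, a)], d2[(q, a)]
            if p2 in pairing:
                if pairing[p2] != q2:     # the same word reaches two different images
                    return False
            else:
                pairing[p2] = q2
                queue.append((p2, q2))
    return len(set(pairing.values())) == len(pairing)   # injectivity => bijection here

# Example: two renamings of the same two-state DFA (even number of a's).
A = (0, {0}, {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1})
B = ('e', {'e'}, {('e', 'a'): 'o', ('e', 'b'): 'e',
                  ('o', 'a'): 'e', ('o', 'b'): 'o'})
print(isomorphic(A, B, 'ab'))   # True
```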

Assuming the minimal DFA can be computed, its existence and uniqueness give us a way to test whether two DFAs, NFAs, or regular expressions describe the same language: convert both objects into minimal DFAs; the two objects describe the same language if and only if the resulting minimal DFAs are isomorphic.

Luckily, the minimal DFA is not just computable but efficiently computable. Kozen gives an algorithm in Section 14 of the textbook which runs in roughly $O(|\Sigma|n^3)$ time for a DFA with $n$ states. The algorithm with the best known worst-case time complexity is Hopcroft's partition-refinement algorithm (1971), which runs in $O(n \log n)$ time.
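As a rough illustration of how such an algorithm works, here's a sketch of the simple quadratic partition-refinement approach (this is in the spirit of the naive algorithm, not Hopcroft's faster one; the code and representation are my own): start with the partition $\{F, Q \setminus F\}$ and repeatedly split any block containing two states that some symbol sends into different blocks.

```python
def minimize(states, alphabet, delta, start, accepting):
    """
    Naive partition-refinement DFA minimization (assumes every state reachable).
    delta maps (state, symbol) -> state.  Returns the minimized DFA as
    (new_start, new_accepting, new_delta), whose states are frozensets of
    original states (the blocks of the final partition).
    """
    # Start with the partition {F, Q \ F}, dropping any empty block.
    partition = [b for b in (set(accepting), set(states) - set(accepting)) if b]

    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group states of this block by which block each symbol sends them to.
            def signature(q):
                return tuple(
                    next(i for i, b in enumerate(partition) if delta[(q, a)] in b)
                    for a in alphabet
                )
            groups = {}
            for q in block:
                groups.setdefault(signature(q), set()).add(q)
            if len(groups) > 1:
                changed = True
            new_partition.extend(groups.values())
        partition = new_partition

    block_of = {q: frozenset(b) for b in partition for q in b}
    new_delta = {(block_of[q], a): block_of[delta[(q, a)]]
                 for q in states for a in alphabet}
    new_accepting = {block_of[q] for q in accepting}
    return block_of[start], new_accepting, new_delta

# Example: a redundant 3-state DFA for "even number of a's"; states 0 and 2 merge.
states = [0, 1, 2]
delta = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 1,
         (0, 'b'): 0, (1, 'b'): 1, (2, 'b'): 2}
new_start, new_acc, new_delta = minimize(states, 'ab', delta, 0, {0, 2})
print(len({s for (s, _) in new_delta}))   # 2 states after minimization
```

Minimizing two DFAs and then comparing the results with an isomorphism check like the one sketched earlier gives exactly the language-equivalence test described above.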