CMSC 28000 — Lecture 4

Closure properties of DFA languages

Recall that languages are just sets of words and that we can perform set operations on languages. However, once we start thinking about computation, particularly with our simple model of computation, an important question is whether the resulting language that we get from applying these operations is still computable.

For instance, suppose we have two DFA languages $L_1$ and $L_2$. We know that they have to be recognized by DFAs. Can we conclude that $L_1 \cup L_2$ can also be recognized by a DFA?

We say that a class of languages is closed under an operation if whenever the arguments of the operation are in the class, the result is also.

Let's consider the DFA languages. If we want to show that an operation is closed under the DFA languages, we have to show that the result of the operation can be recognized by a DFA. The most straightforward way of doing this is to take the input DFAs and use them to construct a DFA for the result.

First, let's do a simple, but useful, example.

The DFA languages are closed under complement.

Proof. Let $L \subseteq \Sigma^*$ be a DFA language. Since $L$ is DFA, it is recognized by a DFA $A = (Q,\Sigma^*,\delta,q_0,F)$. We will construct a DFA $A'$ that recognizes $\overline L$ as follows: let $A' = (Q,\Sigma,\delta,q_0,Q \setminus F)$.

In other words, we take the DFA $A$ and swap the set of non-final and final states. Then we have for all $w \in \Sigma^*$, \begin{align*} w \in L(A') &\iff \delta(q_0,w) \in Q \setminus F \\ &\iff w \not \in L(A) \\ &\iff w \not \in L \\ &\iff w \in \overline L \end{align*} $\tag*{$\Box$}$

If we think back to languages as decision problems, then this gives us a handy way to solve the complement of some problems. For instance, we now have a very easy way to determine whether a number isn't divisible by 3.

Obviously, we are very interested in the closure properties for the basic set operations (union, intersection, complement) and the basic string operations (concatenation, Kleene star). These give us ways to combine and solve problems that we already know how to solve. First, let's consider the intersection.

Suppose we have two DFA languages, $L_1$ and $L_2$. Since they're DFA, we know that they can be recognized by some DFA, say $A_1$ and $A_2$. By definition, if $w \in L_1 \cap L_2$, then this means $w \in L_1$ and $w \in L_2$. Conceptually, all we need to do is run $A_1$ on $w$ and run $A_2$ on $w$ and see if it's accepted by both. This idea of running two machines in parallel leads us to the product construction.

Consider DFAs $A_1 = (Q_1, \Sigma, \delta_1, q_{0,1}, F_1)$ and $A_2 = (Q_2, \Sigma, \delta_2, q_{0,2}, F_2)$. Then the product automaton of $A_1$ and $A_2$ is the DFA $A' = (Q',\Sigma,\delta',q_0',F')$, where

In the product automaton, every state is a pair of states, with one from $A_1$ and the other from $A_2$. This way, we can keep track of the computation on the word $w$ for each of our automata. The set of accepting states for $A'$ are those pairs with an accepting state from each machine. In other words, $A'$ will accept only when $w$ reaches a final state in $A_1$ and a final state in $A_2$.

Let's prove formally that the product construction works. First, we'll show that the transition function behaves pretty much how we'd expect it to.

For all $w \in \Sigma^*$, $$\delta'(\langle p,q \rangle, w) = \langle \delta_1(p,w), \delta_2(q,w) \rangle.$$

We will prove this by induction on $|w|$.

For the base case, we have $w = \varepsilon$. Then $$\delta'(\langle p,q \rangle, \varepsilon) = \langle p,q \rangle = \langle \delta_1(p,\varepsilon, \delta_2(q, \varepsilon) \rangle.$$

Then for our inductive step, we assume that $\delta'(\langle p,q \rangle, x) = \langle \delta_1(p,x), \delta_2(q,x) \rangle$ for $x \in \Sigma^*$ and we consider $w = xa$ for $a \in \Sigma$. We have \begin{align*} \delta'(\langle p,q \rangle, xa) &= \delta'(\delta'(\langle p,q \rangle, x),a) \\ &= \delta'(\langle \delta_1(p,x), \delta_2(q,x) \rangle,a) \\ &= \langle \delta_1(\delta_1(p,x),a), \delta_2(\delta_2(q,x),a) \rangle \\ &= \langle \delta_1(p,xa), \delta_2(q,xa) \rangle \end{align*}

 

Now, we'll prove that our automaton accepts the intersection like we want.

$L(A') = L_1 \cap L_2$.

Let $w \in \Sigma^*$. Then, \begin{align*} w \in L(A') &\iff \delta'(q_0',w) \in F' \\ &\iff \delta'(\langle q_{0,1}, q_{0,2} \rangle, w) \in F_1 \times F_2 \\ &\iff \langle \delta_1(q_{0,1},w), \delta_2(q_{0,2},w) \rangle \in F_1 \times F_2 \\ &\iff \delta_1(q_{0,1},w) \in F_1 \text{ and } \delta_2(q_{0,2},w) \in F_2 \\ &\iff w \in L(A_1) \text{ and } w \in L(A_2) \\ &\iff w \in L_1 \text{ and } w \in L_2 \\ &\iff w \in L_1 \cap L_2 \end{align*}

 

The product construction is a construction due to Rabin and Scott in 1959. We will see this paper pop up again—a lot of fundamental results about finite automata come from here.

Now, let's consider union. If we think about our approach to intersection, we can see that it's not actually that different. Instead of accepting a word when both automata accept it, we can modify our final state set so that a word is accepted when at least one of the automata accept. Then such a DFA follows the product construction from above, with the difference being the set of final states, $$F' = (F_1 \times Q_2) \cup (Q_1 \times F_2).$$ This set consists of all pairs that contain a final state of $A_1$ or a final state of $A_2$.

However, there is another easy but slightly indirect way. We can make use of the closure properties we've already proved so far.

The class of languages recognized by DFAs is closed under union.

Let $L_1$ and $L_2$ be languages recognized by DFAs. Then $\overline{L_1}$ and $\overline{L_2}$ are also recognized by DFAs, by Theorem 3.3. Since DFA languages are closed under intersection, by Theorem 4.2, $\overline{L_1} \cap \overline{L_2}$ is also a DFA language. But this language is $\overline{L_1 \cup L_2}$ by De Morgan's laws. And since DFA languages are closed under complementation, $L_1 \cup L_2$ must also be a DFA language.

We can do the same kind of thing for other similar set operations like set difference. But what about an entirely different kind of operation like concatenation? We can define many more interesting string operations, like concatenation, and ask whether those operations preserve the property of being recognizable by DFAs.

Here are some examples of string and language operations.

Nondeterminism

As DFAs are currently defined (and so named) the transition function $\delta$ is deterministic. That is, the next state is uniquely determined by the current state and input symbol, or there is exactly one state that the transition function points to. This keeps things simple for us and it accurately reflects how real computers work. However, it does seem like a bit of a stifling restriction and sometimes there are a few points where it'd be so much easier just to be able to go to multiple states.

Here, we'll relax the condition that the transition function is deterministic by introducing nondeterminism to finite automata. We call these machines nondeterministic finite automata for obvious reasons. Informally, this means that in an NFA, for each state and transition symbol, we'll allow transitions to multiple states.

Conceptually, what does this mean? In essence, the NFA can "guess" which transition to take. While with a DFA we could represent a computation on an input word with a single path, with nondeterminism, there are multiple paths which we can represent as a tree, with branches forming at the points where there are choices to be made.

But if there are multiple possible paths, how do we define acceptance under this model? If there is at least one path which, after reading the input word, reaches an accepting state, then the NFA accepts the word. This fits with the idea that the NFA makes guesses whenever it has a choice; the existence of a path that reaches a final state means that there is a set of guesses that the NFA could make in order for the word to be accepted. Conversely, the NFA rejects $w$ if there are no paths which reach a final state upon reading $w$.

The following is an example of an NFA, which accepts the language $\{a,b\}^* b \{a,b\}^2$. Note that on $q_0$, there are two transitions on $b$: one to $q_0$ and one to $q_1$.