CMSC 28000 — Lecture 3

Now, we can begin to think about the kinds of problems that finite automata can solve. As a first example, we'll consider the language $$ L = \{w \in \{0,1\} \mid \text{$w$ is the binary representation of a number that is divisible by 3} \}.$$ In other words, this language describes the decision problem of whether a number is divisible by 3.

Let's build a DFA that recognizes this language. Let $A = (Q,\Sigma,\delta,q_0,F)$, where

Let's think about how this machine should work. The DFA works by reading a string from left to right, which means that it'll be reading the most significant bit first. The natural way to think about this problem, then, is to consider what our input modulo 3 should be. Obviously, we want to accept a string if and only if it corresponds to an integer that's equivalent to $0 \bmod 3$. Keeping this in mind, having each state for the different possibilities modulo 3 seems like a reasonable idea.

There are a few caveats we should consider. First, how should we interpret $\varepsilon$? Since we begin in state 0, we're interpreting $\varepsilon$ as 0. For our purposes, this is fine. But if we didn't want to accept this interpretation, then we would need to add an extra state to handle this case. Similarly, we can ask the question of whether we'd like to allow inputs with leading zeroes and we would have to adjust our DFA accordingly.

Now, suppose we've read a string $w$ and now we want to read either a $0$ or a $1$. We just have to remember a bit about binary math. Let $(w)_2$ denote the integer that $w$ represents in binary. Then, $$(w0)_2 = 2 \cdot (w)_2 \quad \text{and} \quad (w1)_2 = 2 \cdot (w)_2 + 1.$$ With this in mind, we just need to think about how reading a new bit will change our result modulo 3. We can summarize the changes to $(w)_2 \bmod 3$ in the following table:

Now, this looks suspiciously like a transition table. If $q = (w)_2 \bmod 3$ is the state that we're in, then we have $\delta(q,a) = 2 \cdot q + a \bmod 3$ and this corresponds exactly with our definition of $A$ from above. Let's prove this formally.

	$0$	$1$
$0$	$0$	$1$
$1$	$2$	$0$
$2$	$1$	$2$

$(w)_2 \bmod 3$	$(w0)_2 \bmod 3$	$(w1)_2 \bmod 3$
$0$	$0$	$1$
$1$	$2$	$0$
$2$	$1$	$2$

Let $(w)_2$ denote the integer represented by the binary string $w$. We claim that a string $w$ reaches state $i$ if and only if $(w)_2 = i \bmod 3$, for $i = 0,1,2$. We will show this by induction on the length of $w$.

First, as our base case, let $w = \varepsilon$. Then $\delta(0,\varepsilon) = 0$ and we have $(\varepsilon)_2 = 0 = 0 \bmod 3$.

Now suppose our claim holds for $x \in \Sigma^*$. Let $w = xa$, where $a \in \Sigma$. We have \begin{align*} \delta(0,xa) &= \delta(\delta(0,x),a) \\ &= \delta((x)_2 \bmod 3, a) &\text{ind. hyp.} \\ &= (2 \cdot ((x)_2 \bmod 3) +a) \bmod 3 \\ &= (2 \cdot (x)_2 + a) \bmod 3 \\ &= (xa)_2 \bmod 3 \end{align*}

Thus, we have shown that $w$ reaches a state $i$ if and only if $(w)_2 = i \bmod 3$. Since $0$ is the only final state of $A$, we can conclude that $A$ accepts $w$ if and only if $(w)_2$ is divisible by 3.

From this, one might ask whether it's possible to construct a DFA that recognizes all of the binary strings that represent integers that are divisible by $n$ for any arbitrary integer $n$. The answer is yes, and it's not hard to see why. There are only finitely many equivalence classes modulo $n$ (or rather, exactly $n$) and all we need to do is keep track of how this number changes with each additional symbol read. You can even take this approach and make it work for strings representing integers in any base $k$.

Closure properties of regular languages

Recall that languages are just sets of words and that we can perform set operations on languages. However, once we start thinking about computation, particularly with our simple model of computation, an important question is whether the resulting language that we get from applying these operations is still computable.

For instance, suppose we have two regular languages $L_1$ and $L_2$. We know that they have to be recognized by DFAs. Can we conclude that $L_1 \cup L_2$ can also be recognized by a DFA?

We say that a class of languages is closed under an operation if whenever the arguments of the operation are in the class, the result is also.

Let's consider the regular languages. If we want to show that an operation is closed under the regular languages, we have to show that the result of the operation can be recognized by a DFA. The most straightforward way of doing this is to take the input DFAs and use them to construct a DFA for the result.

Proof. Let $L \subseteq \Sigma^*$ be a regular language. Since $L$ is regular, it is recognized by a DFA $A = (Q,\Sigma^*,\delta,q_0,F)$. We will construct a DFA $A'$ that recognizes $\overline L$ as follows: let $A' = (Q,\Sigma,\delta,q_0,Q \setminus F)$.

In other words, we take the DFA $A$ and swap the set of non-final and final states. Then we have for all $w \in \Sigma^*$, \begin{align*} w \in L(A') &\iff \delta(q_0,w) \in Q \setminus F \\ &\iff w \not \in L(A) \\ &\iff w \not \in L \\ &\iff w \in \overline L \end{align*} $\tag*{$\Box$}$

If we think back to languages as decision problems, then this gives us a handy way to solve the complement of some problems. For instance, we now have a very easy way to determine whether a number isn't divisible by 3.

Obviously, we are very interested in the closure properties for the basic set operations (union, intersection, complement) and the basic string operations (concatenation, Kleene star). These give us ways to combine and solve problems that we already know how to solve. First, let's consider the intersection.

Suppose we have two regular languages, $L_1$ and $L_2$. Since they're regular, we know that they can be recognized by some DFA, say $A_1$ and $A_2$. By definition, if $w \in L_1 \cap L_2$, then this means $w \in L_1$ and $w \in L_2$. Conceptually, all we need to do is run $A_1$ on $w$ and run $A_2$ on $w$ and see if it's accepted by both. This idea of running two machines in parallel leads us to the product construction.

Consider DFAs $A_1 = (Q_1, \Sigma, \delta_1, q_{0,1}, F_1)$ and $A_2 = (Q_2, \Sigma, \delta_2, q_{0,2}, F_2)$. Then the product automaton of $A_1$ and $A_2$ is the DFA $A' = (Q',\Sigma,\delta',q_0',F')$, where

In the product automaton, every state is a pair of states, with one from $A_1$ and the other from $A_2$. This way, we can keep track of the computation on the word $w$ for each of our automata. The set of accepting states for $A'$ are those pairs with an accepting state from each machine. In other words, $A'$ will accept only when $w$ reaches a final state in $A_1$ and a final state in $A_2$.

Let's prove this formally. First, we'll show that the transition function behaves pretty much how we'd expect it to.

For all $w \in \Sigma^*$, $$\delta'(\langle p,q \rangle, w) = \langle \delta_1(p,w), \delta_2(q,w) \rangle.$$

We will prove this by induction on $|w|$.

For the base case, we have $w = \varepsilon$. Then $$\delta'(\langle p,q \rangle, \varepsilon) = \langle p,q \rangle = \langle \delta_1(p,\varepsilon, \delta_2(q, \varepsilon) \rangle.$$

Then for our inductive step, we assume that $\delta'(\langle p,q \rangle, x) = \langle \delta_1(p,x), \delta_2(q,x) \rangle$ for $x \in \Sigma^*$ and we consider $w = xa$ for $a \in \Sigma$. We have \begin{align*} \delta'(\langle p,q \rangle, xa) &= \delta'(\delta'(\langle p,q \rangle, x),a) \\ &= \delta'(\langle \delta_1(p,x), \delta_2(q,x) \rangle,a) \\ &= \langle \delta_1(\delta_1(p,x),a), \delta_2(\delta_2(q,x),a) \rangle \\ &= \langle \delta_1(p,xa), \delta_2(q,xa) \rangle \end{align*}

CMSC 28000 — Lecture 3

Divisibility

Closure properties of regular languages