CMSC 28000 — Lecture 3

	$a$	$b$	$c$
$q_0$	$q_1$	$q_1$	$q_0$
$q_1$	$q_2$	$q_0$	$q_1$
$q_2$	$q_3$	$q_2$	$q_2$
$q_3$	$q_0$	$q_3$	$q_0$

Is there a better way to do this than just enumerating every transition? Yes, and this becomes more necessary as we construct more complicated finite automata. For example, if we generalize the above DFA to a family of DFAs over $n$ states for $n \gt 3$, then we get the following.

Then choosing $n = 4$ gives us the machine we specified above. But in describing our machine this way, we've created a bunch of machines that behave similarly, but for any $n$ we choose. This kind of thing becomes useful once we start doing more complicated things.

Now, we want to also formally define the notion of the set of words that our machine will accept. Intuitively, we have a sense of what acceptance and rejection means. We read a word and trace a path along the automaton and if we end up in an accepting state, the word is accepted and otherwise, the word is rejected.

First of all, it makes sense that we should be able to define this formally by thinking about the conditions that we've placed on the transition function. Since on each state $q$ and on each symbol $a$, we know exactly what state we end up on, it makes sense that each word takes us to a specific state.

So, we can extend the definition of the transition function so that it is a function from a state and a word to a state, $\hat \delta : Q \times \Sigma^* \to Q$.

It might be helpful to note that we can also approach this definition from the other direction. That is, instead of writing $w = ax$, we consider the factorization $w = xa$. Then the inductive part of our definition becomes $$\hat \delta(q,xa) = \delta(\hat \delta(q,x),a).$$

Technically speaking, $\delta$ and $\hat \delta$ are different functions, since one is a function on symbols and the other is a function on words. However, since $\hat \delta$ is a natural extension of $\delta$, it's very easy to forget about the hat and simply write $\delta$ for both functions. This is something I'm not too concerned about the meaning is almost always clear.

A neat observation about $\hat \delta$ is that if you remember your CS 151 or 161, then this is really just a foldl on $\delta$. In other words, $\hat \delta(q,w)$ would be something like (foldl δ q w) (this is okay because you can use Unicode in Racket).

Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA. We say $A$ accepts a word $w \in \Sigma^*$ when $\hat \delta(q_0,w) \in F$. Then the language of a DFA $A$ is the set of all words accepted by $A$, $$L(A) = \{w \in \Sigma^* \mid \hat \delta(q_0,w) \in F \}.$$ We also sometimes say that a word or language is recognized by $A$. It is important to note that when we say that a DFA accepts a language $L$, we are saying that the DFA accepts only those words in $L$ and rejects all words not in $L$.

Divisibility

Now, we can begin to think about the kinds of problems that finite automata can solve. As a first example, we'll consider the idea of divisibility. Recall from discrete math that when we divide two integers, we don't always get another integer, which is nice because it spices things up a bit. However, we proved that we always get a unique quotient and remainder.

A constructive proof of the division theorem shows us how to do this, though we've likely employed such an algorithm numerous times throughout our schooling. This was noted by Blaise Pascal in the 1600s who showed that there's an algorithm for this in any base, not just decimal, and only by examining the sum of the digits of the number. This is based on the observation that remainders are essentially cyclic—we understand this structure to be $\mathbb Z_m$, the integers modulo $m$.

Let's consider the language $$ L = \{w \in \{0,1\}^* \mid \text{$w$ is the binary representation of a number that is divisible by 3} \}.$$ In other words, this language describes the decision problem of whether a number is divisible by 3.

Let's build a DFA that recognizes this language. Let $A = (Q,\Sigma,\delta,q_0,F)$, where

Let's think about how this machine should work. The DFA works by reading a string from left to right, which means that it'll be reading the most significant bit first. The natural way to think about this problem, then, is to consider what our input modulo 3 should be. Obviously, we want to accept a string if and only if it corresponds to an integer that's equivalent to $0 \bmod 3$. Keeping this in mind, having each state for the different possibilities modulo 3 seems like a reasonable idea.

There are a few caveats we should consider. First, how should we interpret $\varepsilon$? Since we begin in state 0, we're interpreting $\varepsilon$ as 0. For our purposes, this is fine. But if we didn't want to accept this interpretation, then we would need to add an extra state to handle this case. Similarly, we can ask the question of whether we'd like to allow inputs with leading zeroes and we would have to adjust our DFA accordingly.

Now, suppose we've read a string $w$ and now we want to read either a $0$ or a $1$. We just have to remember a bit about binary math. Let $(w)_2$ denote the integer that $w$ represents in binary. Then, $$(w0)_2 = 2 \cdot (w)_2 \quad \text{and} \quad (w1)_2 = 2 \cdot (w)_2 + 1.$$ With this in mind, we just need to think about how reading a new bit will change our result modulo 3. We can summarize the changes to $(w)_2 \bmod 3$ in the following table:

Now, this looks suspiciously like a transition table. If $q = (w)_2 \bmod 3$ is the state that we're in, then we have $\delta(q,a) = 2 \cdot q + a \bmod 3$ and this corresponds exactly with our definition of $A$ from above. Let's prove this formally.

	$0$	$1$
$0$	$0$	$1$
$1$	$2$	$0$
$2$	$1$	$2$

$(w)_2 \bmod 3$	$(w0)_2 \bmod 3$	$(w1)_2 \bmod 3$
$0$	$0$	$1$
$1$	$2$	$0$
$2$	$1$	$2$

Let $(w)_2$ denote the integer represented by the binary string $w$. We claim that a string $w$ reaches state $i$ if and only if $(w)_2 = i \bmod 3$, for $i = 0,1,2$. We will show this by induction on the length of $w$.

First, as our base case, let $w = \varepsilon$. Then $\delta(0,\varepsilon) = 0$ and we have $(\varepsilon)_2 = 0 = 0 \bmod 3$.

Now suppose our claim holds for $x \in \Sigma^*$. Let $w = xa$, where $a \in \Sigma$. We have \begin{align*} \delta(0,xa) &= \delta(\delta(0,x),a) \\ &= \delta((x)_2 \bmod 3, a) &\text{ind. hyp.} \\ &= (2 \cdot ((x)_2 \bmod 3) +a) \bmod 3 \\ &= (2 \cdot (x)_2 + a) \bmod 3 \\ &= (xa)_2 \bmod 3 \end{align*}

Thus, we have shown that $w$ reaches a state $i$ if and only if $(w)_2 = i \bmod 3$. Since $0$ is the only final state of $A$, we can conclude that $A$ accepts $w$ if and only if $(w)_2$ is divisible by 3.

From this, one might ask whether it's possible to construct a DFA that recognizes all of the binary strings that represent integers that are divisible by $n$ for any arbitrary integer $n$. The answer is yes, and it's not hard to see why. There are only finitely many equivalence classes modulo $n$ (or rather, exactly $n$) and all we need to do is keep track of how this number changes with each additional symbol read. You can even take this approach and make it work for strings representing integers in any base $k$.

This captures the kinds of problems that finite automata can solve. The key property of finite automata is that they can only keep track of a finite amount of information. This information doesn't scale with the size of the input—even with a finite amount of information, we're able to describe an infinite set like the integers modulo 3. In the language of complexity theory, these are problems that have space complexity $O(1)$. What kinds of problems can we solve with only a finite amount of information?