CMSC 28000 — Lecture 2

Let $\Sigma = \{a,b,c\}$. Then $L = \{ab, bc\}$ is a language over $\Sigma$. If we take the Kleene star of $L$, we get the language made up of concatenations of $ab$ and $bc$, like $ababbcabbcbcab$.

One use of this is for constructing codes. In formal language theory, a code over an alphabet $\Sigma$ is a language $K \subseteq \Sigma^+$ such that if $w \in K^+$ then $w$ has a unique factorization into words of $K$ (here is a use of "factor" as "subword"). It's not hard to see that not every language is a code.
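To see this concretely, here is a small Python sketch (the brute-force search and the example language $K = \{a, ab, ba\}$ are my own illustration, not from the lecture) that counts the factorizations of a word into words of a given language. $K$ fails to be a code because $aba = a \cdot ba = ab \cdot a$ factors two ways:

```python
def count_factorizations(w, K):
    """Count the ways w splits into a concatenation of words from K."""
    if w == "":
        return 1  # the empty word has exactly one (empty) factorization
    return sum(count_factorizations(w[len(u):], K)
               for u in K if w.startswith(u))

K = {"a", "ab", "ba"}
print(count_factorizations("aba", K))  # 2, so K is not a code
```

By contrast, every word of $\{ab, bc\}^+$, like the example $ababbcabbcbcab$ above, has exactly one factorization, so $\{ab, bc\}$ is a code.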

Problems, revisited

Suppose you are given a word $w$ and a language $L$. The most basic question that we can ask is whether $w$ is a member of $L$. Now, if we consider one of our problems from earlier, $$L = \{x \in \{0,1\}^* \mid \text{$x$ is the binary representation of a prime number} \},$$ then we get the following implication: if we can come up with a way to solve the language membership problem for this language, then we've come up with a way to determine whether a given integer is a prime number or not.

Considering one of our other problems, we have $$\{x \in \{0,1\}^* \mid \text{$x$ is the encoding of a graph $G$ and vertices $u,v$ such that $G$ contains a path between $u$ and $v$} \}.$$

This makes use of the following assumptions:

  1. We can encode a graph $G$ as a binary string. This is pretty simple, since we can just list the vertices and edges of $G$ as a string and convert that into binary.
  2. We can encode multiple pieces of information. Here, we assume that $x$ contains the following information: a graph $G$ and the labels of two vertices $u$ and $v$. We just need to choose a delimiter (commas, spaces, whatever).

Again, we arrive at the conclusion that if we can figure out whether a binary string belongs to this language, we've basically solved this problem.

At this point, you can probably see where we're going with this: decision problems and languages are one and the same. In fact, it is assumed in the literature that decision problems and languages are essentially interchangeable. This may still seem a bit unusual, but recall that one of the early lessons of computer science is that everything we do on a computer ultimately boils down to 0s and 1s at the machine level. This has to do with the physical realities of computers and electronics and such, but it turns out things are not so different in the world of theory.

We have enough now to formally define what a decision problem is.

Let $\Sigma$ be an alphabet. A decision problem is a Boolean function $f:\Sigma^* \to \{0,1\}$ that maps inputs over $\Sigma$ to a single bit. The language associated with a decision problem $f$ is $$L_f = \{x \in \Sigma^* \mid f(x) = 1\}.$$

From this, we can begin to see how formal language theory plays a vital part in our study of computation. In this setting, computational models are nothing more than objects that we can use to describe languages and, by extension, solve problems. In describing $L_f$, what we're really doing is showing how to compute the function $f$, thereby solving the decision problem.
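As a quick sketch of this correspondence (the Python encoding is my own; the trial-division test is just the simplest choice), here is the primality problem from above written as a Boolean function $f$, so that deciding membership of $x$ in $L_f$ is exactly computing $f(x)$:

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def f(x):
    """Decision problem: is x the binary representation of a prime?"""
    if x == "" or any(c not in "01" for c in x):
        return 0
    return 1 if is_prime(int(x, 2)) else 0

print(f("101"))  # 101 in binary is 5, which is prime, so 1
print(f("100"))  # 100 in binary is 4, so 0
```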

Finite automata

Finite automata are arguably the simplest model of computation. Finite automata first appeared as far back as the 1940s in the context of early neural networks. The idea was that we wanted to represent an organism or robot of some kind responding to some stimulus, which we can also call events. Upon the occurrence of an event, the robot would change its internal state in response and there would be a finite number of possible such states. Today, the use of finite automata for modelling such systems is still around in a branch of control theory in engineering under the name discrete event systems. But all of this has nothing to do with how we're going to consider finite automata.

First, we'll introduce the idea of a "machine" as used in the theory of computation, which we'll be coming back to throughout the course. A machine has an input tape, divided into cells, with an input word written on it. The machine has a tape head that can scan the input tape, reading a symbol from each cell.

The machine also contains a number of states, and depending on the symbol that is read by the tape head, the machine's internal state will change. Upon reading the entire input, the machine accepts or rejects the input word based on the state that the machine is in. We call these finite automata because they have only a finite number of states. It wasn't stated explicitly, but based on the description, we also assume that the tape head can only move in one direction. Because of this restriction, it's not really necessary to think about the tape portion of the machine, at least for now.

We can represent finite automata by drawing the important part of the machine: the state diagram. We draw each state as a circle and label it as appropriate. Transitions are represented as arrows, leaving one state and entering another, labelled by the appropriate input symbol. We denote accepting states by a double circle and we denote the start state by an arrow that comes in from nowhere in particular.

We will give a formal definition for finite automata.

A deterministic finite automaton (DFA) is defined as a 5-tuple $A = (Q,\Sigma,\delta,q_0,F)$, where

  1. $Q$ is a finite set of states,
  2. $\Sigma$ is an alphabet,
  3. $\delta : Q \times \Sigma \to Q$ is the transition function,
  4. $q_0 \in Q$ is the initial state, and
  5. $F \subseteq Q$ is the set of accepting states.

Note that by our definition, $\delta$ must be defined for every state and input symbol. We call this a deterministic finite automaton because the next state is uniquely determined by the current state and input symbol.

From our example above, we can list all the parts that we've specified in our formal definition:

and $\delta$ is given by \begin{align*} \delta(q_0,a) &= q_1 & \delta(q_0,b) &= q_1 & \delta(q_0,c) &= q_0 \\ \delta(q_1,a) &= q_2 & \delta(q_1,b) &= q_0 & \delta(q_1,c) &= q_1 \\ \delta(q_2,a) &= q_3 & \delta(q_2,b) &= q_2 & \delta(q_2,c) &= q_2 \\ \delta(q_3,a) &= q_0 & \delta(q_3,b) &= q_3 & \delta(q_3,c) &= q_0 \end{align*}

The transition function can also naturally be described as a table,

$\delta$   $a$    $b$    $c$
$q_0$      $q_1$  $q_1$  $q_0$
$q_1$      $q_2$  $q_0$  $q_1$
$q_2$      $q_3$  $q_2$  $q_2$
$q_3$      $q_0$  $q_3$  $q_0$
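In code, such a table is naturally a dictionary keyed by (state, symbol) pairs. A sketch of the table above in Python (the string state names are my own choice), along with a check of the totality condition from the definition:

```python
delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q1", ("q0", "c"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0", ("q1", "c"): "q1",
    ("q2", "a"): "q3", ("q2", "b"): "q2", ("q2", "c"): "q2",
    ("q3", "a"): "q0", ("q3", "b"): "q3", ("q3", "c"): "q0",
}

# delta must be total: exactly one entry per (state, symbol) pair
states, alphabet = {"q0", "q1", "q2", "q3"}, {"a", "b", "c"}
assert all((q, s) in delta for q in states for s in alphabet)
```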

Is there a better way to do this than just enumerating every transition? Yes, and this becomes more necessary as we construct more complicated finite automata. For example, if we generalize the above DFA to a family of DFAs over $n$ states for $n > 3$, then we get the following.

and $\delta$ is described as follows: \begin{align*} \delta(q,a) &= q+1 \pmod n, \\ \delta(q,b) &= \begin{cases} 1 & \text{if $q = 0$,} \\ 0 & \text{if $q = 1$,} \\ q & \text{if $q \in \{2, 3, \dots, n-1\}$,} \end{cases} \\ \delta(q,c) &= \begin{cases} q & \text{if $q \in \{0, 1, \dots, n-2\}$,} \\ 0 & \text{if $q = n-1$.} \end{cases} \end{align*}

This describes DFAs that look like this:

Then choosing $n = 4$ gives us the machine we specified above. But in describing our machine this way, we've created a bunch of machines that behave similarly, but for any $n$ we choose. This kind of thing becomes useful once we start doing more complicated things.
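The parametric description above translates directly into code. A sketch in Python (the function name `make_delta` and the integer state names $0, \dots, n-1$ are my own):

```python
def make_delta(n):
    """Build the transition function of the n-state machine described above."""
    def delta(q, s):
        if s == "a":
            return (q + 1) % n          # a advances to the next state, cyclically
        if s == "b":
            return {0: 1, 1: 0}.get(q, q)  # b swaps states 0 and 1, fixes the rest
        if s == "c":
            return 0 if q == n - 1 else q  # c sends the last state to 0, fixes the rest
        raise ValueError(f"unexpected symbol {s!r}")
    return delta

d4 = make_delta(4)
print(d4(3, "a"))  # 0, matching delta(q_3, a) = q_0 in the table above
```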

Now, we want to also formally define the notion of the set of words that our machine will accept. Intuitively, we have a sense of what acceptance and rejection means. We read a word and trace a path along the automaton and if we end up in an accepting state, the word is accepted and otherwise, the word is rejected.

First of all, it makes sense that we should be able to define this formally by thinking about the conditions that we've placed on the transition function. Since for each state $q$ and each symbol $a$, we know exactly which state we end up in, it makes sense that each word takes us to a specific state.

So, we can extend the definition of the transition function so that it is a function from a state and a word to a state, $\hat \delta : Q \times \Sigma^* \to Q$.

The function $\hat \delta : Q \times \Sigma^* \to Q$ is defined by $$\hat \delta(q,\varepsilon) = q \quad \text{and} \quad \hat \delta(q,xa) = \delta(\hat \delta(q,x), a)$$ for all $x \in \Sigma^*$ and $a \in \Sigma$.

It might be helpful to note that we can also approach this definition from the other direction. That is, instead of writing $w = xa$, we consider the factorization $w = ax$. Then the inductive part of our definition becomes $$\hat \delta(q,w) = \hat \delta(\delta(q,a),x).$$

Technically speaking, $\delta$ and $\hat \delta$ are different functions, since one is a function on symbols and the other is a function on words. However, since $\hat \delta$ is a natural extension of $\delta$, it's very easy to forget about the hat and simply write $\delta$ for both functions. This is something I'm not too concerned about, since the meaning is almost always clear.

A neat observation about $\hat \delta$ is that if you remember your CS 151 or 161, then this is really just a foldl on $\delta$. In other words, $\hat \delta(q,w)$ would be something like (foldl δ q w) (this is okay because you can use Unicode in Racket).
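If you prefer Python to Racket, the same fold is `functools.reduce`; conveniently, `reduce` passes the accumulator (the current state) as the first argument, which matches $\delta$'s argument order. A sketch, reusing the 4-state machine from above with integer states (my own renaming of $q_0, \dots, q_3$):

```python
from functools import reduce

def delta_hat(delta, q, w):
    """Extended transition function: fold delta over the symbols of w."""
    return reduce(delta, w, q)

def delta(q, s):  # the 4-state machine above, with states renumbered 0..3
    if s == "a":
        return (q + 1) % 4
    if s == "b":
        return {0: 1, 1: 0}.get(q, q)
    return 0 if q == 3 else q  # s == "c"

print(delta_hat(delta, 0, "abca"))  # a: 0->1, b: 1->0, c: 0->0, a: 0->1, so 1
```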

Now, we can formally define the acceptance of a word by a DFA.

Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA. We say $A$ accepts a word $w \in \Sigma^*$ when $\hat \delta(q_0,w) \in F$. Then the language of a DFA $A$ is the set of all words accepted by $A$, $$L(A) = \{w \in \Sigma^* \mid \hat \delta(q_0,w) \in F \}.$$ We also sometimes say that a word or language is recognized by $A$. It is important to note that when we say that a DFA accepts a language $L$, we are saying that the DFA accepts all words in $L$ and rejects all words not in $L$.
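Putting the pieces together, acceptance is one more line on top of $\hat \delta$. A sketch in Python using the 4-state machine from above with integer states; the lecture doesn't fix the accepting set for that machine, so the choice $F = \{0\}$ here is purely illustrative:

```python
from functools import reduce

def accepts(delta, q0, F, w):
    """Does the DFA with transition delta, start state q0, and accepting set F accept w?"""
    return reduce(delta, w, q0) in F  # reduce computes delta-hat(q0, w)

def delta(q, s):  # the 4-state machine above, with states renumbered 0..3
    if s == "a":
        return (q + 1) % 4
    if s == "b":
        return {0: 1, 1: 0}.get(q, q)
    return 0 if q == 3 else q  # s == "c"

print(accepts(delta, 0, {0}, ""))    # True: delta-hat(q0, empty word) = q0
print(accepts(delta, 0, {0}, "ab"))  # a: 0->1, b: 1->0, so True
```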

At this point, I will mention that the class of languages that can be accepted by DFAs is called the class of regular languages, although at this point, it is probably not obvious why we call them regular.