CMSC 28000 — Lecture 2

Languages

Informally, languages are just sets of strings and we describe a language in the same way as for any other set.

Here are some examples of languages.

We can define the notion of a language more formally, with respect to an alphabet $\Sigma$.

We denote by $\Sigma^*$ the set of all finite words over $\Sigma$. A language over $\Sigma$ is a subset of $\Sigma^*$.

You might notice that the more curious definition is for $\Sigma^*$. If you recall our brief diversion about monoids, you might recognize that $\Sigma^*$ is really just the free monoid on $\Sigma$.

Since languages are sets, we can perform any set theoretic operations we're familiar with, like union and intersection.

For a language $L \subseteq \Sigma^*$, the complement of $L$ is $\overline L = \Sigma^* \setminus L$.

This also means there are a few things we have to be careful with. For example, there is the notion of the empty language $\emptyset$, which is the language with no strings in it. This is different from the language $\{\varepsilon\}$, which is the language containing the empty string.

We can also extend many operations on strings to operations on languages.

For two languages $L_1$ and $L_2$, we define the catenation of $L_1$ and $L_2$ by $$ L_1 L_2 = \{uv \mid u \in L_1, v \in L_2 \}.$$
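For finite languages, catenation is easy to compute directly. Here is a small sketch in Python (the helper name `catenate` is just for illustration):

```python
def catenate(L1, L2):
    """Catenation of two finite languages: all words uv with u in L1, v in L2."""
    return {u + v for u in L1 for v in L2}

# The empty string acts as an identity for catenation of words:
print(sorted(catenate({"a", "ab"}, {"b", ""})))  # → ['a', 'ab', 'abb']
```

Note that the result can have fewer than $|L_1| \cdot |L_2|$ words, since distinct pairs $(u,v)$ can produce the same word $uv$.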

This allows us to define powers for languages.

For a language $L \subseteq \Sigma^*$, the $k$th power of $L$ is defined inductively by $$L^0 = \{\varepsilon\} \quad \text{and} \quad L^k = L L^{k-1} \text{ for } k \geq 1.$$

This definition leads to an interesting operation.

The Kleene star or Kleene closure of $L$ is defined by $$ L^* = \bigcup_{i \geq 0} L^i.$$ We can similarly define the positive Kleene closure of $L$ by $$ L^+ = \bigcup_{i \geq 1} L^i.$$

Let $\Sigma = \{a,b,c\}$. Then $L = \{ab, bc\}$ is a language over $\Sigma$. If we take the Kleene star of $L$, we get the language made up of combinations of $ab$ and $bc$, like $ababbcabbcbcab$.
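Since $L^*$ is infinite whenever $L$ contains a nonempty word, we can only enumerate it up to a length bound. The following Python sketch (the helper name is made up) builds successive powers until no new short words appear:

```python
def kleene_star_upto(L, max_len):
    """All words of L* with length at most max_len, built by iterating powers of L."""
    result = {""}        # L^0 = {ε}
    frontier = {""}
    while frontier:
        # Extend each newly found word by one more word of L, keeping short words.
        nxt = {u + v for u in frontier for v in L if len(u + v) <= max_len}
        nxt -= result
        result |= nxt
        frontier = nxt
    return result

L = {"ab", "bc"}
print("ababbcab" in kleene_star_upto(L, 8))  # → True (ab·ab·bc·ab)
```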

One use of this is for constructing codes. In formal language theory, a code over an alphabet $\Sigma$ is a language $K \subseteq \Sigma^+$ such that if $w \in K^+$ then $w$ has a unique factorization into words of $K$ (here is a use of "factor" as "subword"). It's not hard to see that not every language is a code.
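The unique-factorization condition is easy to test for a specific word by dynamic programming. The sketch below (the helper name is hypothetical) counts factorizations of a word over $K$; finding a word of $K^+$ with a count other than 1 witnesses that $K$ is not a code:

```python
def count_factorizations(w, K):
    """Number of ways to factor w as a catenation of words of K.
    Assumes ε ∉ K (as in the definition of a code, K ⊆ Σ⁺)."""
    n = len(w)
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix factors in exactly one way
    for i in range(1, n + 1):
        for u in K:
            if i >= len(u) and w[i - len(u):i] == u:
                ways[i] += ways[i - len(u)]
    return ways[n]

# {ab, bc} is a code: words of {ab, bc}^+ factor uniquely.
print(count_factorizations("abbcab", {"ab", "bc"}))  # → 1
# {a, ab, b} is not a code: "ab" = ab = a·b.
print(count_factorizations("ab", {"a", "ab", "b"}))  # → 2
```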

We have enough now to formally define what a decision problem is.

Let $\Sigma$ be an alphabet. A decision problem is a Boolean function $f:\Sigma^* \to \{0,1\}$ that maps inputs over $\Sigma$ to a single bit. The language associated with a decision problem $f$ is $$L_f = \{x \in \Sigma^* \mid f(x) = 1\}.$$
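As a concrete (made-up) example, take the decision problem "does the input contain an even number of $a$'s?" over $\Sigma = \{a,b\}$; its associated language is easy to enumerate for short words:

```python
from itertools import product

def f(x):
    """A sample decision problem over Σ = {a, b}: even number of a's?"""
    return 1 if x.count("a") % 2 == 0 else 0

# The associated language L_f, restricted to words of length at most 2:
words = ("".join(p) for n in range(3) for p in product("ab", repeat=n))
L_f = [x for x in words if f(x) == 1]
print(L_f)  # → ['', 'b', 'aa', 'bb']
```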

From this, we can begin to see how formal language theory plays a vital part in our study of computation. In this setting, computational models are nothing more than objects that we can use to describe languages and, by extension, solve problems. In describing $L_f$, what we're really doing is showing how to compute the function $f$, thereby solving the decision problem.

Finite automata

Finite automata are arguably the simplest model of computation. Finite automata first appeared as far back as the 1940s in the context of early neural networks. The idea was that we wanted to represent an organism or robot of some kind responding to some stimulus, which we can also call events. Upon the occurrence of an event, the robot would change its internal state in response and there would be a finite number of possible such states. Today, the use of finite automata for modelling such systems is still around in a branch of control theory in engineering under the name discrete event systems. But all of this has nothing to do with how we're going to consider finite automata.

First, we'll introduce the idea of a "machine" as used in the theory of computation, which we'll be coming back to throughout the course. A machine has an input tape, divided into cells, with an input word written on it. The machine has a tape head that can scan the input tape, reading a symbol from each cell.

The machine also contains a number of states, and depending on the symbol that is read by the tape head, the machine's internal state will change. Upon reading the entire input, the machine accepts or rejects the input word based on the state that the machine is in. We call these finite automata because they have only a finite number of states. It wasn't stated explicitly, but based on the description, we also assume that the tape head can only move in one direction. Because of this restriction, it's not really necessary to think about the tape portion of the machine, at least for now.

We can represent finite automata by drawing the important part of the machine: the state diagram. We draw each state as a circle and label it as appropriate. Transitions are represented as arrows, leaving one state and entering another, labelled by the appropriate input symbol. We denote accepting states by a double circle and we denote the start state by an arrow that comes in from nowhere in particular.

We will give a formal definition for finite automata.

A deterministic finite automaton (DFA) is defined as a 5-tuple $A = (Q,\Sigma,\delta,q_0,F)$, where $Q$ is a finite set of states, $\Sigma$ is an alphabet, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of accepting (or final) states.

Note that by our definition, $\delta$ must be defined for every state and input symbol. We call this a deterministic finite automaton because the transition function $\delta$ is uniquely determined by the state and input symbol.

From our example above, we can list all the parts that we've specified in our formal definition:

and $\delta$ is given by \begin{align*} \delta(q_0,a) &= q_1 & \delta(q_0,b) &= q_1 & \delta(q_0,c) &= q_0 \\ \delta(q_1,a) &= q_2 & \delta(q_1,b) &= q_0 & \delta(q_1,c) &= q_1 \\ \delta(q_2,a) &= q_3 & \delta(q_2,b) &= q_2 & \delta(q_2,c) &= q_2 \\ \delta(q_3,a) &= q_0 & \delta(q_3,b) &= q_3 & \delta(q_3,c) &= q_0 \end{align*}

The transition function can also naturally be described as a table, $$\begin{array}{c|ccc} & a & b & c \\ \hline q_0 & q_1 & q_1 & q_0 \\ q_1 & q_2 & q_0 & q_1 \\ q_2 & q_3 & q_2 & q_2 \\ q_3 & q_0 & q_3 & q_0 \end{array}$$
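In code, such a transition table is naturally a nested dictionary, for instance in Python:

```python
# The transition table above as a dict: delta[state][symbol] gives the next state.
delta = {
    "q0": {"a": "q1", "b": "q1", "c": "q0"},
    "q1": {"a": "q2", "b": "q0", "c": "q1"},
    "q2": {"a": "q3", "b": "q2", "c": "q2"},
    "q3": {"a": "q0", "b": "q3", "c": "q0"},
}
print(delta["q0"]["a"])  # → q1
```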

Is there a better way to do this than just enumerating every transition? Yes, and this becomes more necessary as we construct more complicated finite automata. For example, if we generalize the above DFA to a family of DFAs over $n$ states for $n \gt 3$, then we get the following.

and $\delta$ is described as follows: \begin{align*} \delta(q,a) &= q+1 \pmod n, \\ \delta(q,b) &= \begin{cases} 1 & \text{if $q = 0$,} \\ 0 & \text{if $q = 1$,} \\ q & \text{if $q \in \{2, 3, \dots, n-1\}$,} \end{cases} \\ \delta(q,c) &= \begin{cases} q & \text{if $q \in \{0, 1, \dots, n-2\}$,} \\ 0 & \text{if $q = n-1$.} \end{cases} \end{align*}

This describes DFAs that look like this:

Then choosing $n = 4$ gives us the machine we specified above. But in describing our machine this way, we've created a bunch of machines that behave similarly, but for any $n$ we choose. This kind of thing becomes useful once we start doing more complicated things.
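The case analysis above translates directly into code. Here is one way to build the transition function of the $n$-state family in Python, identifying $q_i$ with the integer $i$ (the constructor name is just for illustration):

```python
def make_delta(n):
    """Transition function for the n-state family; states are 0, ..., n-1."""
    def delta(q, x):
        if x == "a":
            return (q + 1) % n           # a advances cyclically
        if x == "b":
            if q == 0:
                return 1                 # b swaps states 0 and 1,
            if q == 1:
                return 0
            return q                     # and fixes every other state
        if x == "c":
            return 0 if q == n - 1 else q  # c resets only the last state
        raise ValueError(f"unknown symbol {x!r}")
    return delta

delta4 = make_delta(4)   # n = 4 recovers the machine specified above
print(delta4(3, "a"))    # → 0
```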

Now, we want to also formally define the notion of the set of words that our machine will accept. Intuitively, we have a sense of what acceptance and rejection means. We read a word and trace a path along the automaton and if we end up in an accepting state, the word is accepted and otherwise, the word is rejected.

First of all, it makes sense that we should be able to define this formally by thinking about the conditions that we've placed on the transition function. Since on each state $q$ and on each symbol $a$, we know exactly what state we end up on, it makes sense that each word takes us to a specific state.

So, we can extend the definition of the transition function so that it is a function from a state and a word to a state, $\hat \delta : Q \times \Sigma^* \to Q$.

The function $\hat \delta : Q \times \Sigma^* \to Q$ is defined by $$\hat \delta(q,\varepsilon) = q \quad \text{and} \quad \hat \delta(q,ax) = \hat \delta(\delta(q,a),x) \text{ for all } a \in \Sigma \text{ and } x \in \Sigma^*.$$

It might be helpful to note that we can also approach this definition from the other direction. That is, instead of writing $w = ax$, we consider the factorization $w = xa$. Then the inductive part of our definition becomes $$\hat \delta(q,xa) = \delta(\hat \delta(q,x),a).$$

Technically speaking, $\delta$ and $\hat \delta$ are different functions, since one is a function on symbols and the other is a function on words. However, since $\hat \delta$ is a natural extension of $\delta$, it's very easy to forget about the hat and simply write $\delta$ for both functions. This is something I'm not too concerned about, since the meaning is almost always clear.

A neat observation about $\hat \delta$ is that if you remember your CS 151 or 161, then this is really just a foldl on $\delta$. In other words, $\hat \delta(q,w)$ would be something like (foldl δ q w), at least up to the argument order that foldl expects (and this is okay because you can use Unicode in Racket).
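The same observation works in Python, where functools.reduce passes the accumulator first and so matches $\delta$'s (state, symbol) signature directly. Here is a sketch using the four-state example from above as the sample $\delta$:

```python
from functools import reduce

def delta(q, x):
    """δ from the four-state example, with states 0-3 standing in for q_0-q_3."""
    table = {
        (0, "a"): 1, (0, "b"): 1, (0, "c"): 0,
        (1, "a"): 2, (1, "b"): 0, (1, "c"): 1,
        (2, "a"): 3, (2, "b"): 2, (2, "c"): 2,
        (3, "a"): 0, (3, "b"): 3, (3, "c"): 0,
    }
    return table[(q, x)]

def delta_hat(q, w):
    """Extended transition function: fold δ over the symbols of w, starting at q."""
    return reduce(delta, w, q)

print(delta_hat(0, "abc"))  # → 0  (q0 →a q1 →b q0 →c q0)
```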

Now, we can formally define the acceptance of a word by a DFA.

Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA. We say $A$ accepts a word $w \in \Sigma^*$ when $\hat \delta(q_0,w) \in F$. Then the language of a DFA $A$ is the set of all words accepted by $A$, $$L(A) = \{w \in \Sigma^* \mid \hat \delta(q_0,w) \in F \}.$$ We also sometimes say that a word or language is recognized by $A$. It is important to note that when we say that a DFA accepts a language $L$, we are saying that the DFA accepts every word in $L$ and rejects every word not in $L$.