CMSC 28000 — Lecture 1

The Theory of Computation

The title of this course refers to formal languages and you will definitely learn a lot about formal languages. However, for our purposes, formal language theory is really more of the vehicle through which we talk about computation. Many similar courses at other schools will call this course something like Theory of Computation. But what is the theory of computation?

As we know, computer science is mathematics. Generally speaking, one can view math as the study of structures. For example, algebra studies structures like groups and rings, and graph theory studies structures like graphs and trees.

So what are the structures that computer science is concerned with? Let's think back to what we do in computer science: we take problems and we design algorithms that solve them.

One obvious structure is the structure of the algorithms that solve problems. This is essentially what we study in CMSC 27200—algorithms are structures with certain properties that we can prove things about. But we can also study the structure of the problems.

At first, it might be a bit strange to think of a problem as a mathematical object, but if we read what we said above carefully, we realize that we have always implicitly assumed that problems have some kind of structure. After all, that's why algorithms even work.

So very broadly, the theory of computation is about problems and how to solve them mechanically. Here, I use the word mechanically rather than computationally to emphasize that we're interested in solving problems via a well-defined process that can be automated, rather than focusing on the machines we call computers. The key idea in the theory of computation is that all of the objects we study, both the problems and the processes by which we solve them, are themselves mathematical objects.

So what is a problem? In a course like CS 151 or Theory of Algorithms, we think of problems as questions, which ultimately turn out to be functions mapping inputs to outputs. However, even here, it's important to separate the notion of a function in mathematics from a function in computer science. Because we overload the term in CS, what we typically describe as functions are really algorithms: the steps for computing the function.

One assumption that we tend to make when viewing things from this perspective is that there is necessarily an algorithm that computes every function. Whether this assumption actually holds gets to the heart of what we're going to be exploring in this course.

So then solving problems amounts to describing a program or algorithm that correctly computes the function. However, functions can have all sorts of inputs and outputs, which complicates things if we want to study them. So in this course, and for a lot of computability and complexity theory, we focus on decision problems, which are those questions (or functions) that can be answered by (have an output of) YES or NO.

For instance, we can take a natural number and ask for its prime factors. In this case, we'd have a function $\operatorname{Factor}: \mathbb N \to \operatorname{List}(\mathbb N)$ that maps a number to its list of prime factors. However, we might instead ask whether a given natural number $n$ is prime. This is a yes or no question, say $\operatorname{Prime}:\mathbb N \to \{\mathrm{Yes},\mathrm{No}\}$.
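To make the distinction concrete, here is a minimal sketch in Haskell (the names primeFactors and isPrime are just illustrative) of the function problem versus the decision problem:

```haskell
-- Factor : N -> List(N), the function problem (illustrative sketch).
primeFactors :: Integer -> [Integer]
primeFactors n = go n 2
  where
    go 1 _ = []
    go m d
      | d * d > m      = [m]
      | m `mod` d == 0 = d : go (m `div` d) d
      | otherwise      = go m (d + 1)

-- Prime : N -> {Yes, No}, the decision problem, by trial division.
isPrime :: Integer -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])
```

For example, primeFactors 12 evaluates to [2,2,3], while isPrime 12 simply answers False.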

We can get fancier, of course. What if we wanted to receive multiple inputs? Or more exotic inputs? For example, given a graph $G$ and vertices $u$ and $v$ in that graph, we could ask whether $G$ contains a path between $u$ and $v$. Again, this is a yes or no question. So in addition to restricting our output, we make another simplification and restrict our set of inputs.

To justify this step, let's think again about the programs we write. We can write functions that take any number of things as inputs: datasets, graphs, dogs. However, while mathematicians can think of these things literally, computer scientists don't: we always have some encoding of our data into strings. So we assume that all problems have string inputs.

So we've taken arbitrary problems as functions \[P : \mathrm{IN} \to \mathrm{OUT}\] and restricted them to string functions with binary outputs \[P : \Sigma^* \to \{\mathrm{Yes}, \mathrm{No}\}.\] This is great because it gives us a very clear set of functions to work with. But we're not quite done just yet.

One observation we can make is that such a function essentially defines a set of strings. If $P$ is the function for our decision problem, then we can define the set \[\{x \in \Sigma^* \mid P(x) = \mathrm{Yes}\},\] the set of strings for which $P$ answers Yes. The complement of this set is then all of the other strings.
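For example, if we encode each natural number as its binary representation, the Prime problem from before corresponds to a set of binary strings (I'll call it $L_{\mathrm{Prime}}$ just for illustration): \[L_{\mathrm{Prime}} = \{x \in \{0,1\}^* \mid x \text{ is the binary representation of a prime}\} = \{10, 11, 101, 111, 1011, \dots\}.\]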

This is a very important sleight of hand: instead of thinking about functions, we can now reframe our questions to be about sets of strings. In essence, we've worked our way from thinking about problems to thinking about sets of strings. What's interesting about this is that it allows us to divorce the problem from its meaning (a function on some interesting domain) and instead focus on the structure of the strings in this set.

And so we have established the basic connection between the study of problems and how to compute them, and the study of the mathematical properties of strings and sets of strings. Sets of strings are called languages in formal language theory. Solving a problem then amounts to solving the membership problem for the corresponding language. It seems very strange to be able to reduce computation to thinking about strings, but this is the origin of some of the most fundamental results in our field.

Strings

Languages are sets of strings, so to talk about languages, we need to talk about strings first. We typically run into strings when we first learn to program. They also happen to be interesting mathematical objects.

An alphabet is a non-empty finite set of symbols. We typically denote alphabets by $\Sigma$.

When we think of an alphabet, we usually think of something like the Latin alphabet $\{a,b,c,d,\dots,z\}$. However, we can take any non-empty finite set to be an alphabet. An alphabet that we're all probably very familiar with is the binary alphabet $\{0,1\}$. From this, we can make binary strings. We can also consider more exotic alphabets if we wish. For instance, we might consider the alphabet of DNA bases $\{A,C,G,T\}$. Or we can consider a very large, but still finite, alphabet like Unicode.

A string or word $w = a_1 a_2 \cdots a_n$ over an alphabet $\Sigma$ is a finite sequence of symbols where each $a_i \in \Sigma$. The set of all strings over $\Sigma$ is denoted $\Sigma^*$.

Depending on the particular subfield of research, strings are sometimes called words. And using word to mean string makes some sort of sense, since we're talking about "languages". Anyhow, I can guarantee that even if I promise to try to use one consistently, I will slip and use the other, so I will make no such promises.

This definition of a string is a typical one, and it's probably the one you're used to. However, a slightly more insightful and formal way to define strings is the following inductive definition.

A string or word $w$ over an alphabet $\Sigma$ is either the empty string, denoted $\varepsilon$, or $au$, where $a \in \Sigma$ is a symbol and $u$ is a string over $\Sigma$.

This gives us a natural way to define other properties of strings and perform induction on them.

The length of a word $w$ is denoted by $|w|$ and is the number of symbols in $w$. The length of a string can be defined inductively by \[|\varepsilon| = 0 \quad\text{and}\quad |au| = 1 + |u| \text{ for } a \in \Sigma \text{ and } u \in \Sigma^*.\]
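Here is a minimal sketch of this inductive view in Haskell (the names Str, Eps, Cons, and len are my own; Haskell's built-in lists play exactly the same role):

```haskell
-- A string over an alphabet of symbols of type a is either the empty
-- string or a symbol followed by a string; this mirrors Haskell's lists.
data Str a = Eps            -- the empty string, epsilon
           | Cons a (Str a) -- a symbol followed by a string

-- The length |w|, defined by induction on the structure of w.
len :: Str a -> Int
len Eps        = 0
len (Cons _ u) = 1 + len u
```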

The most basic operation that we can perform on strings is concatenating them.

The concatenation of two strings $u = a_1 a_2 \cdots a_m$ and $v = b_1 b_2 \cdots b_n$, denoted $u \cdot v$ or simply $uv$, is $uv = a_1 \cdots a_m b_1 \cdots b_n$.

Observe that concatenation is associative: $(uv)w = u(vw)$. However, it is not commutative (i.e., $uv$ is not necessarily equal to $vu$). Usually, we learn about concatenation when we first learn to program. Depending on how you learned to program, you may think of it as sticking two strings together. Or, if you learned to program functionally, then you will have seen the following alternative definition before.

The concatenation of two strings $u$ and $v$ is defined inductively: \[\varepsilon \cdot v = v \quad\text{and}\quad (au) \cdot v = a(u \cdot v) \text{ for } a \in \Sigma.\]
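Continuing the Haskell sketch from above (again with made-up names), this is exactly the usual definition of appending lists:

```haskell
-- Concatenation u . v, by induction on the first argument; this is the
-- same definition as (++) on Haskell lists.
cat :: Str a -> Str a -> Str a
cat Eps        v = v
cat (Cons a u) v = Cons a (cat u v)
```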

It becomes clearer in this view of strings that alphabet symbols are not technically strings themselves. However, we are often lazy and don't bother making this distinction, since treating symbols as strings of length 1 rarely gets us into trouble.

Also notice that this makes it clear that we can concatenate with the empty string: if we have a string $w$, then $w \varepsilon = \varepsilon w = w$. This leads us to a very interesting way to consider strings.

An interesting consequence of these definitions is that they already define a mathematical structure.

A monoid is a structure $\langle M,\cdot,1 \rangle$, where $M$ is a set, $\cdot$ is an associative binary operation, and $1$ is an identity for $\cdot$.

From this, we can see that the set of strings, with concatenation as the binary operation and $\varepsilon$ as the identity element, is a monoid, and this is where $\Sigma^*$ gets its notation: it is the free monoid on the set $\Sigma$. Of course, for us, this will be little more than a curiosity, but it does point to a few things. First of all, strings aren't just arbitrary objects; they have mathematical meaning and structure. Secondly, monoids are algebraic objects (just add inverses and you've got groups!), and this points to the possibility that studying formal language theory using algebraic tools may not be such a bad idea.
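In the running Haskell sketch, this observation is just the statement that Str a admits Semigroup and Monoid instances, with concatenation as the operation and the empty string as the identity (built-in lists already come with these instances):

```haskell
-- Strings form a monoid: cat is associative and Eps is its identity.
instance Semigroup (Str a) where
  (<>) = cat

instance Monoid (Str a) where
  mempty = Eps
```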

Let's go back to concatenation and define a few more notions.

The $k$th power of a string $w$ is $$w^k = \overbrace{ww \cdots w}^{k \text{ times}}$$ and we define $w^0 = \varepsilon$. Formally, we define this inductively by $$w^0 = \varepsilon \quad\text{and}\quad w^{k+1} = w \cdot w^k \text{ for } k \geq 0.$$

(Notice that this definition is inductive on the natural numbers rather than on strings.)
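In the Haskell sketch, this induction on natural numbers shows up as recursion on k:

```haskell
-- The k-th power of a string, by induction on k; assumes k >= 0.
pow :: Str a -> Int -> Str a
pow _ 0 = Eps
pow w k = cat w (pow w (k - 1))
```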

A related notion that won't come up very much in this course is primitivity. We say a word is primitive if it is non-empty and can't be written as $u^k$ for any word $u$ and any $k \geq 2$. This is sort of like the notion of a "prime unit" for strings.
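As an aside, a classical characterization of primitivity is that a non-empty word $w$ is primitive exactly when it does not occur strictly inside $ww$, that is, $w$ is not a factor of $ww$ once the first and last symbols are removed. A quick sketch of this test on ordinary Haskell strings (we won't need it in this course):

```haskell
import Data.List (isInfixOf)

-- A non-empty word w is primitive iff it does not occur inside ww
-- except as its prefix and suffix.
isPrimitive :: String -> Bool
isPrimitive w = not (null w) && not (w `isInfixOf` interior)
  where interior = init (tail (w ++ w))
```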

If $w = uxv$ is a string, we say that $u$ is a prefix of $w$, that $v$ is a suffix of $w$, and that $x$ is a factor (or substring) of $w$.

It's important to note that $u,v,w,x$ are strings, which can include $\varepsilon$. So $\varepsilon$ is a prefix of every string, and $w$ itself is a prefix of $w$.

Let $w = cat$. Then the prefixes of $w$ are $\varepsilon$, $c$, $ca$, and $cat$; the suffixes are $\varepsilon$, $t$, $at$, and $cat$; and the factors are $\varepsilon$, $c$, $a$, $t$, $ca$, $at$, and $cat$.
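For ordinary Haskell strings, the standard library can enumerate these sets directly; here is a small sketch (the names prefixes, suffixes, and factors are mine):

```haskell
import Data.List (inits, tails, nub)

-- Prefixes, suffixes, and factors (substrings) of a word.
prefixes, suffixes, factors :: String -> [String]
prefixes = inits
suffixes = tails
factors  = nub . concatMap inits . tails

-- factors "cat" == ["","c","ca","cat","a","at","t"]
```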

Much like string versus word, there is a plethora of terms that can be used interchangeably for what you're probably most familiar with as a substring. The term subword is the same idea, using word instead of string. Similarly, the term infix comes naturally from the analogy with prefix and suffix. The one that may be less obvious is factor; this comes from viewing the string as an algebraic object: if $6 = 1 \times 2 \times 3$, then $2$ is a factor of $6$. Similarly, if $w = x \cdot y \cdot z$, then $y$ is a factor of $w$.