CMSC 27200 — Lecture 1

What is this course about?

This course is about the design and analysis of algorithms.

An algorithm is nothing but a finite series of steps that can be followed to solve a particular problem. Today, we think of algorithms as something that a computer does, but the notion of an algorithm is quite old. The word itself is derived from the name of the 9th-century Persian mathematician al-Khwarizmi, but the idea goes back even further. One such algorithm we've already seen (because you should've taken discrete math in order to get into this class) is Euclid's algorithm for finding the greatest common divisor of two numbers, which dates to about 300 BC.
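
As a reminder, here is a minimal sketch of Euclid's algorithm in Python; the language and the function name are our choices here, but the algorithm itself is the one you saw in discrete math.

```python
def gcd(a, b):
    """Euclid's algorithm: the gcd is unchanged if we replace the pair
    (a, b) with (b, a mod b), and gcd(a, 0) = a."""
    while b != 0:
        a, b = b, a % b
    return a

# Example: gcd(1071, 462) == 21
```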

Historically, algorithms typically came from proofs, since to show that some object existed, it was good enough to prove that one could construct it. Many of these constructions are algorithms. But while algorithms themselves are old, the idea of analyzing algorithms is relatively new. The oldest known analysis of an algorithm is Gabriel Lamé's analysis of Euclid's algorithm in the 1800s, about 2000 years after the algorithm itself.

Even then, there was very little reason to analyze the efficiency of algorithms. After all, what's the point of seeing how an algorithm performs on an input of size 1000000000 if humans wouldn't be able to carry out that many operations? And so, to the extent that algorithms were analyzed at all, it was done the same way we judge proofs: by how clever or elegant they were. Obviously, things changed once we started building machines that could carry out millions of operations. The idea of studying the efficiency of algorithms began in the 1960s.

This course is structured around algorithm design techniques and approaches. For each of these techniques, we will investigate several problems that are solved by designing algorithms using the technique in question.

This is a theoretical course, and we'll be discussing algorithms and how to prove their correctness and efficiency. To succeed in this course, you should be comfortable with mathematical proof. Many of the results in this course will depend on ideas from discrete math, particularly mathematical induction and graph theory.

Analyzing Algorithms

Very broadly, theoretical computer science is concerned with problems and how to solve them mechanically. Here, I use the word mechanically rather than computationally to emphasize the idea that we're interested in how to solve problems via a well-defined process that can be automated rather than focusing on the machines called computers, since, as we discussed earlier, algorithms don't need to involve computers at all.

So what is a problem? Generally, we view problems in very much the same way as we did in CS 151: as a function which takes some input and produces some output. Our program or algorithm then computes the function by correctly mapping inputs to the right outputs. By now, you have some intuition for how to do this, and you'll have developed some tricks for solving these problems and checking that your solutions are correct.

However, we want to go a bit further than that and make all of these notions a bit more formal. Instead of trial and error, we want to study the structure of these problems and come up with formal approaches to designing algorithms for them. Instead of acquiring evidence that our algorithms are probably correct via testing, we want to be able to prove definitively that the algorithms are correct. And we want to have a formal notion of what it means for an algorithm to be fast or efficient.

This development in your understanding of computer science and algorithms mirrors the early development of algorithm analysis in the postwar era. Early algorithm analysis was generally experimental. To determine the efficiency of an algorithm, computer scientists would run the algorithm on some sample inputs and then try to extrapolate from that. It wasn't until the early 1960s that computer scientists like Knuth, Hartmanis, and Edmonds started working on a more rigorous approach.

Up until now, if you've considered the efficiency of an algorithm, it's been by doing things like counting the number of iterations of a loop or counting the number of operations that you perform. This is generally what we will continue to do, but we'll make things a bit more formal.

Models of computation

One of the immediate problems with analyzing algorithms is figuring out what it is we're actually counting. To do that, we need to think about how our algorithms are being executed and what constitutes a "basic" operation. This is not necessarily obvious, and you can fall into some unexpected traps if you're not careful with how you define this. For example, we often think of arithmetic operations as being "basic", but does it really make sense to think of addition and multiplication as having the same cost? What about looking things up in an array? What we need to do is formalize and fix a model of computation which tells us what our "computer" can do and what's considered a basic operation for such a computer.

The classical model of computation is the Turing machine, defined by Alan Turing in 1936 and considered the prototypical computational device. A Turing machine consists of a finite-state control and an infinitely long tape. The machine reads the tape one cell at a time and, based on its current state and the symbol it reads, transitions to a new state, writes a symbol to the tape, and moves along the tape. The "basic operations" for a Turing machine are very well defined: one transition of the machine can be thought of as one time unit, and one tape cell is one unit of space.
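
To make the model concrete, here is a minimal sketch of a Turing machine simulator in Python. The particular machine shown (which flips every bit of its input and halts at the first blank cell) and all the names are illustrative choices, not a standard construction from the course.

```python
# A transition maps (state, symbol) -> (new state, symbol to write, head move).
# This example machine flips 0s and 1s until it reaches a blank cell ('_').
FLIP = {
    ('scan', '0'): ('scan', '1', +1),
    ('scan', '1'): ('scan', '0', +1),
    ('scan', '_'): ('halt', '_', 0),
}

def run(transitions, tape_input, start='scan', halt='halt'):
    tape = dict(enumerate(tape_input))   # sparse tape; missing cells are blank
    state, head, steps = start, 0, 0
    while state != halt:
        symbol = tape.get(head, '_')
        state, write, move = transitions[(state, symbol)]
        tape[head] = write
        head += move
        steps += 1                       # one transition = one unit of time
    return ''.join(tape[i] for i in sorted(tape)), steps

# run(FLIP, '0110') returns ('1001_', 5): 5 transitions (time), 5 tape cells used (space).
```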

If you are intrigued by this, you should consider taking CMSC 28000: Introduction to Formal Languages.

The problem is that although everything is very well-defined, this isn't really how we think about modern algorithms. One of the surprising things about the Turing machine is that, in principle, it is capable of doing anything that a modern computer can do. However, we want to abstract away some of these details. For example, adding two numbers is non-trivial on a Turing machine because we have to describe how to do so in terms of the basic operations. Intuitively, we feel that adding two numbers should be a "basic" step that we don't need to describe.

The model of computation that's generally used in algorithm analysis more closely resembles electronic computers and is called the word random access machine (word RAM) model. We assume that we have random access to an indexed memory made up of registers that each store a word, a $w$-bit integer for some fixed word size $w$. Basic operations include integer arithmetic and logic (comparisons), reading and writing a register, pointer arithmetic, branching, and so on. Here, time is defined as the number of primitive operations we use and space is the number of registers that are used.

Even here, we have to be careful about things like how big a word is or what constitutes a "basic" operation. For instance, it's easy to forget that a register stores a $w$-bit integer for some fixed $w$. Since we often think of integer arithmetic as one operation, what happens if we want to add integers with $n^2$ bits, far more than fit in a single word? Or, if we wanted to study algorithms for integer multiplication, it would be silly to make multiplication a unit-cost operation! These kinds of issues do not come up often, but when they do, they can have a big effect on your analysis. For most purposes, we cheat and simply say that a word is as big as we need: if we have an input of size $n$, we want our word size $w$ to be at least $\log_2 n$ bits, so that a single word can index the entire input.
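
As a small illustration of counting in the word-RAM model, here is a hypothetical sketch that tallies the primitive operations used to sum an array. The exact charges (one comparison, one array read, and two additions per iteration) are our own bookkeeping convention for this example, not an official cost model from the course.

```python
def sum_with_op_count(a):
    """Sum an array while counting word-RAM primitive operations.

    We charge one operation per loop comparison, per array read, and per
    integer addition; this assumes every entry of a fits in one word.
    """
    total, ops, i = 0, 0, 0
    while i < len(a):      # 1 comparison per iteration
        total += a[i]      # 1 array read + 1 addition
        i += 1             # 1 addition
        ops += 4
    return total, ops      # ops is roughly 4n, i.e., linear in the input size

# sum_with_op_count([3, 1, 4, 1, 5]) -> (14, 20)
```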

Worst-case analysis

This gives us a way to discuss the time or space that an algorithm uses, but it leads us to another problem: the number of operations that an algorithm uses depends on the input. For example, the running time of a sorting algorithm on a list that's already sorted can be quite different from its running time on an arbitrary list!
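
Concretely, here is a minimal insertion sort with a comparison counter (the counter and names are our own illustrative choices): on an already-sorted list of length $n$ it makes about $n$ comparisons, while on a reverse-sorted list it makes about $n^2/2$.

```python
def insertion_sort(a):
    """Sort a in place and return the number of comparisons made."""
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        # Shift a[i] to the left until it is in its correct position.
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comparisons

# insertion_sort(list(range(1000)))        -> 999 comparisons (sorted input)
# insertion_sort(list(range(1000, 0, -1))) -> 499500 comparisons (reverse-sorted input)
```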

To deal with this, we typically analyze algorithms based on the worst-case running time. In other words, the complexity of an algorithm is the running time for the input that forces the algorithm to use the most time or space. This gives us a complexity guarantee, since the algorithm will never use more than its worst-case time or space.

But what do we mean by "worst case"? We clearly can't mean the worst case over all inputs, since we can almost always find a larger input that takes even longer. What we really mean is the worst case for a particular input size $n$. This lets us talk about the time or space complexity as a function of the input size $n$.

The worst-case time complexity $T(n)$ of an algorithm $A$ is the maximum over all inputs $I$ of size $n$ of the total number of operations used by $A$ when run on $I$. The worst-case space complexity $S(n)$ of $A$ is the maximum over all inputs $I$ of size $n$ of the total number of registers used by $A$ when run on $I$.
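
In symbols, writing $\mathrm{time}_A(I)$ and $\mathrm{space}_A(I)$ for the number of operations and registers that $A$ uses on input $I$ (notation introduced here just to restate the definition above):

$$T(n) = \max_{|I| = n} \mathrm{time}_A(I), \qquad S(n) = \max_{|I| = n} \mathrm{space}_A(I).$$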

A common objection to this is that it's a pretty harsh metric to be judging algorithms by. For instance, what if the worst case is rare? And this is a perfectly fine objection! If you happen to know some extra information about your particular inputs, then you can and should use that in your design and analysis of your algorithm.

There are other types of analysis, like average-case analysis or probabilistic analysis. And there will be situations where such analyses are useful, but generally speaking, worst-case analysis gives us the most general understanding of an algorithm and the problem that the algorithm is trying to solve.

Efficiency and Polynomial Time

So now we can analyze an algorithm and determine its worst-case time and space complexity asymptotically. The final basic question we have to address is what counts as "good" or "reasonable" time and space complexity. To answer this, we have to think about what our possible algorithms can look like.

Most of the time, there is an "obvious" algorithm for solving a problem: try every possible solution. Depending on the exact problem, such an approach can mean checking something like $2^n$ or $n!$ candidate solutions. So if we increase the size of our input by just one, we at least double the number of solutions we're searching through.
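
As a hypothetical illustration (the specific problem, subset sum, is just an example and not one of our course problems yet), a brute-force algorithm checks all $2^n$ subsets of the input:

```python
from itertools import combinations

def subset_sum_brute_force(values, target):
    """Try every one of the 2^n subsets of values and check its sum."""
    n = len(values)
    for size in range(n + 1):
        for subset in combinations(values, size):
            if sum(subset) == target:
                return subset          # found a solution
    return None                        # no subset works

# subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9) -> (4, 5)
```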

Of course, we hope to find a more clever way to solve a problem than just trying everything. What this means is that if we're clever enough, our algorithm shouldn't double its running time every time we add a single element to our problem instance.

More formally, we would like our algorithms to scale in such a way that when the input size doubles, the algorithm should only slow down by some constant factor. This leads to the notion of time complexity that is expressed as a polynomial.

We say an algorithm runs in polynomial time if there exist constants $c$ and $k$ such that its worst-case time complexity satisfies $T(n) \leq c \cdot n^k$ for every input size $n$.

To see why this works, suppose that our running time $T(n)$ is polynomial; that is, $T(n)$ is roughly $cn^k$. If we double the input size, we get $c(2n)^k = c2^kn^k = 2^k \cdot cn^k$. Since $k$ is a constant, doubling the input size increases the running time by only a constant factor of $2^k$.
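
A quick numeric check of this contrast, using illustrative running times chosen by us: for $T(n) = n^2$, doubling $n$ multiplies the running time by $2^2 = 4$ no matter how large $n$ is, whereas for $T(n) = 2^n$, doubling $n$ squares the running time.

```python
# Compare how a polynomial and an exponential running time react to doubling n.
for n in [10, 20, 40]:
    poly_factor = (2 * n) ** 2 // n ** 2      # always 4
    expo_factor = 2 ** (2 * n) // 2 ** n      # equals 2^n, which keeps growing
    print(f"n={n:>2}: n^2 grows by x{poly_factor}, 2^n grows by x{expo_factor}")

# n=10: n^2 grows by x4, 2^n grows by x1024
# n=20: n^2 grows by x4, 2^n grows by x1048576
# n=40: n^2 grows by x4, 2^n grows by x1099511627776
```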

So we say that an algorithm is efficient if it runs in polynomial time.