CMSC 27200 — Lecture 2

Analyzing Algorithms

As I mentioned in the first lecture, analyzing algorithms for efficiency is not something that was done until relatively recently. Here, we'll run through a few of the basic ideas that are central to the analysis of algorithms.

Early algorithms analysis was generally experimental. To determine the efficiency of an algorithm, computer scientists would run the algorithm on some sample inputs and then try to extrapolate from the results. It wasn't until the early 1960s that computer scientists like Knuth, Hartmanis, and Edmonds started working on a more rigorous approach.

Up until now, if you've considered the efficiency of an algorithm, it's been by doing things like counting the number of iterations of a loop or counting the number of operations that you perform. This is generally what we will continue to do, but we'll make things a bit more formal.

Models of computation

One of the immediate problems with analyzing algorithms is figuring out what it is we're actually counting. To do that, we need to think about how our algorithms are being executed and what constitutes a "basic" operation. This is not necessarily obvious, and you can fall into some unexpected traps if you're not careful with how you define this. For example, we often think of arithmetic operations as being "basic", but does it really make sense to think of addition and multiplication as having the same cost? What about looking things up in an array? What we need to do is formalize and fix a model of computation, which tells us what our "computer" can do and what's considered a basic operation for such a computer.

Example 2.1. The classical model of computation is the Turing machine, defined by Alan Turing in 1936 and considered the prototypical computational device. A Turing machine consists of a finite state control and an infinitely long tape. The machine reads the tape one cell at a time, and based on its current state and what it reads on the tape, it moves to a different state, writes to the tape, and moves along the tape. The "basic operations" for a Turing machine are very well defined: one transition of the machine can be thought of as one time unit, and one tape cell is one unit of space.

The problem is that although everything is very well defined, this isn't really how we think about modern algorithms. One of the surprising things about the Turing machine is that, in principle, it is capable of doing anything that a modern computer can do. However, we want to abstract away some of these details. For example, adding two numbers is non-trivial on the Turing machine because we have to describe how to do so in terms of the basic operations. Intuitively, we feel like adding two numbers should be something "basic" that we don't need to describe.

Definition 2.2. The model of computation that's generally used in algorithms analysis more closely resembles electronic computers and is called the random access machine (RAM) model. We assume that we have random access to an indexed memory made up of registers that each store an $n$-bit integer. Basic operations include integer arithmetic and logic (comparisons), reading and writing to a register, pointer arithmetic, branching, and so on. Here, time is defined as the number of primitive operations we use and space is the number of memory cells that are used.

Even here, we have to be careful about what constitutes a "basic" operation. For instance, it's easy to forget that each register stores an $n$-bit integer; we often think of integer arithmetic as a single operation, but what if we want to add $n^2$-bit integers? Or, if we wanted to think about algorithms for integer multiplication, it'd be silly to make multiplication a unit-cost operation! These kinds of things do not come up often, but when they do, they can have a big effect on your analysis.
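
To make this concrete, here's a small Python sketch of counting operations in the RAM model for a simple loop (we'll use Python informally for illustrations like this; array_sum is just a made-up example).

    def array_sum(A):
        # Each step below uses only RAM-model "basic operations":
        # reading or writing a register, integer addition, a comparison,
        # and incrementing the loop index.
        total = 0
        for i in range(len(A)):
            total = total + A[i]
        # A constant number of basic operations per iteration, so an array
        # of length n costs a constant times n operations (plus constant
        # overhead) in total.
        return total

Exactly which operations we charge for is a modeling choice, which is why we fix the model up front.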

Worst-case analysis

This gives us a way to discuss the time or space that an algorithm uses, but leads us to another problem: the number of operations that an algorithm uses depends on the input. For example, the running time for a sorting algorithm when given a list that's already been sorted can be quite different from an arbitrary list!

To deal with this, we typically analyze algorithms based on the worst-case running time. In other words, the complexity of an algorithm is the running time for the input that forces the algorithm to use the most time or space. This gives us a complexity guarantee, since the algorithm will never use more than its worst-case time or space.

But what do we mean by "worst case"? We clearly don't mean the worst case over all possible inputs, since we can almost always find a larger input that takes longer. So what we really mean is the worst case for a particular input size $n$. This then lets us talk about the time or space complexity as a function of the input size $n$.

Definition 2.3. The worst-case time complexity $T(n)$ of an algorithm $A$ is the maximum over all inputs $I$ of size $n$ of the total number of operations used by $A$ when run on $I$. The worst-case space complexity $S(n)$ of $A$ is the maximum over all inputs $I$ of size $n$ of the total number of registers used by $A$ when run on $I$.
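
As a quick illustration of the definition (linear_search is just a made-up example), consider searching for a value in an unsorted array.

    def linear_search(A, target):
        # Worst case: target is not in A (or is the last element),
        # so the loop runs all n iterations and the running time grows
        # linearly in n, even though the best case (target is A[0])
        # takes constant time.
        for i in range(len(A)):
            if A[i] == target:
                return i
        return -1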

A common objection to this is that it's a pretty harsh metric to be judging algorithms by. For instance, what if the worst case is rare? And this is a perfectly fine objection! If you happen to know some extra information about your particular inputs, then you can and should use that in your design and analysis of your algorithm.

There are other types of analysis, like average-case analysis or probabilistic analysis. And there will be situations where such analyses are useful, but generally speaking, worst-case analysis gives us the most general understanding of an algorithm and the problem that the algorithm is trying to solve.

Asymptotics

So now we know that to think about the complexity of an algorithm, we simply count the number of steps it takes or how much memory it uses on the worst-case input, and this gives us a function of the size of the input. So if we want to compare the efficiency or complexity of algorithms, what we really end up doing is comparing these functions.

Rather than figuring out whether a function is always less than some other function, we are more interested in how the function grows as input grows larger. To do this, we have to turn to calculus/analysis and perform asymptotic analysis. By analyzing the time and space complexity of an algorithm asymptotically, we abstract away many of the small, unimportant details of implementation and get a better idea of whether the broader strategy of the algorithm is efficient.

Definition 2.4. Let $f:\mathbb N \to \mathbb R$ be a function. We say that $f(n)$ is $O(g(n))$ for some function $g:\mathbb N \to \mathbb R$ if there exist constants $c \gt 0$ and $n_0 \geq 0$ such that for all $n \geq n_0$, $f(n) \leq c \cdot g(n)$.

We call this "big O" notation. Roughly speaking, big O gives us an upper bound on the growth of a function. There are also corresponding notions for the lower bound ("big omega") and asymptotically tight bounds ("big theta").

Definition 2.5. Let $f:\mathbb N \to \mathbb R$ be a function. We say that $f(n)$ is $\Omega(g(n))$ for some function $g:\mathbb N \to \mathbb R$ if there exist constants $c \gt 0$ and $n_0 \geq 0$ such that for all $n \geq n_0$, $f(n) \geq c \cdot g(n)$.

Definition 2.6. Let $f:\mathbb N \to \mathbb R$ be a function. We say that $f(n)$ is $\Theta(g(n))$ for some function $g:\mathbb N \to \mathbb R$ if there exist constants $c_1,c_2 \gt 0$ and $n_0 \geq 0$ such that for all $n \geq n_0$, $c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n)$.
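
For example, let's check directly from these definitions that $f(n) = 3n^2 + 5n$ is $\Theta(n^2)$. We have \[3n^2 + 5n \leq 3n^2 + 5n^2 = 8n^2\] for all $n \geq 1$, so $f(n)$ is $O(n^2)$ with $c = 8$ and $n_0 = 1$. On the other hand, $3n^2 + 5n \geq 3n^2$ for all $n \geq 0$, so $f(n)$ is $\Omega(n^2)$ with $c = 3$ and $n_0 = 0$. Taking $c_1 = 3$ and $c_2 = 8$ shows that $f(n)$ is $\Theta(n^2)$.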

Without any additional properties, we would have to go through a proof based on the definition in order to do any comparisons. However, there are a number of properties and identities that are quite easy to prove that will make our lives considerably easier.

Proposition 2.7. Let $f:\mathbb N \to \mathbb R$ be a function.

  1. $f$ is $O(f)$.
  2. If $f$ is $O(g)$, then $cf$ is $O(g)$ for all $c \gt 0$.
  3. If $f_1$ is $O(g_1)$ and $f_2$ is $O(g_2)$, then $f_1 f_2$ is $O(g_1 g_2)$.
  4. If $f_1$ is $O(g_1)$ and $f_2$ is $O(g_2)$, then $f_1 + f_2$ is $O(\max\{g_1, g_2\})$.
  5. If $f$ is $O(g)$ and $g$ is $O(h)$, then $f$ is $O(h)$.
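
As a model, here's property 5 proved straight from the definition. Suppose $f$ is $O(g)$ and $g$ is $O(h)$. Then there are constants $c, c' \gt 0$ and $n_0, n_0' \geq 0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$ and $g(n) \leq c' \cdot h(n)$ for all $n \geq n_0'$. So for all $n \geq \max\{n_0, n_0'\}$, we have \[f(n) \leq c \cdot g(n) \leq cc' \cdot h(n),\] and therefore $f$ is $O(h)$ with constant $cc'$.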

Proving the remaining properties makes for a nice exercise. Now, since we're comparing functions, you might expect that we can use some calculus tricks here. First, let's remind ourselves of some definitions.

Definition 2.8. Let $\{a_n\}_{n \geq 0}$ be an infinite sequence of real numbers. We write, for some constant $c \gt 0$, \[\lim_{n \to \infty} a_n = c \quad \text{if} \quad \forall \varepsilon \gt 0, \exists n_0 \in \mathbb N, \forall n \geq n_0, |a_n - c| \leq \varepsilon.\]

Proposition 2.9. If $\lim_{n \to \infty} \frac{f(n)}{g(n)} = c$ for some constant $c \gt 0$, then $f(n)$ is $\Theta(g(n))$.

Proof. By definition of limit, for any $\varepsilon \gt 0$, there exists $n_0$ such that \[c - \varepsilon \leq \frac{f(n)}{g(n)} \leq c+\varepsilon\] for all $n \geq n_0$. Then choose $\varepsilon = \frac 1 2 c$. This gives us \[\frac 1 2 c \cdot g(n) \leq f(n) \leq \frac 3 2 c \cdot g(n)\] for all $n \geq n_0$. Then we have $f(n) = \Theta(g(n))$ with $c_1 = \frac 1 2 c$ and $c_2 = \frac 3 2 c$. $$\tag*{$\Box$}$$

We can arrive at similar results for $O$ and $\Omega$.

Proposition 2.10. If $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$, then $f(n)$ is $O(g(n))$ but not $\Omega(g(n))$. If $\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$, then $f(n)$ is $\Omega(g(n))$ but not $O(g(n))$.
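
For example, $n \log n$ is $O(n^2)$ but not $\Omega(n^2)$, since \[\lim_{n \to \infty} \frac{n \log n}{n^2} = \lim_{n \to \infty} \frac{\log n}{n} = 0.\]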

Here are some useful things to know. You can try to prove them using the results from above.

Proposition 2.11. Let $f: \mathbb N \to \mathbb R$ be a function.

  1. If $f(n) = a_0 + a_1 n + \cdots + a_k n^k$ with $a_k \gt 0$, then $f(n) = \Theta(n^k)$.
  2. If $f(n) = \log_a n$, then $f(n) = \Theta(\log_b n)$ for all $a,b \gt 1$.
  3. If $f(n) = \log_a n$, then $f(n) = O(n^k)$ for all $a \gt 1$ and $k \gt 0$ (see the example after this list).
  4. If $f(n) = n^k$, then $f(n) = O(r^n)$ for all $r \gt 1$ and $k \gt 0$.
  5. If $f(n) = n!$, then $f(n) = 2^{\Theta(n \log n)}$.
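
For instance, item 3 follows from Proposition 2.10: treating $n$ as a real variable and applying L'Hôpital's rule, \[\lim_{n \to \infty} \frac{\log_a n}{n^k} = \lim_{n \to \infty} \frac{\ln n}{n^k \ln a} = \lim_{n \to \infty} \frac{1/n}{k n^{k-1} \ln a} = \lim_{n \to \infty} \frac{1}{k n^k \ln a} = 0,\] so $\log_a n$ is $O(n^k)$ (and, in fact, not $\Omega(n^k)$) for all $a \gt 1$ and $k \gt 0$.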

Efficiency and Polynomial Time

So now, we can analyze an algorithm and determine its worst-case time and space complexity asymptotically. The final basic question that we have to account for is what is considered "good" or "reasonable" when it comes to time and space complexity. To do this, we have to think about what our possible algorithms can look like.

Most of the time, there is an "obvious" algorithm for solving a problem: try every solution. Depending on the exact problem, such an approach can result in checking something like $2^n$ or $n!$ different solutions. With $2^n$ candidates, increasing the size of our input by just one already doubles the number of solutions we're searching through.

Of course, we hope to find a more clever way to solve a problem than just trying everything. What this means is that if we're clever enough, our algorithm's running time shouldn't double every time we add a single element to our problem instance.

More formally, we would like our algorithms to scale in such a way that when the input size doubles, the algorithm should only slow down by some constant factor. This leads to the notion of time complexity that is expressed as a polynomial.

Definition 2.12. We say an algorithm runs in polynomial time if there exists $k$ such that its worst-case time complexity $T(n)$ is $O(n^k)$.

To see why this works, suppose that our running time $T(n)$ is $O(n^k)$; in other words, $T(n)$ is at most about $cn^k$ for some constant $c$. If we double the input size, we have $c(2n)^k = c2^kn^k = 2^k \cdot cn^k$. Since $k$ is constant, doubling the input size increases the running time by at most a constant factor of $2^k$.
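
If you want to see this numerically, here's a throwaway Python snippet (purely illustrative) comparing a polynomial bound to an exponential one as $n$ doubles.

    # When n doubles, n^3 grows by a factor of 2^3 = 8,
    # while 2^n gets squared: 2^(2n) = (2^n)^2.
    for n in [10, 20, 40, 80]:
        print(n, n ** 3, 2 ** n)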

So we say that an algorithm is efficient if it runs in polynomial time.

There are a few basic objections to tying polynomial time to efficiency or practicality, but by now we've come up with good responses to them. The most common one is the example of an algorithm that runs in $O(n^{1000000})$ time versus an algorithm that runs in $O(1.000001^n)$ time.

Of course, it is a cop-out to say that in most practical cases this won't come up, but that's really the truth. By some strange coincidence, many of the algorithms we come up with tend to have small and reasonable constants, exponents, or bases, and for those that don't, it's often not too long before further research leads us to better algorithms.

To step a bit outside of the realm of algorithms, you may have heard about the Church-Turing thesis, which says that all reasonable models of computation (like the Turing machine or the RAM) are equivalent in power. The extended Church-Turing thesis says that not only are all of these models equivalent, but that they can all simulate each other with only a polynomial increase in running time. This means that the idea of tying efficiency and polynomial time together is quite robust and doesn't really depend on a particular model of computation.

Here's a fun question to think about: what about algorithms that use polynomial space?

Analyzing Gale-Shapley

Let's walk through an example of how to do this kind of complexity analysis. Luckily, we've just looked at an algorithm. Returning to Gale-Shapley, it's time to ask the following question: how efficient is it? We'll work towards showing the following.

Theorem 2.13. Gale-Shapley has worst-case time complexity $O(n^2)$.

If we look at the algorithm and recall Proposition 1.7, we can conclude that the while loop will get executed $O(n^2)$ times. What we would need to show to prove Theorem 2.13 is that the sequence of operations within the while loop amounts to $O(1)$ cost.

First, we assume that each position and intern is indexed by $1, \dots, n$. We can imagine that we have an array positions[] of positions and an array interns[] of interns, with positions[i] holding an array of preferences. So positions[i][j] = k would mean that the $j$th most preferred intern for position $i$ is intern $k$.
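
Concretely, a tiny made-up instance with $n = 3$ might look like this in Python (using 0-indexing, which we'll also use in the code sketches below; the particular preferences are arbitrary).

    # positions[i][j] = k: the jth most preferred intern for position i is intern k
    positions = [[2, 0, 1],   # position 0 prefers intern 2, then intern 0, then intern 1
                 [0, 1, 2],
                 [0, 2, 1]]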

Now, let's think about what the algorithm does on each iteration.

  1. Determine if there is an unmatched position $u$.
  2. Find the first intern $v$ who hasn't been made an offer by $u$.
  3. Determine if $v$ is unmatched.
  4. Determine if $v$ prefers $u$ or their current match.
  5. Update the matching $M$, if necessary.

To show that each of these amounts to a constant number of operations, one of the first issues that we have to deal with is how things are represented in our algorithm. For Gale-Shapley, we can make use of arrays and lists to do everything we need.

First, to determine whether there are unmatched positions, we keep all unmatched positions in a list. Since it doesn't matter what order they come in (which we showed in Proposition 1.13), we can just use any old list, or if we'd like to get fancier, we can use a queue. Either way, we just need to know whether there are unmatched positions and we want to be able to insert (if we're replacing a pair in the matching and some other position becomes free) or delete (if our current free position gets matched) in $O(1)$ time, so lists are perfect for that.

Second, to find the next intern that a position should make an offer to, we simply need to keep track of the last intern that each position made an offer to. Since we're given the preference lists, we just need to keep track of this, possibly in an array next_intern[]. Then this step can be done in $O(1)$ time since array lookups and updates are $O(1)$.

To check if $v$ is unmatched, we need a way to keep track of our matching. We can do this by using an array matching[]. We take advantage of the fact that we never really need to query the matching based on the position, so we have matching[i] = j for intern $i$ and position $j$ if $(j,i) \in M$ and matching[i] = 0 if intern $i$ hasn't been matched. This takes care of updating the matching as well, since updates will be $O(1)$.

The most difficult part is how to determine if $v$ prefers $u$ when being made an offer. Suppose $(u',v) \in M$. To determine if $v$ prefers $u$ over $u'$, we would have to find $u$ and $u'$ in the preference list for $v$, which could take $O(n)$ time by searching through the array holding $v$'s preferences.

What we can do to avoid this is to construct a lookup for each intern $v$ that will tell us which rank each position $u$ occupies. This is basically the inverse of the preference list, where we query based on the position to get the rank. So we construct an array rank[] where rank[i] is an array that contains the inverse of intern $i$'s preference list. That is, rank[i][j] = k means that position $j$ is ranked $k$th by intern $i$.

Constructing this lookup table can be done at the very beginning of the algorithm and takes $\Theta(n^2)$ time. Once we do that, we only need $O(1)$ time to look up the ranking for each position and we can avoid doing a search through preference lists.
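
In code, building this table is just a pair of nested loops; here's a sketch that assumes interns[i] holds intern $i$'s preference list with the most preferred position first (0-indexed, as above).

    def build_rank(interns):
        # rank[i][j] = k: position j is ranked kth by intern i
        n = len(interns)
        rank = [[0] * n for _ in range(n)]
        for i in range(n):                       # Theta(n^2) work in total
            for k, j in enumerate(interns[i]):
                rank[i][j] = k
        return rank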

In total, our algorithm does the following:

  1. Construct the ranking lookup in $\Theta(n^2)$ time.
  2. Perform $O(n^2)$ offers. Each offer requires $O(1)$ time.

This gives us a total of $O(n^2)$ time complexity.
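
Putting the pieces together, here is one possible Python sketch of the whole algorithm, under the assumptions above (0-indexed positions and interns, preference lists stored in positions[] and interns[], and None instead of 0 to mark an unmatched intern). It's meant to illustrate the bookkeeping, not to be the definitive implementation.

    from collections import deque

    def gale_shapley(positions, interns):
        # positions[u] and interns[v] are preference lists, most preferred first.
        n = len(positions)

        # rank[v][u] = how intern v ranks position u (smaller is better); Theta(n^2) setup
        rank = [[0] * n for _ in range(n)]
        for v in range(n):
            for k, u in enumerate(interns[v]):
                rank[v][u] = k

        next_intern = [0] * n         # index into positions[u]: next intern u will offer to
        matching = [None] * n         # matching[v] = position matched to intern v, or None
        unmatched = deque(range(n))   # unmatched positions; order doesn't matter

        while unmatched:                             # at most n^2 offers are made in total
            u = unmatched[0]                         # an unmatched position
            v = positions[u][next_intern[u]]         # next intern on u's preference list
            next_intern[u] += 1
            if matching[v] is None:                  # v is unmatched, so v accepts
                matching[v] = u
                unmatched.popleft()
            elif rank[v][u] < rank[v][matching[v]]:  # v prefers u to its current match
                unmatched.append(matching[v])        # the old position becomes unmatched
                matching[v] = u
                unmatched.popleft()
            # otherwise v rejects u, and u stays at the front of the unmatched list

        return [(matching[v], v) for v in range(n)]  # pairs (position, intern)

Each iteration of the while loop performs a constant number of deque operations and array lookups, which is exactly the $O(1)$-per-offer accounting above, on top of the $\Theta(n^2)$ setup for rank.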