CMSC 27200 — Lecture 1

Welcome to CMSC 27200! Basic course information can be found on the home page of the course website.

What is this course about?

This course is about the design and analysis of algorithms.

An algorithm is nothing but a finite sequence of steps that can be followed to solve a particular problem. Today, we think of algorithms as something that a computer does, but the notion of an algorithm is quite old. The word itself is derived from the name of the 9th-century Persian mathematician al-Khwarizmi, but the idea goes back even further. One such algorithm we've already seen (because you should've taken discrete math in order to get into this class) is Euclid's algorithm for finding the greatest common divisor of two numbers, which dates from about 300 BC.

Historically, algorithms typically came from proofs, since to show that some object existed, it was good enough to prove that one could construct it. Many of these constructions are algorithms. But while algorithms themselves are old, the idea of analyzing algorithms is relatively new. The oldest known analysis of an algorithm is Gabriel Lamé's analysis of Euclid's algorithm in the 1800s, about 2000 years after the algorithm itself.

Even then, there was very little reason to analyze the efficiency of algorithms. After all, what's the point of seeing how an algorithm performs on an input of size 1000000000 if humans wouldn't be able to carry out that many operations? And so, to the extent that algorithms were judged at all, it was the same way we judge proofs: by how clever or elegant they were. Obviously, things changed once we started making machines that could carry out millions of operations at a time. The systematic study of the efficiency of algorithms began in the 1960s.

Why should you take this course?

Back when I was doing my undergrad in the late 2000s, we had gotten used to the idea that computers and the internet were going to form a larger and larger part of our lives headed into the next decade. At the time, BlackBerry was the king of the internet phone, Apple had just launched their crazy iPod phone, Google wasn't evil, and Facebook had just let non-college students sign up for accounts.

In the years since, computers have grown even more omnipresent in our society and are analyzing more data and making more decisions than ever. With the growing complexity of such systems, it is becoming more important than ever to understand these systems and processes. And to do this, we need to understand the basics of how to rigorously analyze and design algorithms.

What will we be covering in this course?

This course is structured around algorithm design techniques and approaches. For each of these techniques, we will investigate several problems that are solved by designing algorithms using the technique in question.

This is a theoretical course, and we'll be discussing algorithms and how to prove their correctness and efficiency. To succeed in this course, you should be comfortable with mathematical proof. Many of the results in this course will depend on ideas from discrete math, particularly mathematical induction and graph theory.

Stable Matching

Let's jump right in and see what kinds of things we'll be doing for the rest of the quarter.

A common problem in many domains is matching: interns to summer internship positions, students to partners for a class exercise, and so on.

If we have $n$ members of one group and $n$ of the other group, then we just have to find $n$ distinct pairs between the groups. However, there is an additional complication: people (and Pokémon) have preferences.

The scenario we want to avoid is the following: Alice gets matched with G◯◯gle but would rather go to Faceb◯◯k for the summer, while Bob got matched with Faceb◯◯k but would be much happier at G◯◯gle. Now, if both G◯◯gle and Faceb◯◯k had the same preferences as Alice and Bob, then arranging this switch is not so bad. But what happens if their preferences are in conflict? Furthermore, this scenario only deals with two pairs, but one can easily see how such a situation could unravel and cause a chain reaction of adjustments if it involved conflicting preferences among more people.

Define the problem

We can begin by abstracting the problem. These scenarios all have the same sorts of constraints: there are two groups of $n$ agents each, every agent ranks all of the agents in the other group, and we want to pair up the two groups one-to-one.

For the sake of discussion, let's fix one of these scenarios. We'll let $U$ be the set of available positions and $V$ be the set of interns vying for those positions.

Definition 1.1. A matching $M$ is a set of pairs $(u,v)$ with $u \in U$ and $v \in V$ such that each $u \in U$ appears in at most one pair of $M$ and each $v \in V$ appears in at most one pair of $M$.

We say a matching is perfect if $|M| = |U| = |V| = n$.

Some of you may remember from discrete math that this sounds oddly similar to the idea of a complete matching. In fact, perfect matchings are complete matchings, with the additional property that every vertex in both sets of the bipartition is matched (since the two sets are the same size).

The goal is to find a matching that will make everyone "happy". Of course, "happy" is not mathematically precise. Let's think about what makes an "unhappy" match.

Definition 1.2. A pair $(u,v)$ is unstable with respect to a matching $M$ if both

  1. position $u$ prefers intern $v$ to its assigned intern $v'$, where $(u,v') \in M$, and
  2. intern $v$ prefers position $u$ to its assigned position $u'$, where $(u',v) \in M$.

In other words, an unstable pair is a pair that's not in the matching whose members would abandon their assigned matches for each other. Then it makes sense to try to come up with a matching or assignment that doesn't exhibit any unstable pairs.

Definition 1.3. A matching $M$ is stable if there are no unstable pairs with respect to $M$.

That is, in a stable matching, there are no pairs $(u,v)$ that would be willing to abandon their assigned match. In that sense, the matching is stable.

Definition 1.4. The stable matching problem takes as input a set $U$ of size $n$ and a set $V$ of size $n$. Each $u \in U$ ranks elements of $V$ and each $v \in V$ ranks elements of $U$. The output is a stable matching $M$.
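To make these definitions concrete, here is a minimal Python sketch of how an instance and a matching might be represented, along with a check for Definition 1.3. The names position_prefs, intern_prefs, and is_stable, as well as the particular preference lists, are my own illustrative choices, not part of the problem statement.

    # A small instance of the stable matching problem (Definition 1.4):
    # each position ranks every intern and each intern ranks every position,
    # from most preferred to least preferred.
    position_prefs = {
        "u1": ["v1", "v2", "v3"],
        "u2": ["v2", "v1", "v3"],
        "u3": ["v1", "v3", "v2"],
    }
    intern_prefs = {
        "v1": ["u2", "u1", "u3"],
        "v2": ["u1", "u2", "u3"],
        "v3": ["u1", "u2", "u3"],
    }

    def is_stable(matching, position_prefs, intern_prefs):
        """Check Definition 1.3 for a perfect matching: no pair (u, v)
        is unstable in the sense of Definition 1.2."""
        intern_of = dict(matching)                 # u -> its assigned intern
        position_of = {v: u for u, v in matching}  # v -> its assigned position

        def prefers(prefs, new, current):
            # True if new is ranked strictly higher than current in prefs.
            return prefs.index(new) < prefs.index(current)

        for u in position_prefs:
            for v in intern_prefs:
                if intern_of.get(u) == v:
                    continue  # (u, v) is already in the matching
                if (prefers(position_prefs[u], v, intern_of[u])
                        and prefers(intern_prefs[v], u, position_of[v])):
                    return False  # (u, v) is an unstable pair
        return True

    # One perfect matching for this instance, as a set of (position, intern) pairs.
    M = {("u1", "v2"), ("u2", "v1"), ("u3", "v3")}
    print(is_stable(M, position_prefs, intern_prefs))  # True: M is stable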

The question we're faced with now is whether stable matchings even exist. Consider the following example.

Example 1.5. Suppose instead of internships, we're in an introductory programming class and we want to pair up all the students for a pair programming exercise. Again, every student has a ranking of everyone else in the class.

          1   2   3
    A     B   C   D
    B     C   A   D
    C     A   B   D
    D     A   B   C

In this example, we can never come up with a matching that is stable (try it yourself!).
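If you would rather not check all the cases by hand, here is a small brute-force sketch that does it for you. It enumerates the three ways to split the four students into pairs and finds an unstable pair in each; the function name unstable_pair and the adaptation of Definition 1.2 to a single group are my own, while the preference lists are the ones from the table above.

    from itertools import combinations

    # Each student's ranking of the other students, best first (Example 1.5).
    prefs = {
        "A": ["B", "C", "D"],
        "B": ["C", "A", "D"],
        "C": ["A", "B", "D"],
        "D": ["A", "B", "C"],
    }

    # The three ways to split {A, B, C, D} into two pairs.
    pairings = [
        {("A", "B"), ("C", "D")},
        {("A", "C"), ("B", "D")},
        {("A", "D"), ("B", "C")},
    ]

    def unstable_pair(matching, prefs):
        """Return a pair (x, y) not matched to each other where each prefers
        the other to their current partner, or None if no such pair exists."""
        partner = {}
        for x, y in matching:
            partner[x], partner[y] = y, x
        for x, y in combinations(prefs, 2):
            if partner[x] == y:
                continue  # x and y are matched to each other
            if (prefs[x].index(y) < prefs[x].index(partner[x])
                    and prefs[y].index(x) < prefs[y].index(partner[y])):
                return (x, y)
        return None

    # Every possible pairing has an unstable pair, so no stable matching exists.
    for M in pairings:
        print(sorted(M), "is blocked by", unstable_pair(M, prefs))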

This scenario differs slightly from the one we started out with: here there is only one group, so we are looking for a matching in a general graph rather than a bipartite one (matchings are defined for graphs in general, not just for bipartite graphs). However, as this example shows, a stable matching does not need to exist for general graphs. It turns out that the fact that we can define our matching on two well-defined groups is important!

The Gale-Shapley algorithm

The algorithm for finding stable matchings in bipartite graphs is due to Gale and Shapley in 1962. We can describe the algorithm as follows.

    \begin{algorithm}
    \caption{Gale-Shapley}
    \begin{algorithmic}
    \PROCEDURE{stable-match}{preference lists for all $u \in U$ and $v \in V$}
        \STATE Initialize matching $M = \emptyset$
        \WHILE{there is an open position $u \in U$ that has not made an offer to every intern} 
            \STATE $v = $ the highest-ranked intern on $u$'s list to whom $u$ has not yet made an offer
            \IF{$v$ is unmatched}
            \STATE add $(u,v)$ to $M$
            \ELIF{$v$ prefers $u$ to current position $u'$}
            \STATE replace $(u',v)$ with $(u,v)$ in $M$
            \ELSE
            \STATE $v$ rejects $u$
            \ENDIF
        \ENDWHILE
    \RETURN{$M$}
    \ENDPROCEDURE
    \end{algorithmic}
    \end{algorithm}

This, like most of the algorithms we'll see in this class, is described in pseudocode. You'll be expected to present algorithms in pseudocode as well. As you can see, the standards are quite loose as to what constitutes "code", so we'll have more specific guidance on what we're looking for later.

For now, let's try to understand what this algorithm does. All of the action happens inside the while loop, which is executed as long as some position $u$ is unfilled (unmatched with an intern). So let's consider an unfilled position $u$.

For $u$, we go down the preference list for $u$ and choose the first (i.e. the highest ranked) intern $v$ that $u$ has not offered a job to. There are three possibilities.

  1. $v$ is unmatched. In this case, no one has offered $v$ a position, so $v$, like many interns in their situation, is glad to tentatively accept the offer. It's important to note that throughout this algorithm, all offers are tentative and can be changed.
  2. $v$ is matched to a position $u'$, but they prefer $u$. In this case, $(u',v)$ is already in $M$, but since $v$ would rather take $u$, they are allowed to switch. So we replace $(u',v)$ with $(u,v)$ in $M$.
  3. $v$ is matched but likes $u'$ more than $u$. In this case, since $v$ would be happier with their current match, they don't switch.
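To see the whole process in one place, here is a minimal Python sketch of the algorithm as described above. It is a direct translation of the pseudocode, not official course code; the function name gale_shapley and the dictionary representation of the preference lists are my own choices, and the instance at the bottom is the same illustrative one used earlier.

    from collections import deque

    def gale_shapley(position_prefs, intern_prefs):
        """Positions make offers in decreasing order of preference; interns
        tentatively accept and only ever trade up (all offers are tentative
        until the algorithm terminates)."""
        # rank[v][u] = how intern v ranks position u (0 is best), so two
        # positions can be compared in constant time.
        rank = {v: {u: i for i, u in enumerate(prefs)}
                for v, prefs in intern_prefs.items()}

        free = deque(position_prefs)                 # open positions
        next_offer = {u: 0 for u in position_prefs}  # index into u's list
        match = {}                                   # intern -> position

        while free:
            u = free.popleft()                    # some open position
            # With complete preference lists, an open position always has
            # an intern left to make an offer to.
            v = position_prefs[u][next_offer[u]]
            next_offer[u] += 1
            if v not in match:                    # case 1: v is unmatched
                match[v] = u
            elif rank[v][u] < rank[v][match[v]]:  # case 2: v prefers u, trades up
                free.append(match[v])             # the old position reopens
                match[v] = u
            else:                                 # case 3: v rejects u
                free.append(u)
        return {(u, v) for v, u in match.items()}

    # The same illustrative instance as before.
    position_prefs = {
        "u1": ["v1", "v2", "v3"],
        "u2": ["v2", "v1", "v3"],
        "u3": ["v1", "v3", "v2"],
    }
    intern_prefs = {
        "v1": ["u2", "u1", "u3"],
        "v2": ["u1", "u2", "u3"],
        "v3": ["u1", "u2", "u3"],
    }
    print(sorted(gale_shapley(position_prefs, intern_prefs)))

You can feed the result into the is_stable sketch from earlier to confirm that it has no unstable pairs.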

Analyzing the algorithm

There are a number of questions that this algorithm raises. First, how do we know this algorithm will terminate? It seems like it might be possible for interns to keep swapping indefinitely. Secondly, how do we know this will give us a correct stable matching? Again, since everyone's swapping all the time, how do we know we haven't missed someone?

We begin with the following observations about the status of each position $u$ and intern $v$ over the course of the algorithm:

  1. A position $u$ makes offers in decreasing order of preference, so the interns that $u$ makes offers to only get worse (from $u$'s point of view) over time.
  2. Once an intern $v$ receives an offer, $v$ remains matched for the rest of the algorithm, and since $v$ only ever trades up, the positions that $v$ is matched to only get better (from $v$'s point of view) over time.

First, we'll show that the algorithm is guaranteed to terminate.

Proposition 1.7. The Gale-Shapley algorithm terminates after at most $n^2$ iterations of the while loop.

Proof. Consider the set $P(t)$, \[P(t) = \{(u,v) \mid \text{$u$ made an offer to $v$ by iteration $t$}\}.\] On each iteration, an open position $u$ makes an offer to some intern $v$ to whom it has not made an offer before, since positions proceed down their preference lists in order and never repeat an offer. This means that each iteration adds a new pair to this set, so we have $|P(t+1)| \gt |P(t)|$. But there are only $|U||V| = n^2$ possible pairs, so the while loop can run for at most $n^2$ iterations, at which point every position has made an offer to every intern and the algorithm terminates. $$\tag*{$\Box$}$$

Now, we'll work on showing that the algorithm gives us what we want: a stable perfect matching.

Lemma 1.8. The Gale-Shapley algorithm produces a matching.

Proof. First, we note that a position makes an offer only if it isn't matched, so at any point in time each position is matched to at most one intern. Similarly, an intern only ever holds one offer at a time: accepting a new offer means abandoning the old one, so each intern is matched to at most one position. $$\tag*{$\Box$}$$

Proposition 1.9. The Gale-Shapley algorithm produces a perfect matching.

Proof. We have to argue that every position $u \in U$ is matched to an intern and that every intern $v \in V$ is matched to a position.

First, we will show that every position $u \in U$ gets matched. Suppose, for contradiction, that the Gale-Shapley algorithm produces a matching $M$ with some unmatched position $u \in U$. Since the algorithm terminated, this means we exited the While loop and therefore, $u$ had made an offer to every intern $v \in V$ and remained unmatched.

In order for $u$ to remain unmatched, every intern must have either rejected $u$'s offer outright or accepted it and later traded up; in both cases that intern is matched, and once an intern is matched, it stays matched. So every $v \in V$ is matched in $M$. Since, by Lemma 1.8, $M$ is a matching, these $n$ interns are matched to $n$ distinct positions. But there are only $n$ positions in total and $u$ is unmatched, so at most $n-1$ positions can be matched, which is a contradiction. Therefore, every $u \in U$ is matched.

Now, to see that every intern $v \in V$ must also be matched, we observe that since $M$ is a matching and every position $u \in U$ has been matched, by counting, we must have that all $n$ interns are also matched. Therefore, Gale-Shapley produces a perfect matching. $$\tag*{$\Box$}$$

Proposition 1.10. The Gale-Shapley algorithm produces a stable matching.

Proof. Suppose for contradiction that there is an unstable pair $(u,v)$ with respect to the matching $M$ produced by Gale-Shapley. In other words, there exist two pairs $(u,v')$ and $(u',v)$ such that $u$ prefers $v$ to $v'$ and $v$ prefers $u$ to $u'$.

Since $(u,v)$ is not in $M$, there are two scenarios that can have occurred: either $u$ never made an offer to $v$ at all, or $u$ made an offer to $v$ that was rejected, either immediately or later when $v$ traded up.

If $u$ did not make an offer to $v$, this means that $u$ prefers $v'$ to $v$, since $u$ makes offers in decreasing order of preference. Therefore, $(u,v)$ is not unstable.

If $u$ did make an offer to $v$, then $v$ either rejected it immediately (because $v$ preferred its match at the time) or accepted it and later traded up. Since interns only ever trade up, $v$'s final position $u'$ is preferred by $v$ to every position $v$ rejected or abandoned along the way, including $u$. So again, $(u,v)$ is not unstable.

Thus, we have shown in both cases that $(u,v)$ is not actually unstable. $$\tag*{$\Box$}$$

Putting all of the propositions we proved together, we have the following theorem.

Theorem 1.11. The Gale-Shapley algorithm will terminate and produce a stable matching for any instance.

Gale and Shapley were economists and game theorists. For this line of work, Shapley shared the 2012 Nobel Prize in Economics with Alvin Roth (Gale had passed away in 2008, and the prize is not awarded posthumously). This happens to be the closest thing we have to a Nobel in computer science (because there's no Nobel Prize for Mathematics, since Alfred Nobel didn't think mathematics was that important).

There's more!

As with many of the problems we want to solve and the algorithms that solve them, there are still a lot of questions we can ask: how might we adapt our solution to variants of the problem, and what other properties does the particular solution we came up with have?

For instance, does each instance ($U$, $V$, and their preferences) have a unique stable matching? And the natural follow-up to that question is whether Gale-Shapley always gives us the same stable matching.

One of the things that complicates this question is that, as the algorithm is written, it's not clear that it will always do the same thing. The condition of the while loop implicitly chooses some $u$ which is free, but it doesn't tell us which one to choose. You can imagine that the choice of $u$ at this step could change the outcome of the algorithm.

The surprising thing is that despite this, the algorithm is guaranteed to give us the same matching $M$, no matter how we choose $u$ at each iteration of the while loop. To see this, we need to analyze some properties of the matching $M$ a bit further.
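Before doing that analysis, here is a quick experimental sanity check of the claim: a variant of the gale_shapley sketch from earlier that picks the open position uniformly at random at each iteration, run many times on random instances to confirm that the output never changes. The helper names and the use of random instances are my own scaffolding, not part of the lecture.

    import random

    def gale_shapley_random_order(position_prefs, intern_prefs, rng):
        """Gale-Shapley where the open position making the next offer is
        chosen at random instead of in any fixed order."""
        rank = {v: {u: i for i, u in enumerate(prefs)}
                for v, prefs in intern_prefs.items()}
        free = list(position_prefs)
        next_offer = {u: 0 for u in position_prefs}
        match = {}  # intern -> position
        while free:
            u = free.pop(rng.randrange(len(free)))  # an arbitrary open position
            v = position_prefs[u][next_offer[u]]
            next_offer[u] += 1
            if v not in match:
                match[v] = u
            elif rank[v][u] < rank[v][match[v]]:
                free.append(match[v])
                match[v] = u
            else:
                free.append(u)
        return frozenset((u, v) for v, u in match.items())

    def random_instance(n, rng):
        """Random preference lists over positions u0..u(n-1) and interns v0..v(n-1)."""
        U = ["u%d" % i for i in range(n)]
        V = ["v%d" % i for i in range(n)]
        return ({u: rng.sample(V, n) for u in U},
                {v: rng.sample(U, n) for v in V})

    rng = random.Random(0)
    for _ in range(20):                       # 20 random instances
        pp, ip = random_instance(6, rng)
        results = {gale_shapley_random_order(pp, ip, rng) for _ in range(50)}
        assert len(results) == 1              # the same matching every time
    print("Every execution order produced the same matching.")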

Definition 1.12. We say an intern $v$ is a valid partner for position $u$ if there exists a stable matching that contains $(u,v)$. The best valid partner for a position $u$ is the intern $v$ for which $v$ is valid and no intern $v'$ who is ranked higher than $v$ by $u$ is a valid partner for $u$.

We'll show that Gale-Shapley always matches positions with their best valid intern.

Proposition 1.13. Gale-Shapley produces a matching $M$ that matches every position $u$ to its best valid partner.

Proof. We'll show this by contradiction. Suppose there is an execution of the Gale-Shapley algorithm that produces a matching $M$ that contains a pair $(s,t)$ such that $t$ isn't the best valid partner for $s$.

Since positions make offers in decreasing order of preference, $s$ must have made an offer to, and been rejected by, some valid partner that it ranks higher than $t$. So in this execution, at least one position is rejected by a valid partner. Consider the first time this happens: let $u$ be the position and let $v$ be the valid partner that rejects it. At that moment, $v$ rejects $u$ because $v$ holds (or is just accepting) an offer from some position $u'$ that $v$ prefers to $u$.

Since $v$ is a valid partner for $u$, there must be a stable matching $M'$ that contains $(u,v)$. Since $M'$ is a stable matching, we have $(u',v') \in M'$ for some other intern $v' \neq v$.

Now consider the moment when $v$ rejects $u$. Since this is the first time any position is rejected by a valid partner, $u'$ has not yet been rejected by any valid partner; in particular, $u'$ has not been rejected by $v'$, which is a valid partner for $u'$. But $u'$ makes offers in decreasing order of preference and is matched to $v$ at this moment, so $u'$ must prefer $v$ to $v'$: otherwise, $u'$ would have made an offer to $v'$ first and, never having been rejected by $v'$, would still be matched to $v'$.

But then $(u',v)$ forms an unstable pair with respect to $M'$: $u'$ prefers $v$ to its partner $v'$, and $v$ prefers $u'$ to its partner $u$. This contradicts our assumption that $M'$ is stable. Therefore, $M$ matches every position to its best valid partner. $$\tag*{$\Box$}$$

One consequence of this is that one group is always privileged, in that its preferences are maximized at the expense of the other group's. In other words, positions will always be matched with their highest-ranked valid intern, while interns are always matched with their least preferred valid position.

Proposition 1.14. Gale-Shapley produces a matching $M$ that matches every intern $v$ to its worst valid partner.

Proof. We'll show this by contradiction. Suppose that $M$ doesn't match every intern with their worst valid position, and let $(u,v) \in M$ be such that $u$ is not $v$'s worst valid partner. Then $v$'s worst valid partner, call it $u'$, is ranked below $u$ by $v$, and since $u'$ is a valid partner, there is a stable matching $M'$ that contains $(u',v)$. In other words, $v$ prefers $u$ to $u'$.

Then in $M'$, $u$ is paired with some other intern $v' \neq v$. Since $(u,v') \in M'$, the intern $v'$ is a valid partner for $u$; and by Proposition 1.13, $M$ matches $u$ with its best valid partner, which is $v$, so $u$ prefers $v$ to $v'$. Then $(u,v)$ must be an unstable pair with respect to $M'$, which is a contradiction. $$\tag*{$\Box$}$$

This leads us to the idea that just because an algorithm is optimal in some context doesn't necessarily mean it's the best one for the job all the time!

Another thing we can do once we have an algorithmic solution is to change the problem and see whether our algorithm, perhaps with a bit of modification, still solves it. For instance, our current scenario assumes that every position can potentially be matched with any intern, but what if an intern doesn't want to move to the Bay Area? Or what if we would rather match employers, rather than individual positions, with interns, and allow an employer to take multiple interns? Or, what if there are fewer positions than there are interns?

Generally speaking, we now have a basic outline of the approach we'll take to coming up with algorithmic solutions to problems:

  1. Define the problem. We are often given an informal problem from which we need to extract a mathematically precise one.
  2. Design an algorithm to solve the problem.
  3. Analyze the algorithm. We do this to show that our algorithm is correct and efficient. We may also show that our solution has some particular properties.