CMSC 27200 — Lecture 4

Analyzing Gale-Shapley

Let's walk through an example of how to do this kind of complexity analysis. Luckily, we've just looked at an algorithm. Returning to Gale-Shapley, it's time to ask the following question: how efficient is it? We'll work towards showing the following.

Theorem 2.13. Gale-Shapley has worst-case time complexity $O(n^2)$.

If we look at the algorithm and recall Proposition 1.7, we can conclude that the while loop gets executed $O(n^2)$ times. What we need to show to prove Theorem 2.13 is that the sequence of operations within the while loop amounts to $O(1)$ cost.

To set this up, we assume that each position and intern is indexed by $1, \dots, n$. We can imagine that we have an array positions[] of positions and an array interns[] of interns, with positions[i] holding an array of preferences. So positions[i][j] = k would mean that the $j$th most preferred intern for position $i$ is intern $k$.
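To make this concrete, here is a minimal sketch of that representation in Python. The preference data is made up purely for illustration, and index 0 of each array is padded with None so that positions and interns can be numbered $1, \dots, n$ as in the text.

```python
n = 3

# positions[i][j] = k: the j-th most preferred intern for position i is intern k.
# Index 0 of each list is unused so that indices run 1..n.
positions = [
    None,
    [None, 2, 1, 3],   # position 1 prefers intern 2, then intern 1, then intern 3
    [None, 1, 2, 3],   # position 2
    [None, 1, 3, 2],   # position 3
]

# interns[i][j] = k: the j-th most preferred position for intern i is position k.
interns = [
    None,
    [None, 1, 2, 3],   # intern 1 prefers position 1, then position 2, then position 3
    [None, 3, 1, 2],   # intern 2
    [None, 2, 3, 1],   # intern 3
]
```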

First, let's think about what the algorithm does on each iteration.

  1. Determine if there is an unmatched position $u$.
  2. Find the first intern $v$ on $u$'s preference list who hasn't yet received an offer from $u$.
  3. Determine if $v$ is unmatched.
  4. Determine if $v$ prefers $u$ or their current match.
  5. Update the matching $M$, if necessary.

To show that each of these amounts to a constant number of operations, one of the first issues we have to deal with is how things are represented in our algorithm. For Gale-Shapley, we can make use of arrays and lists to do everything we need.

First, to determine whether there are unmatched positions, we keep all unmatched positions in a list. Since it doesn't matter what order they come in (which we showed in Proposition 1.13), we can just use any old list, or if we'd like to get fancier, we can use a queue. Either way, we just need to know whether there are unmatched positions and we want to be able to insert (if we're replacing a pair in the matching and some other position becomes free) or delete (if our current free position gets matched) in $O(1)$ time, so lists are perfect for that.
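As a small sketch of this bookkeeping in Python, using collections.deque here (a plain list used as a stack would work just as well, since the order doesn't matter):

```python
from collections import deque

n = 3
# Every position starts out unmatched.
unmatched = deque(range(1, n + 1))

u = unmatched.popleft()   # grab some free position: O(1)
unmatched.append(u)       # a position that becomes free again goes back in: O(1)
# "Are there any unmatched positions left?" is just a check that the deque is nonempty.
```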

Second, to find the next intern that a position gives an offer to, we simply need to keep track of where each position currently is in its preference list. Since we're given the preference lists, all we need to store is the index of the next intern each position will make an offer to, say in an array next_intern[]. Then this step can be done in $O(1)$ time since array lookups and updates are $O(1)$.
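Continuing the hypothetical arrays from the sketch above (positions and n), this might look like:

```python
# next_intern[u] is the index into positions[u] of the next intern that position u
# will make an offer to; every position starts at its first choice. Index 0 unused.
next_intern = [1] * (n + 1)

u = 1                               # some free position
v = positions[u][next_intern[u]]    # the intern to offer to next: O(1) lookup
next_intern[u] += 1                 # advance down u's preference list: O(1) update
```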

To check if $v$ is unmatched, we need a way to keep track of our matching. We can do this by using an array matching[]. We take advantage of the fact that we never really need to query the matching based on the position, so we have matching[i] = j for intern $i$ and position $j$ if $(j,i) \in M$ and matching[i] = 0 if intern $i$ hasn't been matched. This takes care of updating the matching as well, since updates will be $O(1)$.
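Again continuing the sketch from above:

```python
# matching[v] = u means (u, v) is in M; matching[v] = 0 means intern v is unmatched.
matching = [0] * (n + 1)            # index 0 unused; everyone starts unmatched

if matching[v] == 0:                # is intern v unmatched?  O(1) lookup
    matching[v] = u                 # accept the offer: O(1) update
```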

The most difficult part is how to determine if $v$ prefers $u$ when being made an offer. Suppose $(u',v) \in M$. To determine if $v$ prefers $u$ over $u'$, we would have to find $u$ and $u'$ in the preference list for $v$, which could take $O(n)$ time by searching through the array holding $v$'s preferences.

What we can do to avoid this is to construct a lookup for each intern $v$ that will tell us which rank each position $u$ occupies. This is basically the inverse of the preference list, where we query based on the position to get the rank. So we construct an array rank[] where rank[i] is an array that contains the inverse of intern $i$'s preference list. That is, rank[i][j] = k means that position $j$ is ranked $k$th by intern $i$.

Constructing this lookup table can be done at the very beginning of the algorithm and takes $\Theta(n^2)$ time. Once we do that, we only need $O(1)$ time to look up the ranking for each position and we can avoid doing a search through preference lists.
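A sketch of this preprocessing, continuing the hypothetical interns array from earlier:

```python
# rank[i][j] = k: position j is ranked k-th by intern i.
# Built once, up front, by inverting each intern's preference list: Theta(n^2) total.
rank = [None] * (n + 1)
for i in range(1, n + 1):
    rank[i] = [0] * (n + 1)
    for k in range(1, n + 1):
        rank[i][interns[i][k]] = k

# Intern v prefers position u to position u_prime exactly when
# rank[v][u] < rank[v][u_prime]: a single O(1) comparison.
v, u, u_prime = 1, 1, 2
assert rank[v][u] < rank[v][u_prime]   # intern 1 ranks position 1 above position 2
```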

In total, our algorithm does the following:

  1. Construct the ranking lookup in $\Theta(n^2)$ time.
  2. Perform $O(n^2)$ offers. Each offer requires $O(1)$ time.

This gives us a total of $O(n^2)$ time complexity.
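Putting the pieces together, here is one possible self-contained implementation sketch in Python. It follows the bookkeeping described above, though the specific names and indexing conventions are just one way to set things up.

```python
from collections import deque

def gale_shapley(positions, interns, n):
    """Position-proposing Gale-Shapley (one possible implementation).

    positions[u][j] = v: intern v is the j-th choice of position u (indices 1..n).
    interns[v][j] = u: position u is the j-th choice of intern v (indices 1..n).
    Returns matching, where matching[v] = u means (u, v) is in M.
    """
    # Theta(n^2) preprocessing: rank[v][u] = where intern v ranks position u.
    rank = [[0] * (n + 1) for _ in range(n + 1)]
    for v in range(1, n + 1):
        for j in range(1, n + 1):
            rank[v][interns[v][j]] = j

    next_intern = [1] * (n + 1)          # next preference index each position will try
    matching = [0] * (n + 1)             # matching[v] = 0 means intern v is unmatched
    unmatched = deque(range(1, n + 1))   # all positions start out free

    while unmatched:                             # O(n^2) iterations in the worst case
        u = unmatched.popleft()                  # any free position: O(1)
        v = positions[u][next_intern[u]]         # next intern on u's list: O(1)
        next_intern[u] += 1
        if matching[v] == 0:                     # v is unmatched, so v accepts
            matching[v] = u
        elif rank[v][u] < rank[v][matching[v]]:  # v prefers u and trades up
            unmatched.append(matching[v])        # the displaced position is free again
            matching[v] = u
        else:                                    # v rejects u, so u stays free
            unmatched.append(u)
    return matching

# Example run on the made-up preferences from earlier:
positions = [None, [None, 2, 1, 3], [None, 1, 2, 3], [None, 1, 3, 2]]
interns   = [None, [None, 1, 2, 3], [None, 3, 1, 2], [None, 2, 3, 1]]
print(gale_shapley(positions, interns, 3))   # [0, 2, 1, 3]: intern v goes to position matching[v]
```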

There's more!

As with many of the problems we want to solve and the algorithms that solve them, there are still a lot of questions we can ask about how we might adapt our solution and about other properties of the particular solution we came up with.

For instance, does each instance ($U$, $V$, and their preferences) have a unique stable matching? And the natural follow-up to that question is whether Gale–Shapley always gives us the same stable matching.

One of the things that complicates this question is that, as the algorithm is written, it's not clear that it will even do the same thing on every execution. The condition of the while loop implicitly chooses some free position $u$, but it doesn't tell us which one to choose. You can imagine that the choice of $u$ in this step could change the outcome of the algorithm.

The surprising thing is that despite this, the algorithm is guaranteed to give us the same matching $M$, no matter how we choose $u$ at each iteration of the while loop. To see this, we need to analyze some properties of the matching $M$ a bit further.

We say an intern $v$ is a valid partner for position $u$ if there exists a stable matching that contains $(u,v)$. The best valid partner for a position $u$ is the valid partner $v$ such that no intern $v'$ ranked higher than $v$ by $u$ is also a valid partner for $u$.
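To get a feel for this definition, one way to see which interns are valid for each position on a small instance is to brute-force every possible matching and keep only the stable ones. The sketch below is purely illustrative (it enumerates all $n!$ matchings, so it's hopeless for large $n$) and assumes rank tables like the ones built in the previous section.

```python
from itertools import permutations

def valid_partners(pos_rank, int_rank, n):
    """pos_rank[u][v]: where position u ranks intern v; int_rank[v][u]: likewise for interns.
    Returns, for each position u, the set of interns v with (u, v) in some stable matching."""
    valid = {u: set() for u in range(1, n + 1)}
    for perm in permutations(range(1, n + 1)):
        match = {u: perm[u - 1] for u in range(1, n + 1)}      # match[u] = intern assigned to u
        partner = {v: u for u, v in match.items()}             # partner[v] = position of intern v
        stable = all(
            not (pos_rank[u][v] < pos_rank[u][match[u]] and    # u prefers v to its match, and
                 int_rank[v][u] < int_rank[v][partner[v]])     # v prefers u to its match
            for u in range(1, n + 1)
            for v in range(1, n + 1)
            if v != match[u]
        )
        if stable:
            for u, v in match.items():
                valid[u].add(v)
    return valid
```

The best valid partner for a position $u$ is then just the highest-ranked intern in valid[u].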

We'll show that Gale–Shapley always matches positions with their best valid intern.

Gale–Shapley produces a matching $M$ that matches every position $u$ to its best valid partner.

We'll show this by contradiction. Suppose there is an execution of the Gale–Shapley algorithm that produces a matching $M$ that contains a pair $(s,t)$ such that $t$ isn't the best valid partner for $s$.

Since positions give out offers in decreasing order of preference, $s$ must have been rejected by every intern it prefers to $t$, including its best valid partner; in particular, at some point during the execution a position is rejected by a valid partner. Let $u$ be the first position to receive such a rejection. Because $u$ works down its preference list in order, it must be rejected by its best valid intern $v$ before it can reach any other valid partner, so this first rejection comes from $v$. At the moment of the rejection, $v$ is matched to some position $u'$.

Since $v$ is the best valid partner for $u$, it is in particular a valid partner, so there is a stable matching $M'$ that contains $(u,v)$. Since $M'$ is a perfect matching containing $(u,v)$, we have $(u',v') \in M'$ for some intern $v' \neq v$. Now, in this execution of the Gale–Shapley algorithm, $v$ chose $u'$ over $u$ at the moment of the rejection, so $v$ prefers $u'$ to $u$.

Next, observe that when $v$ rejects $u$, no position has yet been rejected by a valid partner, since $v$ rejecting $u$ is the first such rejection. Because $v'$ is a valid partner for $u'$, this means $u'$ cannot yet have made an offer to $v'$: for $u'$ to be matched to $v$ now, any earlier offer to $v'$ would have had to end in a rejection by $v'$, a valid partner. So $u'$ made an offer to $v$ before ever reaching $v'$ on its list, and thus $u'$ prefers $v$ to $v'$.

But then $(u',v)$ forms an unstable pair with respect to $M'$: $u'$ prefers $v$ to its partner $v'$ in $M'$, and $v$ prefers $u'$ to its partner $u$ in $M'$. This contradicts our assumption that $M'$ is stable. Therefore, $M$ matches every position to its best valid partner.

One consequence of this is that one group is always privileged in that their preferences will always be maximized at the expense of the other group. In other words, positions will always be matched with their highest ranked valid intern, while interns are always matched with their least preferred valid position.

Gale–Shapley produces a matching $M$ that matches every intern $v$ to its worst valid partner.

We'll show this by contradiction. Suppose that $M$ doesn't give every intern their worst valid position, and let $(u,v) \in M$ be such that $u$ is not $v$'s worst valid partner. Then there is a stable matching $M'$ that contains a pair $(u',v)$, where $v$ ranks $u'$ below $u$. In other words, $v$ prefers $u$ to $u'$.

Then in $M'$, $u$ is paired with some other intern $v'$. But $M$ matches every position with its best valid partner, and $v'$ is also a valid partner for $u$, so $u$ prefers $v$ to $v'$. Then $(u,v)$ must be an unstable pair in $M'$, since $u$ prefers $v$ to $v'$ and $v$ prefers $u$ to $u'$, which is a contradiction.

Here, we have a striking example of how algorithms can encode bias if we're not careful. Notice that we kind of rolled into this outcome without really thinking about it. Of course, it wasn't clear in the midst of designing this algorithm how our choices would lead to this outcome. However, we were able to figure out that our solution has this property.

This points to the usefulness of algorithm analysis for evaluating properties of the algorithm beyond just correctness and efficiency. This is important as we rely more and more on algorithms to make decisions for us. We should rightly question how those decisions are made and be prepared to give an answer. Rigorously reasoning about algorithms gives us the ability to do so.

In the case of Gale–Shapley, the problem with the algorithm is clear: it privileges employers. This is generally a problem since employers already have tremendous power over employees. However, our analysis also gives us a way to fix this: simply swap which side does the proposing. This simple fix will then privilege the preferences of the students/interns, who tend to have less power in this scenario.
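In terms of the hypothetical gale_shapley sketch from earlier (and its made-up positions and interns arrays), the fix is just a matter of which preference arrays are passed as the proposing side:

```python
n = 3   # using the made-up preference arrays defined in the earlier sketch

# Position-proposing: each position ends up with its best valid intern.
position_optimal = gale_shapley(positions, interns, n)   # indexed by intern

# Intern-proposing: swap the roles of the two sides. The result is now indexed by
# position (intern_optimal[u] = v means intern v holds position u), and each intern
# ends up with its best valid position instead.
intern_optimal = gale_shapley(interns, positions, n)
```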