CMSC 27200 — Lecture 15

Bipartite matching

You may recall from discrete math or the beginning of our class that a basic problem on bipartite graphs is matching. Let's dig up some definitions from the first week and discrete math.

Definition 15.1. A matching in an undirected graph $G = (V,E)$ is a subset of edges $M \subseteq E$ such that no two edges in $E$ share the same incident vertex. A vertex that is incident to an edge in a matching $M$ is matched in $M$; otherwise, it is unmatched. A maximal matching is a matching that cannot be made larger; in other words, every edge in $E \setminus M$ is incident to a vertex matched in $M$. A maximum matching is a matching with the maximum number of edges.

Example 15.2. The following graph has a matching consisting of the edges highlighted in yellow. It is a maximal matching (no more edges can be added to the matching) and a maximum matching (it is the largest matching that this graph will admit).

As the example shows, matchings can be applied to any undirected graph, not just those that are bipartite. We will focus on matching in bipartite graphs. This is the kind of graph that we were looking at in the context of stable matchings, but here we have no concept of stability and the graphs are not taken to be complete. We never formally defined bipartite graphs, so here is the definition.

Definition 15.3. An undirected graph $G = (V,E)$ is bipartite if $V$ is the disjoint union of two sets $V_1$ and $V_2$ such that all edges in $E$ connect a vertex in $V_1$ with a vertex in $V_2$. We say that $(V_1, V_2)$ is a bipartition of $V$.

We are interested in the following problem.

Problem 15.4. The bipartite matching problem is: Given an undirected graph $G = (V,E)$ with bipartition $(V_1,V_2)$, find a maximum matching $M \subseteq E$.

Those of you who happened to take discrete math with me might begin to have flashbacks to when we were talking about matchings and augmenting paths, and maybe you were already having those flashbacks when we happened to start talking about augmenting paths in the context of flows, a very eerie coincidence.

What we will try to do is turn our instance of the bipartite matching problem into an instance of the maximum flow problem. In other words, we will transform our graph into a flow network. In doing so, if we find a maximum flow on our flow network, then the flow corresponds to a maximum matching on our bipartite graph.

Definition 15.5. Let $G = (V,E)$ be an undirected graph with bipartition $(V_1,V_2)$. We define the flow network $G' = (V',E',s,t,c)$ by the following:

$V' = V \cup \{s,t\}$,
$E' = \{(u,v) \mid u \in V_1, v \in V_2, (u,v) \in E\} \cup \{(s,u) \mid u \in V_1\} \cup \{(v,t) \mid v \in V_2\}$,
$c(u,v) = 1$ for all edges $(u,v) \in E'$.

Informally, we turn all edges into directed edges from vertices in $V_1$ to vertices in $V_2$. We add a source vertex $s$ with edges from $s$ to vertices in $V_1$ and add a sink vertex $t$ and edges from vertices in $V_2$ to $t$. Finally, we set the capacities of all edges to 1.

We want to show the following.

Theorem 15.6. Each matching $M$ of cardinality $k$ corresponds to an integral flow $f$ with value $k$.

Proof. The forward direction is pretty simple. let $M$ be a matching of size $k$. Consider the flow $f$ that pushes 1 unit of flow through each edge of $M$. Since each edge of $M$ is incident to exactly one vertex in $V_1$ and one vertex in $V_2$, we can easily verify that this flow obeys the capacity and conservation conditions and that $v(f) = k$.

Now consider the reverse direction, and suppose $f$ is an integer-valued flow of value $k$. Clearly, flows through edges must be either 0 or 1. We will denote by $M'$ the set of edges $e$ that have $f(e) = 1$.

We claim that $M'$ has $k$ edges. To see this, we take the cut $(S,T)$ where $S = \{s\} \cup V_1$ and $T = \{t\} \cup V_2$. Applying Lemma 14.5, we get that the flow leaving $S$ is $k$ and no flow is entering $S$. Since every edge has capacity 1, we can conclude there are $k$ edges in $M'$.

Now, to show that $M'$ is a matching, we observe that no two edges carrying flow can be incident to the same vertex, as every vertex in $V_1$ has an incoming capacity of 1 and every vertex in $V_2$ has an outgoing capacity of 1. $$\tag*{$\Box$}$$

Then an algorithm for finding the maximum matching of a bipartite graph is clear: given a bipartite graph $G$, construct the flow network $G'$ as described in Definition 15.5 and find a maximum flow for $G'$. Is this efficient? It turns out that even in the simplest incarnation of Ford-Fulkerson, we get a feasible algorithm.

Theorem 15.7. A maximum matching in a bipartite graph can be found in $O(mn)$ time.

Recall that the running time of Ford-Fulkerson was $O(mC)$, where $C$ was the upper bound on the capacity of the graph. This was potentially bad if $C$ was much larger than our graph, but here, $C = n$, which makes this perfectly fine.

One of the things about this result is that it actually gives us a way to solve bipartite matching without having to transform our graph into a flow network. First, let's suppose we have a flow $f$ in our network $G'$ that's not maximum. Then there is an augmenting path in our residual graph $G_f'$. But since all of the edges in our network have capacity 1, any augmenting path will be made up of alternating forward and backward edges. This leads to a situation like the following.

An augmentation of the flow will result in us essentially swapping edges of the augmenting path so that those that were carrying flow will no longer do so and those that were empty will have flow. In our graph, this appears as though we've swapped the edges of our matching.

If you happened to take discrete math with me, then you will recognize this as the algorithm I presented for bipartite matching. The idea is very similar to the iteratively-improving solution technique used for maximum flows:

Start with an empty matching.
Look for an augmenting path; in the context of matchings, these are paths that begin and end with unmatched edges and alternate between matched and unmatched edges.
Flip the edges of the path in/out of the matching. This creates a larger matching.
Continue doing this until there are no augmenting paths left.

In the following example, we take the yellow edges to be our initial matching. Then, there are two possible augmenting paths, one in cyan and one in green. Flipping the edges in and out of the matching in both paths will increase the size of our matching.

Of course, now, we recognize this clearly as a special case of the Ford-Fulkerson algorithm.

A natural question is to ask whether we can pull this off with a graph that is not bipartite. The answer to that is not quite. General graphs will have an inconvenient structure that makes the algorithm less straightforward. However, there is a polynomial time algorithm for computing maximum matchings in graphs, due to Edmonds in 1965. We won't be looking at this algorithm, but Edmonds says something very interesting in this paper.

In particular, one of the claims of his paper is that this algorithm is efficient ($O(n^4)$, in fact), but remember that this is 1965, so the question of what an efficient algorithm is not quite settled. His claim is that because his algorithm increases not exponentially, but algebraically with respect to the size of the graph, it can be considered efficient. Here, algebraically of course means polynomially and he discusses the implications of this definition in Section 2.

My alma mater is the University of Waterloo, which was where Edmonds was a faculty member, in the Department of Combinatorics and Optimization, so the first time I heard about Edmonds was with respect to his matching algorithm and his discussion of efficient algorithms. The second time I heard about him was in grad school, from an officemate, who had also gone to school at Waterloo, who told me that Edmonds had a rock outside of his home that was inscribed with $NP \neq NP \cap coNP = P$. His page on Wikipedia was recently updated to include a photo of him with his rock outside his home.