MPCS 55001 — Lecture 14

Minimum cuts

It remains to show that the Ford-Fulkerson algorithm actually gives us the maximum possible flow. We know a rough upper bound on the flow: the total capacity of all edges leaving the source $s$. However, this is not the best upper bound we can get.

Recall the following definition from our discussion of minimum spanning trees.

Definition 5.8. A cut is a partition of vertices into two nonempty subsets $(S,V-S)$. The cutset of a cut $S$ is the set of edges with exactly one endpoint in $S$.

Suppose we have a cut in our flow network, with $s$ in one set of the partition and $t$ in the other. We observe that the edges in the cutset that move from the $s$ side to the $t$ side constitutes a rough bound on the amount of flow that goes from $s$ to $t$. We can formalize this notion.

An $s,t$-cut is a partition $(S,T)$ of the vertex set $V$ such that $s \in S$ and $t \in T$.

One observation that we can make in the context of flows is that a cut that divides $s$ and $t$ into two different parts of the cut means that flow will necessarily travel from one side of the cut to the other. We can try to quantify some aspects of this.

The capacity of a cut $(S,T)$ is the capacity of all edges leaving $S$, \[c(S,T) = \sum_{(u,v) \in S \times T \subseteq E} c(u,v).\]

Consider the following graph. The vertices highlighted in yellow are in the set $S$, and we consider the cut $(S,T)$. Then the capacity of this cut is the capacity of edges highlighted in cyan, $3+1+5 = 9$.

Note that the capacity of the cut is much lower than the capacity of the edges leaving the source node $s$. Intuitively, it seems like any flow is subject to the capacity of a cut. After all, all flow has to travel from one side of the cut to the other. But we can go further with this: it seems like this should be true for any cut we can define, which leads us to the following problem.

The minimum cut problem is: Given a flow network $G = (V,E,s,t,c)$, find a cut $(S,T)$ with minimum capacity.

If our intuition is correct, then finding a minimum cut will give us a possibly better upper bound on the maximum flow than the outgoing capacity of the source vertex.

Let $f$ be a flow and $(S,T)$ be a cut. Then the value of a flow is the flow leaving $S$ less the flow entering $S$, \[v(f) = \sum_{u \in S, v \in T, (u,v) \in E} f(u,v) - \sum_{u \in T, v \in S, (u,v) \in E} f(u,v).\]

Since there are no edges entering $s$, we have \[v(f) = \sum_{(s,v) \in E} f(s,v) - \sum_{(v,s) \in E} f(v,s) = \sum_{(s,v) \in E} f(s,v).\] By conservation, we have for all $v \in V$, \[\sum_{(u,v) \in E} f(u,v) - \sum_{(v,w) \in E} f(v,w) = 0.\] Then putting the two together, we have \begin{align*} v(f) &= \sum_{(s,v) \in E} f(s,v) - \sum_{(v,s) \in E} f(v,s) \\ &= \sum_{(s,v) \in E} f(s,v) - \sum_{(v,s) \in E} f(v,s) + 0 \\ &= \sum_{v \in S} \left( \sum_{(u,v) \in E} f(u,v) - \sum_{(v,w) \in E} f(v,w)\right) \\ &= \sum_{u \in S, v \in T, (u,v) \in E} f(u,v) - \sum_{u \in T, v \in S, (u,v) \in E} f(u,v). \end{align*}

Consider the following flow network with cut $(S,T)$, where vertices in $S$ are highlighted in yellow and vertices in $T$ are highlighted in green. The capacities of this network are in pink, while the flow is labeled in blue. We observe that the value of the flow is 9.

The flow leaving $S$ is circled in yellow, for a total of $2+4+9$, while the flow entering $S$ is circled in blue, which totals $6$. The difference in these flows is $2+4+9-6 = 9$, which is the value of our flow.

Of course, the in-flow of $S$ is the out-flow of $T$ and the out-flow of $S$ is the in-flow of $T$, so this lemma can be reformulated in terms of $T$. This lemma leads to the following bound.

Let $f$ be a flow and $(S,T)$ be a cut. Then $v(f) \leq c(S,T)$.

\begin{align*} v(f) &= \sum_{u \in S, v \in T, (u,v) \in E} f(u,v) - \sum_{u \in T, v \in S, (u,v) \in E} f(u,v) \\ &\leq \sum_{u \in S, v \in T, (u,v) \in E} f(u,v) \\ &\leq \sum_{u \in S, v \in T, (u,v) \in E} c(u,v) \\ &\leq c(S,T) \end{align*}

This bound confirms our intutition above, that any cut places an upper bound on the flow for a given network. This means that the minimal cut for a network will give us the best upper bound on the maximum flow through that network.

Let $f$ be a flow and $(S,T)$ be a cut. If $v(f) = c(S,T)$, then $f$ is a max flow and $(S,T)$ is a min cut.

We observe that for any flow $f'$, we have $v(f') \leq c(S,T) = v(f)$. We also have that for any cut $(S',T')$, $c(S',T') \geq v(f) = c(S,T)$.

This leads us to the statement of the Max-Flow Min-Cut theorem.

The value of a maximum flow is the capacity of a minimum cut.

We will use this to show that the Ford-Fulkerson algorithm is correct, by showing that we can compute the minimum cut from it.

Let $G = (V,E,s,t,c)$ be a flow network. Then $f$ is a maxmimum flow if and only if $G_f$ contains no augmenting paths.

We will show that the following are equivalent.

  1. There exists a cut $(S,T)$ such that $c(S,T) = v(f)$.
  2. $f$ is a maximum flow.
  3. There is no augmenting path in $G_f$.

$(1) \implies (2)$ is easy: this is just Corollary 14.8.

$(2) \implies (3)$ is also fairly straightforward. Suppose $G_f$ does have an augmenting path. Then we can augment $f$ and increase the flow, so $f$ wasn't a maximum flow.

So it remains to show that $(3) \implies (1)$. We have that $f$ is a flow such that $G_f$ has no augmenting paths. Then this means that not every vertex in $G_f$ is reachable from $s$, so we let $S$ be the set of vertices reachable from $s$ in $G_f$. This naturally gives us a cut $(S,T)$, since it is clear that $t \not \in S$ (otherwise, there would be an augmenting path in $G_f$).

Now, let's consider the flow on the edges in the cutset of $(S,T)$. If $(u,v)$ is an edge in $G$ with $u \in S$ and $v \in T$. This edge is not a forward edge in $G_f$, since otherwise, $v$ would have been reachable and $v \in S$. Since $(u,v)$ is not a forward edge, this means $(u,v)$ had no unused capacity by $f$, so $f(u,v) = c(u,v)$.

Next, consider an edge $(u',v')$ in $G$ with $u' \in T$ and $v' \in S$. We claim that $f(u',v') = 0$. Suppose not. Then $(v',u')$ is a backward edge in $G_f$ and $u'$ would have been reachable by $s$, which could not be the case since $u' \in T$.

Therefore, all outgoing edges of $S$ are saturated with flow and all incoming edges of $S$ are empty. By the flow value lemma, we have \begin{align*} v(f) &= \sum_{u \in S, v \in T, (u,v) \in E} f(u,v) - \sum_{u \in T, v \in S, (u,v) \in E} f(u,v) \\ &= \sum_{u \in S, v \in T, (u,v) \in E} c(u,v) - 0 \\ &= c(S,T). \end{align*}

Since the Ford-Fulkerson algorithm relies on obtaining the flow for which its residual graph contains no augmenting paths, we can conclude that the flow that the Ford-Fulkerson algorithm returns is the maximum flow. But this theorem also gives us a way to compute the minimum cut of $G$.

Let $G = (V,E,s,t,c)$ be a flow network. Given a maximum flow $f$ for $G$, we can compute a minimum $s,t$-cut in $O(m)$ time.

First, we compute the residual graph $G_f$ in $O(m)$ time. Then doing a search, we can find all vertices that are reachable from $s$ in $O(m+n) = O(m)$ time.

The Max-Flow Min-Cut theorem was a part of the Ford-Fulkerson paper. But why were they interested in maximal flows? They begin their paper with the following:

The problem discussed in this paper was formulated by T. Harris as follows:

"Consider a rail network connecting two cities by way of a number of intermediate cities, where each link of the network has a number assigned to it representing its capacity. Assuming a steady state condition, find a maximal flow from one given city to the other."

This sounds innocuous and kind of boring until you remember that Ford and Fulkerson were working at RAND and you realize that the rail network referred to by Harris is the Soviet rail system and maybe they weren't necessarily as interested in maximum flows as minimum cuts.

Bipartite matching

You may recall from discrete math or the beginning of our class that a basic problem on bipartite graphs is matching. Let's dig up some definitions from the first week and discrete math.

A matching in an undirected graph $G = (V,E)$ is a subset of edges $M \subseteq E$ such that no two edges in $E$ share the same incident vertex. A vertex that is incident to an edge in a matching $M$ is matched in $M$; otherwise, it is unmatched. A maximal matching is a matching that cannot be made larger; in other words, every edge in $E \setminus M$ is incident to a vertex matched in $M$. A maximum matching is a matching with the maximum number of edges.

The following graph has a matching consisting of the edges highlighted in yellow. It is a maximal matching (no more edges can be added to the matching) and a maximum matching (it is the largest matching that this graph will admit).

As the example shows, matchings can be applied to any undirected graph, not just those that are bipartite. We will focus on matching in bipartite graphs. This is the kind of graph that we were looking at in the context of stable matchings, but here we have no concept of stability and the graphs are not taken to be complete. We never formally defined bipartite graphs, so here is the definition.

An undirected graph $G = (V,E)$ is bipartite if $V$ is the disjoint union of two sets $V_1$ and $V_2$ such that all edges in $E$ connect a vertex in $V_1$ with a vertex in $V_2$. We say that $(V_1, V_2)$ is a bipartition of $V$.

We are interested in the following problem.

The bipartite matching problem is: Given an undirected graph $G = (V,E)$ with bipartition $(V_1,V_2)$, find a maximum matching $M \subseteq E$.

Those of you who happened to take discrete math with me might begin to have flashbacks to when we were talking about matchings and augmenting paths, and maybe you were already having those flashbacks when we happened to start talking about augmenting paths in the context of flows, a very eerie coincidence.

What we will try to do is turn our instance of the bipartite matching problem into an instance of the maximum flow problem. In other words, we will transform our graph into a flow network. In doing so, if we find a maximum flow on our flow network, then the flow corresponds to a maximum matching on our bipartite graph.

Let $G = (V,E)$ be an undirected graph with bipartition $(V_1,V_2)$. We define the flow network $G' = (V',E',s,t,c)$ by the following:

Informally, we turn all edges into directed edges from vertices in $V_1$ to vertices in $V_2$. We add a source vertex $s$ with edges from $s$ to vertices in $V_1$ and add a sink vertex $t$ and edges from vertices in $V_2$ to $t$. Finally, we set the capacities of all edges to 1.

We want to show the following.

Each matching $M$ of cardinality $k$ corresponds to an integral flow $f$ with value $k$.

The forward direction is pretty simple. let $M$ be a matching of size $k$. Consider the flow $f$ that pushes 1 unit of flow through each edge of $M$. Since each edge of $M$ is incident to exactly one vertex in $V_1$ and one vertex in $V_2$, we can easily verify that this flow obeys the capacity and conservation conditions and that $v(f) = k$.

Now consider the reverse direction, and suppose $f$ is an integer-valued flow of value $k$. Clearly, flows through edges must be either 0 or 1. We will denote by $M'$ the set of edges $e$ that have $f(e) = 1$.

We claim that $M'$ has $k$ edges. To see this, we take the cut $(S,T)$ where $S = \{s\} \cup V_1$ and $T = \{t\} \cup V_2$. Applying Lemma 14.5, we get that the flow leaving $S$ is $k$ and no flow is entering $S$. Since every edge has capacity 1, we can conclude there are $k$ edges in $M'$.

Now, to show that $M'$ is a matching, we observe that no two edges carrying flow can be incident to the same vertex, as every vertex in $V_1$ has an incoming capacity of 1 and every vertex in $V_2$ has an outgoing capacity of 1.

Then an algorithm for finding the maximum matching of a bipartite graph is clear: given a bipartite graph $G$, construct the flow network $G'$ as described in Definition 14.15 and find a maximum flow for $G'$. Is this efficient? It turns out that even in the simplest incarnation of Ford-Fulkerson, we get a feasible algorithm.

A maximum matching in a bipartite graph can be found in $O(mn)$ time.

Recall that the running time of Ford-Fulkerson was $O(mC)$, where $C$ was the upper bound on the capacity of the graph. This was potentially bad if $C$ was much larger than our graph, but here, $C = n$, which makes this perfectly fine.

One of the things about this result is that it actually gives us a way to solve bipartite matching without having to transform our graph into a flow network. First, let's suppose we have a flow $f$ in our network $G'$ that's not maximum. Then there is an augmenting path in our residual graph $G_f'$. But since all of the edges in our network have capacity 1, any augmenting path will be made up of alternating forward and backward edges. This leads to a situation like the following.

An augmentation of the flow will result in us essentially swapping edges of the augmenting path so that those that were carrying flow will no longer do so and those that were empty will have flow. In our graph, this appears as though we've swapped the edges of our matching.

If you ever happened to take discrete math with me, then you will recognize this as the algorithm I presented for bipartite matching, due to Claude Berge in 1957. The idea is very similar to the iteratively-improving solution technique used for maximum flows:

  1. Start with an empty matching.
  2. Look for an augmenting path; in the context of matchings, these are paths that begin and end with unmatched edges and alternate between matched and unmatched edges.
  3. Flip the edges of the path in/out of the matching. This creates a larger matching.
  4. Continue doing this until there are no augmenting paths left.

In the following example, we take the yellow edges to be our initial matching. Then, there are two possible augmenting paths, one in cyan and one in green. Flipping the edges in and out of the matching in both paths will increase the size of our matching.

Of course, now, we recognize this clearly as a special case of the Ford-Fulkerson algorithm.

A natural question is to ask whether we can pull this off with a graph that is not bipartite. The answer to that is not quite. General graphs will have an inconvenient structure that makes the algorithm less straightforward. However, there is a polynomial time algorithm for computing maximum matchings in graphs, due to Edmonds in 1965. We won't be looking at this algorithm, but Edmonds says something very interesting in this paper.

In particular, one of the claims of his paper is that this algorithm is efficient ($O(n^4)$, in fact), but remember that this is 1965, so the question of what an efficient algorithm is not quite settled. His claim is that because his algorithm increases not exponentially, but algebraically with respect to the size of the graph, it can be considered efficient. Here, algebraically of course means polynomially and he discusses the implications of this definition in Section 2.

My alma mater is the University of Waterloo, which was where Edmonds was a faculty member, in the Department of Combinatorics and Optimization, so the first time I heard about Edmonds was with respect to his matching algorithm and his discussion of efficient algorithms. The second time I heard about him was in grad school, from an officemate, who had also gone to school at Waterloo, who told me that Edmonds had a rock outside of his home that was inscribed with $NP \neq NP \cap coNP = P$. His page on Wikipedia was recently updated to include a photo of him with his rock outside his home.