CMSC 27200 — Lecture 13

Network flow

We will now examine a class of problems related to the idea of network flow. This seems a bit specific compared to the broader techniques that we've been looking at, but the interesting thing about these problems is that we can't actually solve them using any of our prior techniques. Rather, the algorithms we'll see operate on the principle of starting with a simple, not necessarily optimal solution and iteratively refining and improving that solution until it is optimal.

All of the problems we'll see are based on the idea of flow through a graph.

Definition 13.1. A flow network is a tuple $G = (V,E,s,t,c)$, where $(V,E)$ is a digraph with a source $s \in V$ and a sink $t \in V$, and every edge $e \in E$ has a capacity $c(e) \geq 0$.

To simplify things, we assume that no edges enter the source $s$ and no edges leave the sink $t$.

Example 13.2. Here, we have a flow network with edges labelled with their capacities.

Intuitively, we want to consider the flow of some material through our graph from $s$ to $t$. Here, material is abstract, but can represent things like traffic or liquid. Let's formally define what flow is.

Definition 13.3. An $s,t$-flow is a function $f:E \to \mathbb N$ that satisfies

  1. capacity: for all edges $e \in E$, $0 \leq f(e) \leq c(e)$;
  2. conservation: for all vertices $v \in V - \{s,t\}$, \[\sum_{e = (u,v) \in E} f(e) = \sum_{e = (v,w) \in E} f(e).\] In other words, the flow going into a vertex must be the same as the flow leaving the vertex.

We say that the value of a flow $f$ is \[v(f) = \sum_{e = (s,v) \in E} f(e).\]
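To make these definitions concrete, here is a minimal sketch in Python of checking the flow conditions and computing the value of a flow. The representation (a dict mapping each directed edge $(u,v)$ to its capacity, and another mapping each edge to its flow value) and the function names are our own choices for illustration, not part of the lecture.

    # edges maps each directed edge (u, v) to its capacity c(u, v);
    # flow maps each directed edge (u, v) to its flow value f(u, v).

    def is_flow(edges, flow, s, t):
        """Check the capacity and conservation conditions of Definition 13.3."""
        # Capacity: 0 <= f(e) <= c(e) for every edge e.
        if any(not (0 <= flow[e] <= cap) for e, cap in edges.items()):
            return False
        # Conservation: flow in equals flow out at every vertex other than s, t.
        vertices = {u for u, _ in edges} | {v for _, v in edges}
        for x in vertices - {s, t}:
            flow_in = sum(flow[(u, v)] for (u, v) in edges if v == x)
            flow_out = sum(flow[(u, v)] for (u, v) in edges if u == x)
            if flow_in != flow_out:
                return False
        return True

    def value(flow, s):
        """The value v(f): total flow on the edges leaving the source."""
        return sum(flow[(u, v)] for (u, v) in flow if u == s)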

Example 13.4. Here, we have our network from Example 13.2, with flow applied as labelled.

One thing to note here is that this is clearly not the most "stuff" that we can put through this network. This leads us to our problem.

Problem 13.5. The maximum flow problem is: Given a flow network $G = (V,E,s,t,c)$, find a flow $f$ of maximum value.

A seemingly natural approach for this is to use a greedy algorithm: just look at the edges with the largest capacities and push as much flow as we can through those edges. Then, if we have any leftover flow, we can redirect it appropriately across the rest of the network.

Example 13.6. Here, we have our working example after applying a "greedy" strategy of repeatedly finding the largest-capacity path and saturating it with flow.

We first send 8 units of flow along path 1 and the remaining 2 along path 2. We then send as much as we can along path 3. This results in a flow of value 16.

However, the following is a better flow.

The main problem with this approach is the same one that dooms a lot of attempted greedy algorithms: sometimes the greedy choice isn't the best choice, and we have to go back and undo it.

Ford-Fulkerson

If we think about how liquids or impatient drivers behave, they don't just give up when they are blocked at some point while there's still some unused capacity available. They will naturally redistribute and push back to where there is available capacity. So we need some way of formalizing this behaviour of redirecting flow.

Definition 13.7. Let $G$ be a flow network and let $f$ be a flow on $G$. The residual graph $G_f = (V,E_f)$ is defined as follows: for each edge $e = (u,v) \in E$, we include a forward edge $(u,v)$ whenever $f(e) \lt c(e)$ and a backward edge $(v,u)$ whenever $f(e) \gt 0$.

That is, the edge set $E_f$ of the residual graph $G_f$ is \[E_f = \{(u,v) \mid (u,v) \in E, f(u,v) \lt c(u,v)\} \cup \{(v,u) \mid (u,v) \in E, f(u,v) \gt 0\},\] with capacity function $c_f$ defined by \[c_f(u,v) = \begin{cases} c(u,v) - f(u,v) & \text{if $(u,v) \in E$,} \\ f(v,u) & \text{if $(v,u) \in E$,} \\ 0 & \text{otherwise.} \end{cases}\]

Informally, the residual graph with respect to a particular flow $f$ records how much additional flow each edge can carry and how much of the existing flow can be pushed back. If we push a particular amount of flow through certain edges of our network, how can the rest of the network absorb that flow where there isn't enough capacity? We call the capacity $c_f$ of the residual graph $G_f$ the residual capacity.
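As a small illustration, here is one way this definition might be computed in Python, continuing the edge-dict representation from before. We also assume, as the definition implicitly does, that $E$ contains no pair of antiparallel edges, so the forward and backward cases cannot collide.

    def residual_graph(edges, flow):
        """Residual capacities c_f of Definition 13.7, keyed by residual edge."""
        cf = {}
        for (u, v), cap in edges.items():
            if flow[(u, v)] < cap:       # forward edge: leftover capacity
                cf[(u, v)] = cap - flow[(u, v)]
            if flow[(u, v)] > 0:         # backward edge: flow that can be undone
                cf[(v, u)] = flow[(u, v)]
        return cf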

Example 13.8. Here, we have a residual graph derived from the flow of Example 13.4.

So how do we make use of this information? Because the residual graph tells us where there is unused capacity in our flow network, we can use it to find a way to redirect the flow for a given path.

Definition 13.9. An augmenting path is an $s,t$-path in the residual graph $G_f$. We say the bottleneck capacity $\beta(P,f)$ of an augmenting path $P$ with respect to the flow $f$ is the minimum residual capacity of any edge in $P$.
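In code, the bottleneck capacity is just a minimum over the path's edges. A sketch, assuming a path is given as a list of vertices $s = v_0, v_1, \ldots, v_k = t$ and cf is the dict of residual capacities from the sketch above:

    def bottleneck(cf, path):
        """The bottleneck capacity beta(P, f): the minimum residual capacity
        over the consecutive edges (v_i, v_{i+1}) of the path P."""
        return min(cf[(u, v)] for u, v in zip(path, path[1:]))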

What we would like to do is take an augmenting path and push as much flow along it as its bottleneck capacity allows, redirecting existing flow wherever the path uses a backward edge. We call this process augmentation, because it results in an augmented flow.

Definition 13.10. The augmented flow $\alpha(f,P)$ is a flow in $G$, defined for each edge $(u,v) \in E$ by \[ \alpha(f,P)(u,v) = \begin{cases} f(u,v) + \beta(P,f) & \text{if $(u,v) \in P$,} \\ f(u,v) - \beta(P,f) & \text{if $(v,u) \in P$,} \\ f(u,v) & \text{otherwise.} \end{cases} \]
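A sketch of the augmentation step under the same representation as above (the names are ours; edges and flow describe $G$ and $f$, cf the residual capacities, and path an augmenting path as a vertex list):

    def augment(edges, flow, cf, path):
        """The augmented flow alpha(f, P) of Definition 13.10, as a new dict."""
        b = min(cf[(u, v)] for u, v in zip(path, path[1:]))   # beta(P, f)
        new_flow = dict(flow)
        for u, v in zip(path, path[1:]):
            if (u, v) in edges:          # forward edge of P: push b more flow
                new_flow[(u, v)] += b
            else:                        # backward edge of P: undo b units on (v, u)
                new_flow[(v, u)] -= b
        return new_flow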

Example 13.11. Here, we augment our flow based on the residual graph of Example 13.8. We choose the following augmenting path.

Then applying our augmentation with respect to this path, we see that the bottleneck is 2 and adjust our flow accordingly.

Based on our augmenting path, we get a new assignment of flow values. Of course, we don't yet know that this is a flow; we have to show that it obeys our capacity and conservation conditions.

Proposition 13.12. The augmented flow $\alpha(f,P)$ is a flow in $G$.

Proof. First, we will show that the capacity condition is preserved. We note that the flow changes only on edges of $G$ corresponding to edges of $P$, so let $(u,v) \in P$. Either $(u,v)$ is a forward edge or a backward edge.

If $(u,v)$ is a forward edge, we observe that we increase the flow of each forward edge by $\beta(P,f)$, which was defined to be the minimum residual capacity of any edge in $P$; in particular, $\beta(P,f) \leq c_f(u,v) = c(u,v) - f(u,v)$. Then \[0 \leq f(u,v) \leq \alpha(f,P)(u,v) = f(u,v) + \beta(P,f) \leq f(u,v) + (c(u,v) - f(u,v)) = c(u,v).\]

If $(u,v)$ is a backward edge, then $(v,u) \in E$ and we decrease the flow on $(v,u)$ by $\beta(P,f)$. Since $\beta(P,f) \leq c_f(u,v) = f(v,u)$, this gives us \[c(v,u) \geq f(v,u) \geq \alpha(f,P)(v,u) = f(v,u) - \beta(P,f) \geq f(v,u) - f(v,u) = 0.\] Therefore, in either case the capacity condition holds.

Then, we must show that the conservation condition holds for each vertex on our path except for $s$ and $t$. This is done by considering each combination of forward and backward edges entering and leaving such a vertex. We will do one case and leave the rest for you to think through.

Suppose that for our vertex $v$, we have a forward edge $(u,v)$ entering it and a backwards edge $(v,w)$ leaving it. According to $\alpha(f,P)$, we have \begin{align*} \alpha(f,P)(u,v) &= f(u,v) + \beta(P,f) \\ \alpha(f,P)(w,v) &= f(w,v) - \beta(P,f) \end{align*} Since $(v,w)$ is a backwards edge in $G_f$, its corresponding edge in $G$ is $(w,v)$, so both $(u,v)$ and $(w,v)$ enter $v$ in $G$. Then $\alpha(f,P)$ increases the amount of flow entering $v$ by $\beta(P,f)$ on $(u,v)$ but decreases the flow entering $v$ by $\beta(P,f)$ on $(w,v)$. Since $f$ preserved conservation, we can conclude $\alpha(f,P)$ preserves conservation. $$\tag*{$\Box$}$$

But what can we say about the new flow? As it turns out, the new flow is going to be strictly greater than our previous flow.

Lemma 13.13. Let $f$ be a flow in $G$, let $P$ be a simple $s,t$-path in the residual graph $G_f$, and let $f' = \alpha(f,P)$. Then $v(f') = v(f) + \beta(P,f)$. Since $\beta(P,f) \gt 0$, we have $v(f') \gt v(f)$.

Proof. Let $e$ be the first edge of $P$. Then $e$ must leave $s$, and since we are considering only simple paths, $s$ does not get revisited. Since $G$ does not contain any edges that enter $s$, $e$ must be a forward edge. When we augment $f$ on $P$, we increase the flow on $e$ by $\beta(P,f)$ and change the flow on no other edge leaving $s$. Therefore, the value of $f'$ is greater than the value of $f$ by exactly $\beta(P,f)$. $$\tag*{$\Box$}$$

This leads us to the basic idea of the Ford-Fulkerson algorithm. Since augmenting increases the value of our flow, we can start with the zero flow and repeatedly find augmenting paths and augment along them. We stop once no augmenting path exists.

    \begin{algorithm}
    \caption{Ford-Fulkerson}
    \begin{algorithmic}
    \PROCEDURE{maxflow}{$G = (V,E,s,t,c)$}
        \FORALL{$e \in E$}
            \STATE $f(e) \gets 0$
        \ENDFOR
        \STATE $G_f \gets$ residual graph of $G$ and $f$
        \WHILE{there is an $s,t$-path $P$ in $G_f$}
            \STATE $f \gets \alpha(f,P)$
            \STATE $G_f \gets$ residual graph of $G$ and $f$
        \ENDWHILE
        \RETURN{$f$}
    \ENDPROCEDURE
    \end{algorithmic}
    \end{algorithm}
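Putting the pieces together, here is a rough, self-contained Python sketch of the pseudocode above, using the same edge-dict representation as earlier (the names are ours). It finds an $s,t$-path in the residual graph with a simple depth-first search, but the pseudocode does not care how the path is found.

    def max_flow(edges, s, t):
        """Ford-Fulkerson sketch on a network given as {(u, v): capacity}."""
        flow = {e: 0 for e in edges}
        while True:
            # Build the residual capacities c_f (assuming no antiparallel edges).
            cf = {}
            for (u, v), cap in edges.items():
                if flow[(u, v)] < cap:
                    cf[(u, v)] = cap - flow[(u, v)]
                if flow[(u, v)] > 0:
                    cf[(v, u)] = flow[(u, v)]
            # Depth-first search for any s,t-path in G_f.
            parent, stack = {s: None}, [s]
            while stack and t not in parent:
                u = stack.pop()
                for (x, y) in cf:
                    if x == u and y not in parent:
                        parent[y] = u
                        stack.append(y)
            if t not in parent:          # no augmenting path: return the current flow
                return flow
            # Recover the path, then augment by its bottleneck capacity.
            path = [t]
            while path[-1] != s:
                path.append(parent[path[-1]])
            path.reverse()
            b = min(cf[(u, v)] for u, v in zip(path, path[1:]))
            for u, v in zip(path, path[1:]):
                if (u, v) in edges:
                    flow[(u, v)] += b    # forward edge of P
                else:
                    flow[(v, u)] -= b    # backward edge of P

For instance, on the tiny (made-up) network max_flow({('s','a'): 2, ('a','t'): 3}, 's', 't'), the sketch augments once along the only path and returns {('s','a'): 2, ('a','t'): 2}, a flow of value 2.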

The Ford-Fulkerson algorithm is due to Ford, whom we were introduced to via the Bellman-Ford algorithm, and Fulkerson in 1956, when they were at RAND. The algorithm they actually give is presented more as a mathematical argument than as a concrete procedure. This will be important when we discuss the running time in a bit.

One thing that is unclear about this, just as with the very first algorithm we saw, the Gale-Shapley algorithm, is whether it will ever terminate. What if we keep computing augmenting paths indefinitely? We already know that each augmentation gives a flow with greater value. All that remains is to observe that every flow network has an upper bound on the value of any flow: the total capacity of the edges directly leaving the source.

Proposition 13.15. Let $G = (V,E,s,t,c)$ be a flow network with $c:E \to \mathbb N$. Then the Ford-Fulkerson algorithm will terminate on $G$.

Proof. We observe that the value of any flow cannot exceed the total capacity of the edges leaving the source $s$, so let $C = \sum_{(s,v) \in E} c(s,v)$. By Lemma 13.13, the value of any augmented flow $\alpha(f,P)$ is strictly larger than the value of the flow $f$ that it augments, and since the capacities are integers, all residual capacities, and hence $\beta(P,f)$, are integers. Therefore, at each iteration of Ford-Fulkerson, the value of the flow maintained by the algorithm increases by at least 1. Since the value of any flow of $G$ is at most $C$, we can conclude that Ford-Fulkerson will terminate after at most $C$ iterations. $$\tag*{$\Box$}$$

Now, we can consider its running time.

Proposition 13.16. Let $G = (V,E,s,t,c)$ be a flow network with $c:E \to \mathbb N$. Then the Ford-Fulkerson algorithm runs in $O(mC)$ time, where $m = |E|$ and $C = \sum_{(s,v) \in E} c(s,v)$.

Proof. Let $m = |E|$ and $n = |V|$, and assume that $m \geq \frac n 2$. By Proposition 13.15, the algorithm must terminate after at most $C$ iterations, so it suffices to bound the time required for each iteration.

We observe that the residual graph $G_f$ has at most $2m$ edges: one forward and one backward edge for each edge of $G$. This means that constructing the residual graph can be performed in $O(m)$ time. To find an $s,t$-path in $G_f$, we simply do a graph search in $O(m+n) = O(m)$ time. Then finding the augmented flow $\alpha(f,P)$ requires $O(n)$ time, since there are at most $n-1$ edges on any simple path $P$.

Therefore, one iteration requires $O(m)$ time, which brings us to our total of $O(mC)$ time. $$\tag*{$\Box$}$$

One problem with this algorithm is that its running time is not necessarily polynomial in the size of the input. A bad case would be a flow network with an absurdly large bound $C$ relative to the size of the graph, where each iteration increases the value of the flow by only 1. This has to do with the fact that there's no prescribed way to choose an augmenting path.
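A standard example of this (not one of this lecture's figures) uses four vertices $s, a, b, t$: edges $(s,a)$, $(s,b)$, $(a,t)$, $(b,t)$ each with some huge integer capacity $X$, and an edge $(a,b)$ with capacity 1. If we are unlucky enough to alternate between the augmenting paths $s \to a \to b \to t$ and $s \to b \to a \to t$, the second of which uses the backward edge $(b,a)$ in the residual graph, then every augmentation has bottleneck capacity 1, and reaching the maximum value $2X$ takes $2X$ iterations, even though two well-chosen augmenting paths would have sufficed.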

This goes back to Ford and Fulkerson's presentation of the algorithm, which leaves out how to find an augmenting path. Without such a prescription, we are free to make the worst possible choices. However, an actual fleshed-out efficient version was given by Edmonds and Karp in 1972, and their algorithm has a running time of $O(|V||E|^2)$.
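The rule behind this bound is to always augment along a shortest augmenting path, which is what a breadth-first search finds. As a rough sketch under the same representation as before (the function name is ours), swapping the depth-first search in the max_flow sketch above for the following search is essentially all that changes:

    from collections import deque

    def shortest_augmenting_path(cf, s, t):
        """Breadth-first search for a shortest s,t-path in the residual graph,
        given residual capacities cf as in the sketches above; returns the
        path as a list of vertices, or None if no augmenting path exists."""
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for (x, y) in cf:
                if x == u and y not in parent:
                    parent[y] = u
                    queue.append(y)
        if t not in parent:
            return None
        path = [t]
        while path[-1] != s:
            path.append(parent[path[-1]])
        return path[::-1]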