CMSC 27200 — Lecture 9

Minimum spanning trees

This time, we'll be dealing with undirected graphs. Recall that a spanning tree is sort of a minimal connected subgraph for a connected graph.

Let $G = (V,E)$ be an undirected graph. Then a subgraph $H = (V',E')$ of $G$ is a spanning tree if $V' = V$ and $H$ is acyclic and connected (i.e. $H$ is a tree).

If we apply weights to the edges like we did with shortest paths, we get the following notion.

Given a connected undirected graph $G = (V,E)$ with a edge weight function $w:E \to \mathbb R^+$, a minimum spanning tree $T = (V,E')$ of $G$ is a spanning tree of $G$ such the sum of the edge weights is minimized.

Consider the following graph.

A minimum spanning tree for this graph is given in green.

Just as the spanning tree represents a sort of minimal subgraph that remains connected, the minimum spanning tree is the least cost version of that. So imagine that we want to minimize the cost of travelling between all of these cities or rolling out connections.

The minimum spanning tree problem is: Given a graph $G = (V,E)$ with edge weight function $w : E \to \mathbb R^+$, find a minimum spanning tree $T$ for $G$.

One of the neat things about this problem is that we can actually remove the language about spanning trees and ask the simpler question of finding the least cost connected subgraph. The answer naturally leads to a minimum spanning tree (you can try to convince yourself of this fact).

We will look at two algorithms for the minimum spanning tree problem, both of which take a greedy approach. For simplicity, we will assume that no two edges have the same weight.

Prim's algorithm

Prim's algorithm is named for Robert Prim, who proposed this algorithm in 1957 while he was at Bell Labs. However, the algorithm was previously published by Vojtěch Jarník in 1930 in Czech. I can relate, having unknowingly rediscovered a result that was previously published only in Hungarian.

Prim's algorithm is a natural starting point for us, because it turns out the idea behind it is very similar to Dijkstra's algorithm for finding shortest paths. We can roughly describe the algorithm as follows:

  1. Begin with a source vertex $s$.
  2. Maintain a set $S$ of explored vertices, which constitute a partially constructed tree.
  3. At each step, add the vertex $v \in V - S$ such that $u \in S$ and $(u,v)$ is a least weight edge.
  4. Grow the tree until $S = V$.

We'll leave the pseudocode to when we talk about implementation and running time. For now, let's convince ourselves that this strategy actually works. What isn't clear at this point is that doing this either gets us a minimum weight spanning tree or even a spanning tree at all. We'll need a few definitions to aid us here.

A cycle is a path with no repeated vertices or edges other than the start and end vertex.

A very important thing to remember about trees is that they are graphs that don't have any cycles. Recall the following useful properties of trees.

Let $T$ be a tree with $n$ vertices.

  1. $T$ is acyclic (i.e. $T$ does not contain any cycles).
  2. $T$ has exactly $n-1$ edges.
  3. Every edge $e$ in $T$ is a bridge (i.e. removing any single edge of $T$ will increase the number of connected components of the graph).
  4. For any edge $e$ not in $T$, adding $e$ to $T$ will create a unique cycle

Consider the following tree.

One can verify that it has exactly 11 edges and it is easy to see that removing any single edge will disconnect the graph. Some examples of possible edges that can be added between vertices are given in different colours and the cycles they create are highlighted.

Now, we'll formalize the idea of separating the explored vs unexplored vertices.

A cut is a partition of vertices into two nonempty subsets $(S,V-S)$. The cutset of a cut $S$ is the set of edges with exactly one endpoint in $S$.

In other words, the cutset of a cut is the set of edges that crosses the cut.

Here, we have a graph given in purple. The vertices in green are those that form the cut and the edges in blue are the edges that belong to its cutset.

We will prove a property of edges in the cutset that will be useful in proving the correctness of both of the algorithms for minimum spanning trees.

For a graph $G = (V,E)$ let $S \subseteq V$ be a cut and let $e$ be a minimum weight edge in the cutset of $S$. Then every minimum spanning tree of $G$ contains $e$.

Suppose that $T$ is a spanning tree that doesn't contain $e = (u,v)$. Since $T$ is a tree, there is a unique path from $u$ to $v$. Since $(u,v)$ is in the cutset of $S$, there must be an edge $e'$ in the cutset of $S$ on the unique path from $u$ to $v$.

Consider the graph $T' = T - \{e'\} \cup \{e\}$, obtained by exchanging $e'$ for $e$. Then $T'$ is also a spanning tree. To see this, recall that the addition of an edge, in this case $e$, creates a unique cycle, but we have broken that cycle by removing $e'$, so $T'$ has no cycles. The graph $T'$ is also connected, since any pair of vertices connected by a path that contained $e'$ is now connected by a path that contains $e$.

However, we were given that $e$ is the minimum weight edge in the cutset of $S$. Since $e'$ was another edge in the cutset of $S$, we have $w(e) \lt w(e')$. Therefore, the total cost of $T'$ is strictly less than the cost of $T$.

The proof of the proposition we just proved is essentially an exchange argument. We will employ this in the following proof.

Let $G$ be a graph. Prim's algorithm produces a minimum spanning tree for $G$.

Essentially what we need to show is that the set of edges $T$ that was chosen by Prim's algorithm together with $S$ are a minimum spanning tree for the subgraph induced by $S$. This then proves the proposition when $S = V$. We will show this by induction on $|S|$.

When $|S| = 1$, $S = \{v\}$ and no edges have been selected. So $T$ is empty and $(S,T)$ is a minimum spanning tree for the subgraph induced by $S$.

Now suppose that for $|S| \geq 1$ that $(S,T)$ is a minimum spanning tree for the subgraph of $G$ induced by $S$. Let $v$ be the next vertex to be added to $S$ by Prim's algorithm. Then by definition there is a vertex $u \in S$ such that $(u,v) \in E$ is the minimum weight edge in the cutset of $S$. By Proposition 8.3, every minimum spanning tree of $G$ must contain this edge.

Now, let's discuss the implementation. Our strategy to implement this will be roughly the same as for Dijkstra's algorithm: keep the vertices that aren't explored yet in a priority queue, based on the minimum weight edge that connects them to a vertex that has been explored.

    \begin{algorithmic}
    \PROCEDURE{Prim}{$G$}
        \STATE Let $r$ be some vertex in $V$
        \FORALL{$v \in V$}
            \STATE $\pi[u] = \varnothing$, $d[u] = \infty$
        \ENDFOR
        \STATE $d[r] = 0$
        \STATE Create priority queue $Q$ with $Q = V$
        \WHILE{$Q \neq \emptyset$}
            \STATE $u = $ \CALL{extract-min}{$Q$}
            \FORALL{$(u,v) \in E$}
                \IF{$v \in Q$ and $w((u,v)) \lt d[v]$}
                    \STATE \CALL{update}{$Q,v,w((u,v))$}
                    \STATE $\pi[v] = u$
                    \STATE $d[v] = w(u,v)$
                \ENDIF
            \ENDFOR
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}

This algorithm gives us the following running time.

Prim's algorithm has a running time of $O(|E| \log |V|)$.

We won't go through the proof, since it's basically the proof for Dijkstra's algorithm (another good exercise would be to adapt that proof for this algorithm). And just like Dijkstra's algorithm, we can get an improvement to $O(|E| + |V| \log |V|)$ time if we swap out a binary heap for a Fibonacci heap.