MPCS 55001 — Lecture 5

Minimum spanning trees

This time, we'll be dealing with undirected graphs. Recall that a spanning tree is, roughly, a minimal connected subgraph of a connected graph.

Let $G = (V,E)$ be an undirected graph. Then a subgraph $H = (V',E')$ of $G$ is a spanning tree if $V' = V$ and $H$ is acyclic and connected (i.e. $H$ is a tree).

If we apply weights to the edges like we did with shortest paths, we get the following notion.

Given a connected undirected graph $G = (V,E)$ with an edge weight function $w:E \to \mathbb R^+$, a minimum spanning tree $T = (V,E')$ of $G$ is a spanning tree of $G$ such that the sum of the edge weights is minimized.

Consider the following graph.

A minimum spanning tree for this graph is given in green.

Just as the spanning tree represents a sort of minimal subgraph that remains connected, the minimum spanning tree is the least cost version of that. So imagine that we want to minimize the cost of travelling between all of these cities or rolling out connections.

The minimum spanning tree problem is: Given a graph $G = (V,E)$ with edge weight function $w : E \to \mathbb R^+$, find a minimum spanning tree $T$ for $G$.

One of the neat things about this problem is that we can actually remove the language about spanning trees and ask the simpler question of finding the least cost connected subgraph. The answer naturally leads to a minimum spanning tree (you can try to convince yourself of this fact).

We will look at two algorithms for the minimum spanning tree problem, both of which take a greedy approach. For simplicity, we will assume that no two edges have the same weight.

Prim's algorithm

Prim's algorithm is named for Robert Prim, who proposed this algorithm in 1957 while he was at Bell Labs. However, the algorithm was previously published by Vojtěch Jarník in 1930 in Czech. I can relate, having unknowingly rediscovered a result that was previously published only in Hungarian.

Prim's algorithm is a natural starting point for us, because it turns out the idea behind it is very similar to Dijkstra's algorithm for finding shortest paths. We can roughly describe the algorithm as follows:

  1. Begin with a source vertex $s$.
  2. Maintain a set $S$ of explored vertices, which constitute a partially constructed tree.
  3. At each step, add the vertex $v \in V - S$ such that there is a vertex $u \in S$ for which $(u,v)$ is the least weight edge joining $S$ and $V - S$.
  4. Grow the tree until $S = V$.

We'll leave the pseudocode to when we talk about implementation and running time. For now, let's convince ourselves that this strategy actually works. What isn't clear at this point is that doing this either gets us a minimum weight spanning tree or even a spanning tree at all. We'll need a few definitions to aid us here.

A cycle is a path that starts and ends at the same vertex, with no other repeated vertices or edges.

A very important thing to remember about trees is that they are graphs that don't have any cycles. Recall the following useful properties of trees.

Let $T$ be a tree with $n$ vertices.

  1. $T$ is acyclic (i.e. $T$ does not contain any cycles).
  2. $T$ has exactly $n-1$ edges.
  3. Every edge $e$ in $T$ is a bridge (i.e. removing any single edge of $T$ will increase the number of connected components of the graph).
  4. For any edge $e$ not in $T$, adding $e$ to $T$ will create a unique cycle.

Consider the following tree.

One can verify that it has exactly 11 edges and it is easy to see that removing any single edge will disconnect the graph. Some examples of possible edges that can be added between vertices are given in different colours and the cycles they create are highlighted.
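The edge-count and connectivity conditions are easy to verify programmatically. Here is a minimal sketch (the function name, graph representation, and example graph are my own, for illustration): a graph on $n$ vertices is a tree exactly when it has $n-1$ edges and is connected.

```python
from collections import defaultdict

def is_tree(vertices, edges):
    """Check that (vertices, edges) is a tree: |E| = |V| - 1 and connected."""
    if len(edges) != len(vertices) - 1:
        return False
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Search from an arbitrary vertex; a tree must reach every vertex.
    seen = {next(iter(vertices))}
    frontier = list(seen)
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return len(seen) == len(vertices)

# A path on 4 vertices is a tree; adding a chord creates a cycle.
print(is_tree({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]))          # True
print(is_tree({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4), (4, 1)]))  # False
```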

We will also formalize the separation of the explored vs unexplored vertices.

A cut is a partition of vertices into two nonempty subsets $(S,V-S)$. The cutset of a cut $S$ is the set of edges with exactly one endpoint in $S$.

In other words, the cutset of a cut is the set of edges that crosses the cut.
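In code, the cutset is just the set of edges with exactly one endpoint in $S$, which a one-line filter captures (names here are illustrative):

```python
def cutset(S, edges):
    """Edges with exactly one endpoint in the set S."""
    return [(u, v) for (u, v) in edges if (u in S) != (v in S)]

# On a 4-cycle, the cut ({1, 2}, {3, 4}) is crossed by two edges.
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(cutset({1, 2}, edges))  # [(2, 3), (4, 1)]
```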

Here, we have a graph given in purple. The vertices in green are those that form the cut and the edges in blue are the edges that belong to its cutset.

We will prove a property of edges in the cutset that will be useful in proving the correctness of both of the algorithms for minimum spanning trees.

For a graph $G = (V,E)$, let $(S, V-S)$ be a cut and let $e$ be the minimum weight edge in the cutset of $S$. Then every minimum spanning tree of $G$ contains $e$.

Suppose that $T$ is a spanning tree that doesn't contain $e = (u,v)$. Since $T$ is a tree, there is a unique path from $u$ to $v$. Since $(u,v)$ is in the cutset of $S$, there must be an edge $e'$ in the cutset of $S$ on the unique path from $u$ to $v$.

Consider the graph $T' = T - \{e'\} \cup \{e\}$, obtained by exchanging $e'$ for $e$. Then $T'$ is also a spanning tree. To see this, recall that the addition of an edge, in this case $e$, creates a unique cycle, but we have broken that cycle by removing $e'$, so $T'$ has no cycles. The graph $T'$ is also connected, since any pair of vertices connected by a path that contained $e'$ is now connected by a path that contains $e$.

However, we were given that $e$ is the minimum weight edge in the cutset of $S$. Since $e'$ is another edge in the cutset of $S$ and edge weights are distinct, we have $w(e) \lt w(e')$. Therefore, the total cost of $T'$ is strictly less than that of $T$, so $T$ is not a minimum spanning tree. It follows that every minimum spanning tree contains $e$.

The proof of this proposition is essentially an exchange argument. We will employ this technique again in the following proof.

Let $G$ be a graph. Prim's algorithm produces a minimum spanning tree for $G$.

Essentially, what we need to show is that the set of edges $T$ chosen by Prim's algorithm, together with $S$, forms a minimum spanning tree for the subgraph induced by $S$. This proves the proposition when $S = V$. We will show this by induction on $|S|$.

When $|S| = 1$, $S = \{v\}$ and no edges have been selected. So $T$ is empty and $(S,T)$ is a minimum spanning tree for the subgraph induced by $S$.

Now suppose that for $|S| \geq 1$ that $(S,T)$ is a minimum spanning tree for the subgraph of $G$ induced by $S$. Let $v$ be the next vertex to be added to $S$ by Prim's algorithm. Then by definition there is a vertex $u \in S$ such that $(u,v) \in E$ is the minimum weight edge in the cutset of $S$. By Proposition 5.10, every minimum spanning tree of $G$ must contain this edge. Therefore, $(S \cup \{v\}, T \cup \{(u,v)\})$ is a minimum spanning tree for the subgraph induced by $S \cup \{v\}$, which completes the induction.

Now, let's discuss the implementation. Our strategy to implement this will be roughly the same as for Dijkstra's algorithm: keep the vertices that aren't explored yet in a priority queue, based on the minimum weight edge that connects them to a vertex that has been explored.

    \begin{algorithmic}
    \PROCEDURE{Prim}{$G$}
        \STATE Let $r$ be some vertex in $V$
        \FORALL{$v \in V$}
            \STATE $\pi[v] = \varnothing$, $d[v] = \infty$
        \ENDFOR
        \STATE $d[r] = 0$
        \STATE Create priority queue $Q$ with $Q = V$
        \WHILE{$Q \neq \emptyset$}
            \STATE $u = $ \CALL{extract-min}{$Q$}
            \FORALL{$(u,v) \in E$}
                \IF{$v \in Q$ and $w((u,v)) \lt d[v]$}
                    \STATE \CALL{update}{$Q,v,w((u,v))$}
                    \STATE $\pi[v] = u$
                    \STATE $d[v] = w((u,v))$
                \ENDIF
            \ENDFOR
        \ENDWHILE
        \RETURN{$\{(\pi[v], v) : v \in V, \pi[v] \neq \varnothing\}$}
    \ENDPROCEDURE
    \end{algorithmic}

This algorithm gives us the following running time.

Prim's algorithm has a running time of $O(|E| \log |V|)$.

We won't go through the proof, since it's basically the proof for Dijkstra's algorithm (another good exercise would be to adapt that proof for this algorithm). And just like Dijkstra's algorithm, we can get an improvement to $O(|E| + |V| \log |V|)$ time if we swap out a binary heap for a Fibonacci heap.
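As a concrete sketch, here is the same idea in Python using the standard library's binary heap. Since `heapq` has no decrease-key operation (unlike the `update` call in the pseudocode above), this "lazy" variant pushes duplicate entries and skips stale ones, which still runs in $O(|E| \log |E|) = O(|E| \log |V|)$ time. The graph encoding and names are my own, for illustration:

```python
import heapq

def prim(adj, r):
    """Lazy Prim's algorithm.

    adj maps each vertex to a list of (weight, neighbour) pairs.
    Returns the MST edges as (weight, u, v) triples.
    """
    explored = {r}
    tree = []
    heap = [(w, r, v) for (w, v) in adj[r]]
    heapq.heapify(heap)
    while heap and len(explored) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in explored:       # stale entry: v was reached via a cheaper edge
            continue
        explored.add(v)
        tree.append((w, u, v))
        for (w2, x) in adj[v]:
            if x not in explored:
                heapq.heappush(heap, (w2, v, x))
    return tree

# A small graph with distinct weights, as assumed in the text.
adj = {
    'a': [(1, 'b'), (4, 'c')],
    'b': [(1, 'a'), (2, 'c'), (7, 'd')],
    'c': [(4, 'a'), (2, 'b'), (3, 'd')],
    'd': [(7, 'b'), (3, 'c')],
}
print(sorted(prim(adj, 'a')))  # [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'c', 'd')]
```

The lazy approach trades heap size (up to $|E|$ entries instead of $|V|$) for not needing a priority queue that supports `update`.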

Kruskal's algorithm

Kruskal's algorithm is the other commonly-cited minimum spanning tree algorithm and is due to Joseph Kruskal in 1956, curiously also at Bell Labs. Kruskal was a student here in the College and had two brothers who also turned out to be mathematicians as well as a nephew who is currently a CS prof at UMD.

While Prim's algorithm can be thought of as growing a tree, we can think of Kruskal's algorithm as starting with a bunch of tiny trees and joining them together to form one big tree. Again, the guiding rule is to take the minimum weight edge.

    \begin{algorithmic}
    \PROCEDURE{Kruskal}{$G$}
        \STATE Sort edges by cost and label so that $w(e_1) \leq w(e_2) \leq \cdots \leq w(e_m)$
        \STATE $T = \emptyset$
        \FORALL{$v \in V$}
            \STATE Make set $\{v\}$
        \ENDFOR
        \FOR{$i$ from 1 to $m$}
            \STATE Let $e_i$ be $(u,v)$
            \IF{$u$ and $v$ are not in the same set}
                \STATE $T = T \cup \{e_i\}$
                \STATE Take the union of the sets containing $u$ and $v$
            \ENDIF
        \ENDFOR
    \RETURN{$T$}
    \ENDPROCEDURE
    \end{algorithmic}

Let's examine what this algorithm is doing. We begin by sorting our edges in order of weight and putting every vertex into its own "tree". Then, we examine the minimum weight edge. If this edge joins two vertices that are not in the same tree, then we add it to our minimum spanning tree $T$ and take the union of the trees that $u$ and $v$ belong to. We repeatedly do this until we have examined every edge.
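This description translates almost directly into Python. The sketch below (names are mine) keeps each component as a plain set shared via a dict, which is simple but asymptotically slower than a proper disjoint-set data structure:

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm. edges is a list of (weight, u, v) triples.

    Components are plain Python sets shared through a dict; merging them
    naively is what a Union-Find data structure does more efficiently.
    """
    component = {v: {v} for v in vertices}
    tree = []
    for w, u, v in sorted(edges):           # examine edges by increasing weight
        if component[u] is not component[v]:  # u, v in different trees
            tree.append((w, u, v))
            merged = component[u] | component[v]
            for x in merged:                # repoint every member at the union
                component[x] = merged
    return tree

vertices = {'a', 'b', 'c', 'd'}
edges = [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'a', 'c'), (3, 'c', 'd'), (7, 'b', 'd')]
print(kruskal(vertices, edges))  # [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'c', 'd')]
```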

The following is a graph that is in the middle of executing Kruskal's algorithm. The larger components created by Kruskal's algorithm are highlighted in different colours.

The edge of weight 6, which is highlighted, is the next edge to be added to the tree. Observe that this will join two components.

Using Proposition 5.10, it is not too difficult to prove the correctness of Kruskal's algorithm.

Kruskal's algorithm produces a minimum spanning tree $T$.

First, we will show that this algorithm produces a spanning tree. Suppose the resulting graph contains a cycle. But we never add an edge that forms a cycle, since such an edge would join two vertices in the same tree. Now suppose the resulting graph isn't connected. But then, since $G$ is connected, there would be an edge joining two of our trees that the algorithm should have added when it was considered.

Now, we need to show that $T$ is a minimum spanning tree. Consider an edge of $T$, say $(u,v)$, at the time this edge was chosen by Kruskal's algorithm. Let $S$ be the component of $T$ containing $u$. Since $u \in S$ and $v \not \in S$, adding $(u,v)$ would not have created a cycle. Furthermore, $(u,v)$ must be the first edge considered by Kruskal's algorithm that is in the cutset of $S$, since otherwise, it would have been added already. Therefore, $(u,v)$ is the least cost edge in the cutset of $S$. By Proposition 5.10, every minimum spanning tree must contain $(u,v)$. Since the choice of $(u,v)$ was arbitrary, every edge in $T$ is contained by every minimum spanning tree.

Thus, $T$ is a minimum spanning tree.

Implementing Kruskal's algorithm

Again, we are faced with the question of how to efficiently implement an algorithm and again, we will be making use of a fancy data structure.

Here, we are faced with a bit of a different problem than we encountered with Dijkstra's or Prim's algorithms. Instead of maintaining a set of explored vertices, we need to keep track of multiple sets and manipulate these sets. Looking back at the algorithm, we have three things we need to be able to do:

  1. Create a set containing a single vertex.
  2. Determine whether two vertices $u$ and $v$ belong to the same set.
  3. Take the union of the sets containing $u$ and $v$.

One important property that we can take advantage of is the fact that these sets are going to be disjoint. This is helpful, because we know that an element can only belong to exactly one set.

The problem of managing multiple disjoint sets with these operations turns out to be a problem of its own, called the Union-Find problem, first studied by Galler and Fisher in 1964. This problem is closely related to what we're doing with Kruskal's algorithm, which is managing multiple connected components of a graph.

Ultimately, to solve this problem, we need a data structure to represent disjoint sets. Such a data structure has the following operations:

  1. make-set(x): create a new set containing the single element $x$.
  2. find-set(x): return the representative of the set containing $x$.
  3. union(x,y): merge the sets containing $x$ and $y$ into a single set.

Spookily, these correspond exactly to our operations above.

One question is how to "find" a set. Typically, we distinguish sets by choosing a "representative" of the set. So if $u$ and $v$ are in the same set and $u$ is the representative for that set, then find-set(u) and find-set(v) will both return $u$.

Here is a simple (but not optimal) implementation of this data structure: use linked lists to keep track of these sets and maintain pointers to each element. The nodes for each element will contain the element and a pointer to another element that belongs to the same set. If the node contains a null pointer, then it is the representative for that set.

Based on this description, it is easy to see that make-set can be performed in $O(1)$ time: just make a list with a single node. It is also not too hard to see how to perform a union: just link the representative of a set to the representative of another set. In principle, this requires $O(1)$ time, since we just change a pointer, but this assumes that we have located the representatives of both sets.

This question is linked to the remaining operation: find. One realizes that if we apply unions arbitrarily, we can end up with situations where find can require $O(n)$ time. One way to deal with this is to enforce a priority on which element becomes the representative of a set. One can show that by making the representative of the larger of the two sets the new representative for the union of the two sets, we can guarantee that find has a running time of $O(\log n)$. This is accomplished by storing an additional piece of information, the size of the set.
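A sketch of this structure in Python, with a parent pointer per element (`None` marks the representative) and union by size; the class and method names are my own, chosen to mirror the operations above:

```python
class UnionFind:
    """Disjoint sets with union by size; find runs in O(log n)."""

    def __init__(self):
        self.parent = {}  # element -> parent element (None for a representative)
        self.size = {}    # representative -> size of its set

    def make_set(self, x):
        self.parent[x] = None
        self.size[x] = 1

    def find(self, x):
        # Follow parent pointers until reaching the representative.
        while self.parent[x] is not None:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Point the smaller set's representative at the larger one's, so
        # chains of parent pointers have length O(log n).
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]

uf = UnionFind()
for v in 'abcd':
    uf.make_set(v)
uf.union('a', 'b')
uf.union('c', 'd')
print(uf.find('a') == uf.find('b'))  # True
print(uf.find('a') == uf.find('c'))  # False
```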

This gives us the following running times for our operations:

  1. make-set: $O(1)$ time.
  2. union: $O(1)$ time, once the representatives of the two sets are located.
  3. find: $O(\log n)$ time.

Now, we can return to analyzing Kruskal's algorithm. Luckily, we don't really need to rewrite our algorithm since the Union-Find data structure uses the same operations we've used in the algorithm above.

Kruskal's algorithm has running time $O(m \log n)$, where $m = |E|$ and $n = |V|$.

First, we sort the edges by weight, which takes $O(m \log m)$ time. Then, we perform $n$ make-set operations, which takes $O(n)$ time.

We then loop and examine each edge. For each edge, we check whether its endpoints are in the same set. If they are, we add the edge to our tree and take the union of the two sets. In this case, for each edge, we perform two find operations and at most one union operation. Since union has $O(1)$ time and find has $O(\log n)$ time, and there are $m$ edges, the total of these operations is $O(m \log n)$ time.

This gives us a total of $O(m \log m) + O(n) + O(m \log n)$ time. However, note that $m \leq n^2$ and $\log m \leq 2 \log n$. Therefore, Kruskal's algorithm has a running time of $O(m \log n)$.

Since we need to sort our edges, our current Union-Find data structure suffices. Even if we were able to improve the time complexity for the Union-Find by bringing the cost of find under $O(\log n)$, the running time would still be dominated by the time required to sort.

However, it is interesting to note that there are Union-Find data structures that are capable of such improvement. The implementation and analysis of an optimal data structure for Union-Find is discussed in CLRS Chapter 21 and KT 4.6. The details are complicated, but the resulting running time is interesting.

The result is that performing a series of $m$ operations on a Union-Find data structure of $n$ elements has a running time of $O(m \cdot \alpha(n))$.

Here, $\alpha$ is the inverse Ackermann function. The Ackermann function $A$ is a function that grows very, very fast, which means its inverse, $\alpha$, grows incredibly slowly. It grows so slowly that for practical purposes, it may as well be constant: for all realistic values of $n$, we have $\alpha(n) \leq 4$. This is because $A(4)$ is on the order of $2^{2^{2^{2^{2^{2^{2}}}}}}$, a number so big that Wolfram|Alpha refuses to write it out. You know this is very big, because Wolfram|Alpha is willing to write out numbers like "the number of atoms in the universe".

The result is that we have a Union-Find data structure for which operations are pretty much as close as you can get to being constant time without being able to say that they're really in constant time. This running time was proved by Tarjan in 1975. A natural question is whether this is really the best we can do; i.e. can we get this down to $O(1)$? The answer turns out to be no: you really do need that $\Omega(\alpha(n))$ factor, as shown by Fredman and Saks in 1989.

This value shows up in other places in algorithms analysis. For instance, the current fastest known algorithm for computing the minimum spanning tree, due to Chazelle in 2000, has a running time of $O(m \cdot \alpha(n))$. Funnily enough, this algorithm doesn't use a Union-Find data structure, but relies on a data structure called a soft heap.