CMSC 27200 — Lecture 10

Kruskal's algorithm

Kruskal's algorithm is the other commonly cited minimum spanning tree algorithm and is due to Joseph Kruskal in 1956, curiously also at Bell Labs. Kruskal was a student here in the College and had two brothers who also turned out to be mathematicians, as well as a nephew who is currently a CS professor at UMD.

While Prim's algorithm can be thought of as growing a single tree, we can think of Kruskal's algorithm as starting with a bunch of tiny trees and joining them together to form one big tree. Again, the rule that guides us is to take the minimum-weight edge.

    \begin{algorithmic}
    \PROCEDURE{Kruskal}{$G$}
        \STATE Sort edges by cost and label them so that $w(e_1) \leq w(e_2) \leq \cdots \leq w(e_m)$
        \STATE $T = \emptyset$
        \FORALL{$v \in V$}
            \STATE Make set $\{v\}$
        \ENDFOR
        \FOR{$i$ from 1 to $m$}
            \STATE Let $e_i$ be $(u,v)$
            \IF{$u$ and $v$ are not in the same set}
                \STATE $T = T \cup \{e_i\}$
                \STATE Take the union of the sets containing $u$ and $v$
            \ENDIF
        \ENDFOR
        \RETURN{$T$}
    \ENDPROCEDURE
    \end{algorithmic}

Let's examine what this algorithm is doing. We begin by sorting our edges in order of weight and putting every vertex into its own "tree". Then, we examine the minimum weight edge. If this edge joins two vertices that are not in the same tree, then we add it to our minimum spanning tree $T$ and take the union of the trees that $u$ and $v$ belong to. We repeatedly do this until we have examined every edge.
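
To make these steps concrete, here is a minimal Python sketch of the pseudocode above (my own illustration, not course code). It represents each tree naively as a Python set and keeps a map from every vertex to the set currently containing it; the function name kruskal_naive and the (weight, u, v) edge format are assumptions. A better representation for these sets is the subject of the implementation section below.

    def kruskal_naive(vertices, edges):
        """Kruskal's algorithm with a naive representation of the trees.

        vertices: iterable of hashable vertex labels
        edges:    list of (weight, u, v) tuples
        Returns the set of edges chosen for the minimum spanning tree.
        """
        # Sort edges by cost so that w(e_1) <= w(e_2) <= ... <= w(e_m).
        edges = sorted(edges)

        # Make set {v} for every vertex; component[v] is the set object containing v.
        component = {v: {v} for v in vertices}

        T = set()
        for w, u, v in edges:
            # Add the edge only if u and v are not in the same set.
            if component[u] is not component[v]:
                T.add((u, v))
                # Take the union of the sets containing u and v, and point
                # every vertex of the merged set at the new shared set object.
                merged = component[u] | component[v]
                for x in merged:
                    component[x] = merged
        return T

    # Tiny example: a triangle a-b-c plus a pendant vertex d.
    edges = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'a', 'c'), (4, 'c', 'd')]
    print(kruskal_naive('abcd', edges))   # {('a', 'b'), ('b', 'c'), ('c', 'd')}

Relabeling every vertex of the merged set is exactly the expensive step that the data structure discussed later avoids.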

The following is a graph that is in the middle of executing Kruskal's algorithm. The larger components created by Kruskal's algorithm are highlighted in different colours.

The edge of weight 6, which is highlighted, is the next edge to be added to the tree. Observe that adding it will join two components.

Using Proposition 8.3, it is not too difficult to prove the correctness of Kruskal's algorithm.

Kruskal's algorithm produces a minimum spanning tree $T$.

First, we will show that this algorithm produces a spanning tree. Suppose that $T$ contains a cycle. Then at some point we added an edge joining two vertices that were already in the same tree, which the algorithm never does. Now suppose that $T$ is not connected. Since $G$ is connected, there is some edge of $G$ joining two different components of $T$; when the algorithm considered that edge, its endpoints were in different sets, so it would have been added, a contradiction.

Now, we need to show that $T$ is a minimum spanning tree. Consider an edge of $T$, say $(u,v)$, at the time this edge was chosen by Kruskal's algorithm, and let $S$ be the component of the partial solution containing $u$ at that moment. Since the algorithm added $(u,v)$, we have $u \in S$ and $v \not\in S$, so adding $(u,v)$ did not create a cycle. Furthermore, $(u,v)$ must be the first edge considered by Kruskal's algorithm that lies in the cutset of $S$: any edge $(x,y)$ in the cutset of $S$, with $x \in S$ and $y \not\in S$, that was considered earlier must have had its endpoints in different components at that earlier time (components only merge, so if $x$ and $y$ had been together then, they would still be together now), and so the algorithm would have added it; but then $x$ and $y$ would now be in the same component, contradicting $y \not\in S$. Since edges are considered in order of nondecreasing weight, $(u,v)$ is the least-cost edge in the cutset of $S$. By Proposition 8.3, every minimum spanning tree must contain $(u,v)$. Since the choice of $(u,v)$ was arbitrary, every edge of $T$ is contained in every minimum spanning tree.

Thus, $T$ is a minimum spanning tree.

Implementing Kruskal's algorithm

Again, we are faced with the question of how to implement an algorithm efficiently, and again, we will make use of a fancy data structure.

Here, we are faced with a somewhat different problem than we encountered with Dijkstra's or Prim's algorithms. Instead of maintaining a single set of explored vertices, we need to keep track of multiple sets and manipulate them. Looking back at the algorithm, there are three things we need to be able to do: create a set containing a single vertex, determine whether two vertices belong to the same set, and take the union of the sets containing two given vertices.

One important property that we can take advantage of is the fact that these sets are going to be disjoint. This is helpful because it means every element belongs to exactly one set.

The problem of managing multiple disjoint sets with these operations turns out to be a problem of its own, called the Union-Find problem, first studied by Galler and Fischer in 1964. It is closely related to what we are doing in Kruskal's algorithm: maintaining the connected components of a graph as edges are added.

Ultimately, to solve this problem, we need a data structure to represent disjoint sets. Such a data structure supports the following operations: make-set(x), which creates a new set containing only the element x; find-set(x), which returns the representative of the set containing x; and union(x, y), which merges the set containing x with the set containing y.

Spookily, these correspond exactly to our operations above.

One question is how to "find" a set. Typically, we distinguish sets by choosing a "representative" of the set. So if $u$ and $v$ are in the same set and $u$ is the representative for that set, then find-set(u) and find-set(v) will both return $u$.

Here is a simple (but not optimal) implementation of this data structure: use linked lists to keep track of these sets, with one node per element. The node for an element contains the element itself and a pointer to another element that belongs to the same set. If the node contains a null pointer, then that element is the representative of its set.

Based on this description, it is easy to see that make-set can be performed in $O(1)$ time: just make a list with a single node. It is also not too hard to see how to perform a union: just link the representative of one set to the representative of the other. In principle, this requires $O(1)$ time, since we just change a pointer, but it assumes that we have already located the representatives of both sets.

This question is tied to the remaining operation: find. One quickly realizes that if we apply unions arbitrarily, we can end up with situations where find requires $O(n)$ time. One way to deal with this is to enforce a rule for which element becomes the representative of the union. One can show that by making the representative of the larger of the two sets the representative for the union, we guarantee that find has a running time of $O(\log n)$: an element's pointer chain grows only when its set is merged into a set at least as large, which at least doubles the size of the set containing it, and the size can double at most $\log_2 n$ times. This is accomplished by storing an additional piece of information, the size of each set.
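
As a sketch of this idea (my own code, with the names make_set, find_set, and union chosen to match the operations above), each element stores a pointer to another element of its set, a null pointer marks the representative, and union-by-size hangs the representative of the smaller set below that of the larger:

    class UnionFind:
        """Disjoint sets via parent pointers and union by size (no path compression yet)."""

        def __init__(self):
            self.parent = {}   # parent[x] is None exactly when x is its set's representative
            self.size = {}     # size[x] is only meaningful when x is a representative

        def make_set(self, x):
            self.parent[x] = None
            self.size[x] = 1

        def find_set(self, x):
            # Follow pointers until we reach the representative: O(log n) with union by size.
            while self.parent[x] is not None:
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find_set(x), self.find_set(y)
            if rx == ry:
                return
            # Union by size: the representative of the larger set stays the representative.
            if self.size[rx] < self.size[ry]:
                rx, ry = ry, rx
            self.parent[ry] = rx
            self.size[rx] += self.size[ry]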

This gives us the following running times for our operations: make-set takes $O(1)$ time, find takes $O(\log n)$ time, and union takes $O(1)$ time (once the representatives of the two sets have been located).

Now, we can return to analyzing Kruskal's algorithm. Luckily, we don't really need to rewrite our algorithm since the Union-Find data structure uses the same operations we've used in the algorithm above.
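
For concreteness, here is how the main loop might look written against the UnionFind sketch above (again my own illustration rather than course code; the edges are the same (weight, u, v) tuples as before):

    def kruskal(vertices, edges):
        """Kruskal's algorithm using the UnionFind sketch above."""
        uf = UnionFind()
        for v in vertices:
            uf.make_set(v)                 # n make-set operations: O(n)

        T = set()
        for w, u, v in sorted(edges):      # sorting the edges: O(m log m)
            # Two find operations per edge, each O(log n).
            if uf.find_set(u) != uf.find_set(v):
                T.add((u, v))
                uf.union(u, v)             # at most n - 1 unions over the whole run
            # (uf.union repeats the two finds internally; this keeps the sketch simple.)
        return T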

Kruskal's algorithm has running time $O(m \log n)$, where $m = |E|$ and $n = |V|$.

First, we sort the edges by weight, which takes $O(m \log m)$ time. Then, we perform $n$ make-set operations, which takes $O(n)$ time.

We then loop over the edges. For each edge, we check whether its endpoints are in the same set. If they are not, we add the edge to our tree and take the union of the two sets. So for each edge, we perform two find operations and at most one union operation. Since union takes $O(1)$ time and find takes $O(\log n)$ time, and there are $m$ edges, these operations total $O(m \log n)$ time.

This gives us a total of $O(m \log m) + O(n) + O(m \log n)$ time. However, note that $m \leq n^2$, so $\log m \leq 2 \log n$, and since $G$ is connected, $n \leq m + 1$. Therefore, Kruskal's algorithm has a running time of $O(m \log n)$.
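
Spelling out how the three terms combine:

    \[
        O(m \log m) + O(n) + O(m \log n) \subseteq O(2m \log n) + O(m) + O(m \log n) = O(m \log n).
    \]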

Since we need to sort our edges, our current Union-Find data structure suffices. Even if we were able to improve the time complexity for the Union-Find by bringing the cost of find under $O(\log n)$, the running time would still be dominated by the time required to sort.

However, it is interesting to note that there is a very easy improvement we can make to our Union-Find data structure. The analysis is discussed in CLRS Chapter 21 and KT 4.6. The idea is that a call to find can take up to $O(\log n)$ time, but for a given element, it always reaches the same representative. But recall that these pointers are not fixed: we can change them! Here's the trick, known as path compression: when we perform a find on an element $u$, we take the opportunity to change its pointer (and maybe even all the pointers of the nodes on the path that we traverse) so that it points directly to the representative of the set. This way, the first call can be as bad as $O(\log n)$, but subsequent finds on those elements are essentially $O(1)$.
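
As a sketch (one common way to write it, not necessarily how the course presents it), only the find_set method of the UnionFind class above needs to change:

    class UnionFindPC(UnionFind):
        """Same structure as above, with path compression added to find."""

        def find_set(self, x):
            # First pass: walk up the pointers to locate the representative.
            root = x
            while self.parent[root] is not None:
                root = self.parent[root]
            # Second pass: repoint every node on the path directly at the representative,
            # so later finds on these elements take a single step.
            while x != root:
                next_x = self.parent[x]
                self.parent[x] = root
                x = next_x
            return root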

The result is that performing a series of $m$ operations on a Union-Find data structure of $n$ elements has a running time of $O(m \cdot \alpha(n))$.

Here, $\alpha$ is the inverse Ackermann function. The Ackermann function $A$ grows very, very fast, which means its inverse, $\alpha$, grows incredibly slowly. It grows so slowly that for practical purposes it may as well be constant: for all realistic values of $n$, we have $\alpha(n) \leq 4$. This is because $A(4)$ is on the order of $2^{2^{2^{2^{2^{2^{2}}}}}}$. This is a number that Wolfram|Alpha refuses to write out because it is too big. You know this is very big, because Wolfram|Alpha is willing to write out numbers like "the number of atoms in the universe".

The result is that we have a Union-Find data structure whose operations are pretty much as close as you can get to constant time without actually being constant time. This running time was proved by Tarjan in 1975 (and analyzed further by Tarjan and van Leeuwen in 1984). A natural question is whether this is really the best we can do; i.e., can we get this thing down to $O(1)$? The answer turns out to be no: you really do need that $\Omega(\alpha(n))$ factor, as shown by Fredman and Saks in 1989.

This function shows up in other places in the analysis of algorithms. For instance, the current fastest known algorithm for computing a minimum spanning tree, due to Chazelle in 2000, has a running time of $O(m \cdot \alpha(n))$. Funnily enough, this algorithm doesn't use Union-Find at all, but relies on a data structure called a soft heap.