MPCS 55001 — Lecture 3

Minimum spanning trees

This time, we'll be dealing with undirected graphs. Recall that a spanning tree is, roughly, a minimal connected subgraph of a connected graph.

Let $G = (V,E)$ be an undirected graph. Then a subgraph $H = (V',E')$ of $G$ is a spanning tree if $V' = V$ and $H$ is acyclic and connected (i.e. $H$ is a tree).

If we apply weights to the edges like we did with shortest paths, we get the following notion.

Given a connected undirected graph $G = (V,E)$ with an edge weight function $w:E \to \mathbb R^+$, a minimum spanning tree $T = (V,E')$ of $G$ is a spanning tree of $G$ such that the sum of its edge weights is minimized.

Consider the following graph.

A minimum spanning tree for this graph is given in green.

Just as the spanning tree represents a sort of minimal subgraph that remains connected, the minimum spanning tree is the least cost version of that. So imagine that we want to minimize the cost of travelling between all of these cities or rolling out connections.

The minimum spanning tree problem is: Given a graph $G = (V,E)$ with edge weight function $w : E \to \mathbb R^+$, find a minimum spanning tree $T$ for $G$.

One of the neat things about this problem is that we can actually remove the language about spanning trees and ask the simpler question of finding the least cost connected subgraph. The answer naturally leads to a minimum spanning tree (you can try to convince yourself of this fact).

We will look at two algorithms for the minimum spanning tree problem, both of which take a greedy approach. For simplicity, we will assume that no two edges have the same weight.

Prim's algorithm

Prim's algorithm is named for Robert Prim, who proposed this algorithm in 1957 while he was at Bell Labs. However, the algorithm was previously published by Vojtěch Jarník in 1930 in Czech. I can relate, having unknowingly rediscovered a result that was previously published only in Hungarian.

Prim's algorithm is a natural starting point for us, because it turns out the idea behind it is very similar to Dijkstra's algorithm for finding shortest paths. We can roughly describe the algorithm as follows:

  1. Begin with a source vertex $s$.
  2. Maintain a set $S$ of explored vertices, which constitute a partially constructed tree.
  3. At each step, add the vertex $v \in V - S$ such that there is a $u \in S$ with $(u,v)$ a least weight edge crossing the cut $(S, V-S)$.
  4. Grow the tree until $S = V$.

We'll leave the pseudocode to when we talk about implementation and running time. For now, let's convince ourselves that this strategy actually works. What isn't clear at this point is that doing this either gets us a minimum weight spanning tree or even a spanning tree at all. We'll need a few definitions to aid us here.

A cycle is a path that starts and ends at the same vertex and has no other repeated vertices or edges.

A very important thing to remember about trees is that they are graphs that don't have any cycles. Recall the following useful properties of trees.

Let $T$ be a tree with $n$ vertices.

  1. $T$ is acyclic (i.e. $T$ does not contain any cycles).
  2. $T$ has exactly $n-1$ edges.
  3. Every edge $e$ in $T$ is a bridge (i.e. removing any single edge of $T$ will increase the number of connected components of the graph).
  4. For any edge $e$ not in $T$, adding $e$ to $T$ will create a unique cycle.

Consider the following tree.

One can verify that it has exactly 11 edges and it is easy to see that removing any single edge will disconnect the graph. Some examples of possible edges that can be added between vertices are given in different colours and the cycles they create are highlighted.

We will also formalize the separation of the explored vs unexplored vertices.

A cut is a partition of vertices into two nonempty subsets $(S,V-S)$. The cutset of a cut $S$ is the set of edges with exactly one endpoint in $S$.

In other words, the cutset of a cut is the set of edges that crosses the cut.
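
To make the definition concrete, here is a small sketch in Python that computes the cutset directly from the definition; the function name and edge-list representation are purely illustrative.

    def cutset(edges, S):
        """Return the edges with exactly one endpoint in the cut S."""
        S = set(S)
        return [(u, v) for (u, v) in edges if (u in S) != (v in S)]

    # A 4-cycle a-b-c-d-a with the cut S = {a, b}:
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    print(cutset(edges, {"a", "b"}))   # [('b', 'c'), ('d', 'a')]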

Here, we have a graph given in purple. The vertices in green are those that form the cut and the edges in blue are the edges that belong to its cutset.

We will prove a property of edges in the cutset that will be useful in proving the correctness of both of the algorithms for minimum spanning trees.

For a graph $G = (V,E)$ let $S \subseteq V$ be a cut and let $e$ be a minimum weight edge in the cutset of $S$. Then every minimum spanning tree of $G$ contains $e$.

Suppose that $T$ is a spanning tree that doesn't contain $e = (u,v)$; we will show that $T$ cannot be a minimum spanning tree. Since $T$ is a tree, there is a unique path from $u$ to $v$ in $T$. Since $u$ and $v$ lie on opposite sides of the cut, this path must cross the cut, so there is some edge $e' \neq e$ in the cutset of $S$ on the path from $u$ to $v$.

Consider the graph $T' = (T - \{e'\}) \cup \{e\}$, obtained by exchanging $e'$ for $e$. Then $T'$ is also a spanning tree. To see this, recall that the addition of an edge, in this case $e$, creates a unique cycle, but we have broken that cycle by removing $e'$, so $T'$ has no cycles. The graph $T'$ is also connected, since any pair of vertices connected by a path that contained $e'$ is now connected by a path that contains $e$.

However, we were given that $e$ is the minimum weight edge in the cutset of $S$. Since $e'$ is another edge in the cutset of $S$, we have $w(e) \lt w(e')$. Therefore, the total cost of $T'$ is strictly less than the cost of $T$, so $T$ is not a minimum spanning tree.

The proof we just gave is essentially an exchange argument. We will employ the same idea in the following proof.

Let $G$ be a graph. Prim's algorithm produces a minimum spanning tree for $G$.

Essentially, what we need to show is that the set of edges $T$ chosen by Prim's algorithm, together with $S$, forms a minimum spanning tree of the subgraph induced by $S$. This proves the proposition when $S = V$. We will show this by induction on $|S|$.

When $|S| = 1$, $S = \{v\}$ and no edges have been selected. So $T$ is empty and $(S,T)$ is a minimum spanning tree for the subgraph induced by $S$.

Now suppose that, for some $|S| \geq 1$, $(S,T)$ is a minimum spanning tree for the subgraph of $G$ induced by $S$. Let $v$ be the next vertex to be added to $S$ by Prim's algorithm. Then by definition there is a vertex $u \in S$ such that $(u,v) \in E$ is the minimum weight edge in the cutset of $S$. By Proposition 3.10, every minimum spanning tree of $G$ must contain this edge. Adding it keeps $(S \cup \{v\}, T \cup \{(u,v)\})$ connected and acyclic, so it is again a minimum spanning tree of the subgraph induced by $S \cup \{v\}$, and the induction goes through.

Now, let's discuss the implementation. Our strategy to implement this will be roughly the same as for Dijkstra's algorithm: keep the vertices that aren't explored yet in a priority queue, based on the minimum weight edge that connects them to a vertex that has been explored.

    \begin{algorithmic}
    \PROCEDURE{Prim}{$G$}
        \STATE Let $r$ be some vertex in $V$
        \FORALL{$v \in V$}
            \STATE $\pi[v] = \varnothing$, $d[v] = \infty$
        \ENDFOR
        \STATE $d[r] = 0$
        \STATE Create priority queue $Q$ with $Q = V$
        \WHILE{$Q \neq \emptyset$}
            \STATE $u = $ \CALL{extract-min}{$Q$}
            \FORALL{$(u,v) \in E$}
                \IF{$v \in Q$ and $w((u,v)) \lt d[v]$}
                    \STATE \CALL{update}{$Q,v,w((u,v))$}
                    \STATE $\pi[v] = u$
                    \STATE $d[v] = w((u,v))$
                \ENDIF
            \ENDFOR
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}
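
For concreteness, here is a sketch of Prim's algorithm in Python. Since Python's heapq has no update (decrease-key) operation, this version uses the "lazy" variant that pushes candidate edges and skips stale entries; the adjacency-dictionary representation and the function name are assumptions for illustration, not part of the pseudocode above. Up to constants, it has the same $O(|E| \log |V|)$ running time.

    import heapq

    def prim(adj):
        """Return MST edges of a connected weighted undirected graph.

        adj: dict mapping each vertex to a list of (neighbour, weight) pairs.
        """
        root = next(iter(adj))
        explored = {root}
        tree = []                                  # edges (u, v, w) chosen so far
        heap = [(w, root, v) for v, w in adj[root]]
        heapq.heapify(heap)
        while heap and len(explored) < len(adj):
            w, u, v = heapq.heappop(heap)
            if v in explored:                      # stale entry: v was already reached
                continue
            explored.add(v)
            tree.append((u, v, w))
            for x, wx in adj[v]:
                if x not in explored:
                    heapq.heappush(heap, (wx, v, x))
        return tree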

This algorithm gives us the following running time.

Prim's algorithm has a running time of $O(|E| \log |V|)$.

We won't go through the proof, since it's basically the proof for Dijkstra's algorithm (another good exercise would be to adapt that proof for this algorithm). And just like Dijkstra's algorithm, we can get an improvement to $O(|E| + |V| \log |V|)$ time if we swap out a binary heap for a Fibonacci heap.

Kruskal's algorithm

Kruskal's algorithm is the other commonly-cited minimum spanning tree algorithm and is due to Joseph Kruskal in 1956, curiously also at Bell Labs. Kruskal was a student here in the College and had two brothers who also turned out to be mathematicians as well as a nephew who is currently a CS prof at UMD.

While Prim's algorithm can be thought of as growing a single tree, we can think of Kruskal's algorithm as starting with a bunch of tiny trees and joining them together to form one big tree. Again, the rule that guides us is to take the minimum weight edge.

    \begin{algorithmic}
    \PROCEDURE{Kruskal}{$G$}
        \STATE Sort edges by cost and label so that $w(e_1) \leq w(e_2) \leq \cdots \leq w(e_m)$
        \STATE $T = \emptyset$
        \FORALL{$v \in V$}
            \STATE Make set $\{v\}$
        \ENDFOR
        \FOR{$i$ from 1 to $m$}
            \STATE Let $e_i$ be $(u,v)$
            \IF{$u$ and $v$ are not in the same set}
                \STATE $T = T \cup \{e_i\}$
                \STATE Take the union of the sets containing $u$ and $v$
            \ENDIF
        \ENDFOR
    \RETURN{$T$}
    \ENDPROCEDURE
    \end{algorithmic}

Let's examine what this algorithm is doing. We begin by sorting our edges in order of weight and putting every vertex into its own "tree". Then, we examine the edges in increasing order of weight. If an edge joins two vertices that are not in the same tree, then we add it to our minimum spanning tree $T$ and take the union of the trees that its endpoints belong to. We repeat this until we have examined every edge.

The following is a graph that is in the middle of executing Kruskal's algorithm. The larger components created by Kruskal's algorithm are highlighted in different colours.

The edge of weight 6, which is highlighted, is the next edge to be added to the tree. Observe that adding it will join two components.

Using Proposition 3.10, it is not too difficult to prove the correctness of Kruskal's algorithm.

Kruskal's algorithm produces a minimum spanning tree $T$.

First, we will show that this algorithm produces a spanning tree. Suppose first that it isn't a tree, so there is a cycle. But we couldn't have added an edge that forms a cycle, because that would mean adding an edge that joins two vertices already in the same tree. Now suppose that it isn't connected. But since $G$ is connected, there would be some edge joining two different components, and the algorithm would have added the first such edge it examined.

Now, we need to show that $T$ is a minimum spanning tree. Consider an edge of $T$, say $(u,v)$, at the time this edge was chosen by Kruskal's algorithm. Let $S$ be the component of $T$ containing $u$. Since $u \in S$ and $v \not \in S$, adding $(u,v)$ does not create a cycle. Furthermore, $(u,v)$ must be the first edge considered by Kruskal's algorithm that lies in the cutset of $S$: any cutset edge considered earlier would not have created a cycle and would therefore have been added, putting both of its endpoints into the same component and contradicting that it crosses the cut. Therefore, $(u,v)$ is the least cost edge in the cutset of $S$. By Proposition 3.10, every minimum spanning tree must contain $(u,v)$. Since the choice of $(u,v)$ was arbitrary, every edge in $T$ is contained in every minimum spanning tree.

Thus, $T$ is a minimum spanning tree.

Implementing Kruskal's algorithm

Again, we are faced with the question of how to efficiently implement an algorithm and again, we will be making use of a fancy data structure.

Here, we are faced with a bit of a different problem than we encountered with Dijkstra's or Prim's algorithms. Instead of maintaining a set of explored vertices, we need to keep track of multiple sets and manipulate these sets. Looking back at the algorithm, we have three things we need to be able to do:

  1. create a set for each vertex;
  2. determine which set a given vertex belongs to, so that we can check whether two vertices are in the same set;
  3. take the union of two sets.

One important property that we can take advantage of is the fact that these sets are going to be disjoint. This is helpful, because we know that an element can only belong to exactly one set.

The problem of managing multiple disjoint sets with these operations turns out to be a problem of its own, called the Union-Find problem, first studied by Galler and Fischer in 1964. This problem is closely related to what we're doing with Kruskal's algorithm, which is managing multiple connected components of a graph.

Ultimately, to solve this problem, we need a data structure to represent disjoint sets. Such a data structure has the following operations:

  1. make-set(x): create a new set containing only the element $x$;
  2. find-set(x): return the representative of the set containing $x$;
  3. union(x,y): merge the set containing $x$ with the set containing $y$.

Spookily, these correspond exactly to our operations above.

One question is how to "find" a set. Typically, we distinguish sets by choosing a "representative" of the set. So if $u$ and $v$ are in the same set and $u$ is the representative for that set, then find-set(u) and find-set(v) will both return $u$.

Here is a simple (but not optimal) implementation of this data structure: use linked lists to keep track of these sets and maintain pointers to each element. The nodes for each element will contain the element and a pointer to another element that belongs to the same set. If the node contains a null pointer, then it is the representative for that set.

Based on this description, it is easy to see that make-set can be performed in $O(1)$ time: just make a list with a single node. It is also not too hard to see how to perform a union: just link the representative of a set to the representative of another set. In principle, this requires $O(1)$ time, since we just change a pointer, but this assumes that we have located the representatives of both sets.

This question is linked to the remaining operation: find. One realizes that if we apply unions arbitrarily, we can end up with situations where find can require $O(n)$ time. One way to deal with this is to enforce a priority on which element becomes the representative of a set. One can show that by making the representative of the larger of the two sets the new representative for the union of the two sets, we can guarantee that find has a running time of $O(\log n)$. This is accomplished by storing an additional piece of information, the size of the set.
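
As a sketch, the structure just described might look like this in Python, with union by size; the only liberty taken with the description above is that a representative points to itself rather than holding a null pointer, and the class and method names are illustrative.

    class UnionFind:
        """Disjoint sets kept as a forest of parent pointers, with union by size."""

        def __init__(self):
            self.parent = {}   # parent pointer; a representative is its own parent
            self.size = {}     # number of elements in the set, stored at the representative

        def make_set(self, x):
            self.parent[x] = x
            self.size[x] = 1

        def find(self, x):
            # Walk up to the representative; O(log n) thanks to union by size.
            while self.parent[x] != x:
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            if self.size[rx] < self.size[ry]:      # always hang the smaller set
                rx, ry = ry, rx                    # under the larger one
            self.parent[ry] = rx
            self.size[rx] += self.size[ry]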

This gives us the following running times for our operations:

  1. make-set: $O(1)$;
  2. find: $O(\log n)$;
  3. union: $O(1)$, once the representatives of the two sets have been found.

Now, we can return to analyzing Kruskal's algorithm. Luckily, we don't really need to rewrite our algorithm since the Union-Find data structure uses the same operations we've used in the algorithm above.
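
Putting the pieces together, here is a hedged sketch of Kruskal's algorithm built on the UnionFind class above, assuming the graph is given as a vertex list and a list of $(u, v, w)$ edge triples.

    def kruskal(vertices, edges):
        """Return MST edges; edges is a list of (u, v, w) triples."""
        uf = UnionFind()
        for v in vertices:
            uf.make_set(v)
        tree = []
        for u, v, w in sorted(edges, key=lambda e: e[2]):   # sort by weight
            if uf.find(u) != uf.find(v):                    # different components?
                tree.append((u, v, w))
                uf.union(u, v)
        return tree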

Kruskal's algorithm has running time $O(m \log n)$, where $m = |E|$ and $n = |V|$.

First, we sort the edges by weight, which takes $O(m \log m)$ time. Then, we perform $n$ make-set operations, which takes $O(n)$ time.

We then loop and examine each edge. For each edge, we check whether its endpoints are in the same set. If they are not, we add the edge to our tree and take the union of the two sets. So for each edge, we perform two find operations and at most one union operation. Since union takes $O(1)$ time and find takes $O(\log n)$ time, and there are $m$ edges, the total of these operations is $O(m \log n)$ time.

This gives us a total of $O(m \log m) + O(n) + O(m \log n)$ time. However, note that $m \leq n^2$ and $\log m \leq 2 \log n$. Therefore, Kruskal's algorithm has a running time of $O(m \log n)$.

Since we need to sort our edges, our current Union-Find data structure suffices. Even if we were able to improve the time complexity for the Union-Find by bringing the cost of find under $O(\log n)$, the running time would still be dominated by the time required to sort.

However, it is interesting to note that there are Union-Find data structures that are capable of such improvement. The implementation and analysis of an optimal data structure for Union-Find is discussed in CLRS Chapter 21 and KT 4.6. The details are complicated, but the resulting running time is interesting.

The result is that performing a series of $m$ operations on a Union-Find data structure of $n$ elements has a running time of $O(m \cdot \alpha(n))$.

Here, $\alpha$ is the inverse Ackermann function. The Ackermann function $A$ is a function that grows very, very fast, which means its inverse, $\alpha$, grows incredibly slowly. It grows so slowly that for practical purposes, it may as well be constant: for all realistic values of $n$, we have $\alpha(n) \leq 4$. This is because $A(4)$ is on the order of $2^{2^{2^{2^{2^{2^{2}}}}}}$, and $\alpha(n) \leq 4$ for every $n$ up to that value. This is a number that Wolfram|Alpha refuses to write out because it is too big. You know this is very big, because Wolfram|Alpha is willing to write out numbers like "the number of atoms in the universe".

The result is that we have a Union-Find data structure for which operations are pretty much as close as you can get to constant time without actually being constant time. This running time was proved by Tarjan in 1975. A natural question is whether this is really the best we can do; i.e. can we get this down to $O(1)$? The answer turns out to be no, you really do need that $\Omega(\alpha(n))$ factor, which was shown by Fredman and Saks in 1989.

This value shows up in other places in algorithms analysis. For instance, the current fastest known algorithm for computing the minimum spanning tree, due to Chazelle in 2000, has a running time of $O(m \cdot \alpha(n))$. Funnily enough, this algorithm doesn't use Union-Finds, but relies on a data structure called a soft heap.

Divide and conquer

Now, we'll move on to another common algorithm design technique: divide and conquer. Much like "greedy algorithms", this design paradigm does pretty much what its name suggests. The idea is to repeatedly divide a large problem into smaller subproblems, solve those recursively, and then put the pieces back together, so to speak.

The first problem that we will consider is sorting, something we are probably a bit familiar with now.

The sorting problem is: given a list $a_1, \dots, a_n$ of $n$ integers, produce a list $L$ that contains $a_1, \dots, a_n$ in ascending order.

Of course, we can sort more things than just integers, as long as the elements we choose are from some set that has a total order (like, say, strings, assuming your alphabet is ordered).

There are some obvious applications of sorting, and we've already seen some not so obvious applications of sorting, where sorting is a crucial step that allows us to apply some of our other algorithmic design strategies.

With a bit of thinking (i.e. hey smaller elements should be at the beginning of the array!) one can very easily get an acceptable algorithm with a running time of $O(n^2)$. How?

While most people don't consider it this way, insertion sort is a recursive algorithm! This fact relies on the recursive nature of lists. The main operation in insertion sort is inserting an element into an already-sorted list. That is, for an $n$-element list, we insert one element into a list of size $n-1$ that's already sorted. But how do we know that $(n-1)$-element list is already sorted? Just sort it with insertion sort!
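
Just to make the recursive view concrete, here is a sketch of insertion sort written this way in Python (purely illustrative; the slicing makes extra copies that a real implementation would avoid).

    def insertion_sort(a):
        """Sort by recursively sorting the first n-1 elements, then inserting the last."""
        if len(a) <= 1:
            return a
        rest = insertion_sort(a[:-1])      # the already-sorted (n-1)-element list
        x = a[-1]
        i = len(rest)
        while i > 0 and rest[i - 1] > x:   # find the insertion point: O(n) in the worst case
            i -= 1
        return rest[:i] + [x] + rest[i:]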

One of the issues with this is that it relies too much on the structure of lists. We observe that in order to insert an element into a sorted list, we could end up going through the entire list, which results in $O(n)$ time. Then doing this for each element as we described gets us easily to $O(n^2)$.

So how do we get to $O(n \log n)$? The key is in avoiding what was just mentioned—relying too heavily on the structure of the data. In particular, we're always forced to consider the list sequentially in this algorithm.

Now, I've already briefly mentioned how we get an $O(n \log n)$ sorting algorithm with heaps, but this is a bit cumbersome. While this certainly gets away from the structure that was holding us back, do we really want to be constructing a heap every time we want to sort a list?

I got a very good question before about what an $O(\log n)$ time algorithm looks like intuitively. The answer is that it is basically what happens when you divide a problem into subproblems (usually in half, but not necessarily) repeatedly. It is that idea that will guide us for the next few problems.

Mergesort

The idea behind Mergesort is pretty simple:

  1. Divide the array in half
  2. Sort the left half using Mergesort
  3. Sort the right half using Mergesort
  4. Merge the two halves

The only thing that's not clear is what merging does. Assume we have two sorted arrays that we want to merge into one big sorted array containing all the elements of both. We need to go through the elements of each and arrange them in order. Luckily, the arrays are already sorted, so all we have to do is repeatedly compare the front elements of the two arrays and move the smaller of the two into our new array.
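
A sketch of merge in Python, assuming the two inputs are already sorted (the names are just for illustration):

    def merge(b, c):
        """Merge two sorted lists into one sorted list."""
        result = []
        i = j = 0
        while i < len(b) and j < len(c):
            if b[i] <= c[j]:               # pick the smaller front element
                result.append(b[i])
                i += 1
            else:
                result.append(c[j])
                j += 1
        result.extend(b[i:])               # at most one of these is nonempty
        result.extend(c[j:])
        return result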

Mergesort was developed in 1945 by John von Neumann, who, like many of the most eminent mathematicians and scientists of our time, has scientific contributions in more fields than I can remember or am aware of.

Here, we are given an array $[3,1,5,2,8,7,6,4,9]$ and we sort recursively, splitting the array in half each time. The results of each sort are then merged back together.

The algorithm in pseudocode is essentially the algorithm we described, with the addition of the base case of when your array is just a single element.

    \begin{algorithmic}
    \PROCEDURE{mergesort}{$A$}
        \STATE $n = $ size($A$)
        \IF{$n$ is 1}
            \RETURN{$A$}
        \ENDIF
        \STATE $m = \left\lfloor \frac n 2 \right\rfloor$
        \STATE $B = $ \CALL{mergesort}{$A[1,\dots,m]$}
        \STATE $C = $ \CALL{mergesort}{$A[m+1,\dots,n]$}
        \STATE $A' = $ \CALL{merge}{$B,C$}
    \RETURN{$A'$}
    \ENDPROCEDURE
    \end{algorithmic}
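
For reference, here is a direct (and hedged) translation of the pseudocode into Python, using 0-based indexing and the merge sketch from above.

    def mergesort(a):
        """Split in half, sort each half recursively, and merge."""
        n = len(a)
        if n <= 1:
            return a
        m = n // 2
        return merge(mergesort(a[:m]), mergesort(a[m:]))

    print(mergesort([3, 1, 5, 2, 8, 7, 6, 4, 9]))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]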

We need to prove that this algorithm is correct. Surprisingly, this is actually not so tricky, because inductive proofs are generally quite simple with recursive algorithms. The only real work is proving the part where we "combine" the solutions. In this case, that is with merge. Assuming we can do that, we have the following.

Mergesort produces a sorted array.

We will prove this by induction on $n$, the size of the array $A$.

For our base case, we have $n = 1$. In this case, Mergesort simply returns the array, which is trivially sorted. Now, consider $n \gt 1$ and assume that Mergesort produces a sorted array for all arrays of size less than $n$. In this case, Mergesort divides our array into two arrays, one of size $\lfloor n/2 \rfloor$ and one of size $\lceil n/2 \rceil$. We run Mergesort on these two arrays. Since both arrays have size less than $n$, the inductive hypothesis tells us the results are sorted. Since we have two sorted arrays, we can combine them using merge, which results in a sorted array.

Again, this assumes that we've proved that merge is correct (proving this is pretty good practice). Otherwise, the structure of the proof magically follows the structure of the code. Indeed, for most proofs of correctness, the real work of the proof is ensuring that your "combination" step is correct. Such a proof does not need to follow any particular strategy, since there are lots of ways to "combine" solutions.

Let's consider the efficiency of the algorithm. This is the first recursive algorithm whose efficiency we'll attempt to analyze, and it's not immediately clear how to account for the running time. However, if we think about what the algorithm is really doing, the crucial operation we need to keep track of is comparisons. We'll use this as a proxy for the running time (technically, there are a few more constants floating about if we wanted to be stricter about counting operations).

Suppose $T(n)$ is the number of comparisons that Mergesort performs on an array of size $n$. Then, we have the following pieces of information:

  1. an array of size 1 requires no further sorting, so the base case costs a constant;
  2. merging two sorted arrays with $n$ elements in total takes at most $n$ comparisons;
  3. Mergesort makes two recursive calls, each on an array of about half the size.

What can we do with this? Let's run through a simplified case first. Suppose $n$ is a power of 2, so $n = 2^k$ for some $k \geq 0$. We can then express $T(n)$ as a recurrence, \[T(n) = \begin{cases} 1 & \text{if $n = 1$,} \\ 2 \cdot T\left(\frac n 2 \right) + n & \text{if $n \gt 1$.} \end{cases} \] This is a nice recurrence. There's a problem, though: what is the asymptotic behaviour of this function?

Recurrences

What we ended up with is a relatively simple recurrence to solve and we have a few ways of looking at it. While there are systematic ways of solving recurrences, these tend to either be not applicable, since our recurrence isn't of the "right" form, or involve very deep mathematics, like connections with generating functions. In the end, the majority of recurrences can be solved using the art of guessing and checking.

This sounds kind of lame, but guessing and checking doesn't have to be a brute-force flailing of the arms. We can use some tricks to make the job easier for us. One such way is to lean on the observation that recursive algorithms behave in ways that produce trees. We can try to gain some insight by expanding our tree of subproblems.

We can then use this to come up with a guess for our recurrence, or if we prefer, we can try iterating the recurrence or guessing the first few values. Here's what we get from iterating: \begin{align*} T(2^k) &= 2 \cdot T(2^{k-1}) + 2^k \\ &= 2 \cdot (2 \cdot T(2^{k-2}) + 2^{k-1}) + 2^k \\ &= 2^2 \cdot T(2^{k-2}) + 2^k + 2^k \\ &= 2^2 \cdot (2 \cdot T(2^{k-3}) + 2^{k-2}) + 2^k + 2^k \\ &= 2^3 \cdot T(2^{k-3}) + 2^k + 2^k + 2^k \\ & \vdots \\ &= 2^j \cdot T(2^{k-j}) + j 2^k \\ & \vdots \\ &= 2^k \cdot T(1) + k 2^k \\ &= (k+1) 2^k \end{align*} We can then take our hypothesis and prove the recurrence formally using induction. Once we've proved it, we notice that this works out very nicely, because notice that $k = \log_2 2^k$ and since $n = 2^k$, we get $T(n) = O(n \log n)$.
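
If we want a quick sanity check of this closed form before proving it by induction, a few lines of Python comparing the recurrence against $(k+1)2^k$ will do; this is just a spot check, not a proof.

    def T(n):
        """T(1) = 1 and T(n) = 2 T(n/2) + n, for n a power of 2."""
        return 1 if n == 1 else 2 * T(n // 2) + n

    for k in range(11):
        n = 2 ** k
        assert T(n) == (k + 1) * n    # agrees with the closed form (k+1) * 2^k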

Of course, this recurrence works for $n$ which are exact powers of 2. What about all of those numbers that aren't so lucky? If we re-examine our algorithm for Mergesort, we notice that it's possible that we might not get to break our array clean in half, in which case one subarray has size $\lfloor n/2 \rfloor$ and the other has size $\lceil n/2 \rceil$. We would need to rewrite our recurrence as follows \[T(n) \leq \begin{cases} 0 & \text{if $n = 1$,} \\ T\left(\left\lfloor \frac n 2 \right\rfloor\right) + T\left(\left\lceil \frac n 2 \right\rceil \right) + n & \text{if $n \gt 1$.} \end{cases} \]

Now, since we care about upper bounds, this doesn't really matter and we can make things more convenient by smoothing out those annoying details. This recurrence is still sufficient to give us the following result.

$T(n)$ is $O(n \log n)$.

If you are really motivated to prove this with floors and ceilings, it is certainly possible. You would need to play around with the recurrence and somehow guess your way into finding $T(n) \leq n \lceil \log n \rceil$ and then proving that this holds by induction on $n$.

This was a bit of an ad-hoc refresher of how to deal with recurrences. However, if guessing and checking makes you uneasy, we will take a look at a more powerful result that lets us avoid this.

The Master Theorem

So far, the recurrences we've run into are not terribly complicated and so could afford to be pretty ad-hoc with how we've dealt with them. Eventually, we will be running into situations where we need to divide our problem into more than two subproblems, which will result in more complicated recurrences. Fortunately, there is a bit of a reliable result that covers most of the types of recurrences we're likely to see.

In algorithms analysis, the form of recurrence we tend to see is the following.

A divide-and-conquer recurrence is a recurrence of the form \[T(n) = a T\left(\frac n b\right) + f(n)\] where $a$ is the number of subproblems, $b$ is the factor by which we decrease the size of the subproblem, and $f(n)$ is the cost of combining solutions to subproblems.

We can gain some intuition about how to work with this by looking at its associated recursion tree. For convenience, we assume that $n$ is a power of $b$. Then at each level, each problem breaks into $a$ subproblems. What this means is that on the $i$th level of the tree, we have a total of $a^i$ subproblems, each of size $\frac{n}{b^i}$. In total, we will have $1 + \log_b n$ levels. Now, suppose also that we have $f(n) = n^c$ for some fixed constant $c$ (please ignore the fact that the illustration says $k$). Then the total work done at level $i$ of the tree is $a^i \left(\frac{n}{b^i}\right)^c$.

Adding all of these up, we have \[T(n) = \sum_{i=0}^{\log_b n} a^i \left( \frac{n}{b^i} \right)^c = n^c \sum_{i=0}^{\log_b n} \left(\frac{a}{b^c}\right)^i.\]

We would like to know what the asymptotic behaviour of this recurrence is. This will depend on what $a,b,c$ are, which will determine which term of $T(n)$ dominates.

The question is how fast the sum grows. Since this is a geometric sum, this will depend on what $a$ and $b$ are. Recall that for a geometric sum $\sum_{i=0}^{k} r^i$:

  1. if $r \lt 1$, the sum is $\Theta(1)$;
  2. if $r = 1$, the sum is exactly $k + 1$;
  3. if $r \gt 1$, the sum is $\Theta(r^k)$, dominated by its last term.

Taking $r = \frac{a}{b^c}$ and $k = \log_b n$ together with the fact that $\frac{a}{b^c} \lt 1$ if and only if $c \gt \log_b a$, we arrive at the following.

Let $a \geq 1, b \geq 2, c \geq 0$ and let $T:\mathbb N \to \mathbb R^+$ be defined by \[T(n) = aT\left(\frac n b\right) + \Theta(n^c), \] with $T(0) = 0$ and $T(1) = \Theta(1)$. Then, \[ T(n) = \begin{cases} \Theta(n^{\log_b a}) & \text{if $c \lt \log_b a$}, \\ \Theta(n^c \log n) & \text{if $c = \log_b a$}, \\ \Theta(n^c) & \text{if $c \gt \log_b a$}. \\ \end{cases} \]

This is called the Master Theorem in CLRS because it magically solves many of the divide and conquer recurrences we see and because most instructors don't really get into the proof. Observe that I skillfully avoided many details. The origins of this particular recurrence and its solutions are due to Bentley, Haken, and Saxe in 1980, partly intended for use in algorithms classes. Notice that this theorem doesn't actually solve the recurrence—we don't get a nice closed form out of it. Instead, we get the asymptotic behaviour of the recurrence, without needing a solution. Luckily, that's all we really care about. If you are really interested in the entirety of the proof, then you should take a look at CLRS 4.6.

Recall our recurrence from Mergesort. Let $T(n) = 2T\left(\frac n 2\right) + kn$, for some constant $k$. Here, we have $a = 2$, $b = 2$, and $c = 1$. Since $\log_2 2 = 1$, we have that $T(n)$ is $O(n \log n)$.

Now consider something like $T(n) = 3T\left(\frac n 2\right) + kn$. Here, we have $c = 1 \lt \log_2 3 \approx 1.585$, so this gives us $T(n) = O(n^{\log_2 3}) = O(n^{1.585})$, a noticeable difference just from increasing the number of subproblems by 1.
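
As a purely illustrative helper (not something from the lecture), the case analysis can be mechanized for recurrences of the form $T(n) = aT(n/b) + \Theta(n^c)$; the function name is made up.

    import math

    def master_case(a, b, c):
        """Asymptotic growth of T(n) = a T(n/b) + Theta(n^c), by the Master Theorem."""
        crit = math.log(a, b)                # the critical exponent log_b a
        if c < crit:
            return f"Theta(n^{crit:.3f})"    # the leaves dominate
        if c == crit:                        # exact comparison is fine for these small examples
            return f"Theta(n^{c} log n)"     # every level contributes equally
        return f"Theta(n^{c})"               # the root (combine step) dominates

    print(master_case(2, 2, 1))   # mergesort: Theta(n^1 log n)
    print(master_case(3, 2, 1))   # Theta(n^1.585)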