MPCS 55001 — Lecture 7

The Master Theorem

So far, the recurrences we've run into have not been terribly complicated, so we could afford to be fairly ad hoc in how we dealt with them. Eventually, we will run into situations where we need to divide our problem into more than two subproblems, which results in more complicated recurrences. Fortunately, there is a reliable result that covers most of the recurrences we're likely to see.

In algorithms analysis, the form of recurrence we tend to see is the following.

Definition 7.1. A divide-and-conquer recurrence is a recurrence of the form \[T(n) = a T\left(\frac n b\right) + f(n)\] where $a$ is the number of subproblems, $b$ is the factor by which we decrease the size of the subproblem, and $f(n)$ is the cost of combining solutions to subproblems.

We can gain some intuition about how to work with this by looking at its associated recursion tree. For convenience, we assume that $n$ is a power of $b$. Then at each level, each problem breaks into $a$ subproblems. What this means is that on the $i$th level of the tree, we have a total of $a^i$ subproblems, each of size $\frac{n}{b^i}$. In total, we will have $1 + \log_b n$ levels. Now, suppose also that we have $f(n) = n^c$ for some fixed constant $c$. Then the total work done on the $i$th level is $a^i \left( \frac{n}{b^i} \right)^c$.

Adding all of these up, we have \[T(n) = \sum_{i=0}^{\log_b n} a^i \left( \frac{n}{b^i} \right)^c = n^c \sum_{i=0}^{\log_b n} \left(\frac{a}{b^c}\right)^i.\]

We would like to know what the asymptotic behaviour of this recurrence is. This will depend on what $a,b,c$ are, which will determine which term of $T(n)$ dominates.

The question is how fast the sum grows. Since this is a geometric sum, this will depend on the ratio $\frac{a}{b^c}$. Recall the following: \[\sum_{i=0}^{k} r^i = \begin{cases} \Theta(1) & \text{if $r \lt 1$}, \\ \Theta(k) & \text{if $r = 1$}, \\ \Theta(r^k) & \text{if $r \gt 1$}. \end{cases}\]

Taking $r = \frac{a}{b^c}$ and $k = \log_b n$, together with the facts that $\frac{a}{b^c} \lt 1$ if and only if $c \gt \log_b a$ and that $n^c \left(\frac{a}{b^c}\right)^{\log_b n} = a^{\log_b n} = n^{\log_b a}$, we arrive at the following.

Theorem 7.2. Let $a \geq 1, b \geq 2, c \gt 0$ and let $T:\mathbb N \to \mathbb R^+$ be defined by \[T(n) = aT\left(\frac n b\right) + \Theta(n^c), \] with $T(0) = 0$ and $T(1) = \Theta(1)$. Then, \[ T(n) = \begin{cases} \Theta(n^{\log_b a}) & \text{if $c \lt \log_b a$}, \\ \Theta(n^c \log n) & \text{if $c = \log_b a$}, \\ \Theta(n^c) & \text{if $c \gt \log_b a$}. \\ \end{cases} \]

This is called the Master Theorem in CLRS because it magically solves many of the divide-and-conquer recurrences we see and because most instructors don't really get into the proof. Observe that I skillfully avoided many details. The recurrence and its solution originate with Bentley, Haken, and Saxe in 1980, partly intended for use in algorithms classes. If you are really interested in the entirety of the proof, then you should take a look at CLRS 4.6.

Example 7.3. Recall our recurrences from Mergesort and Closest Pair. Let $T(n) = 2T\left(\frac n 2\right) + kn$, for some constant $k$. Here, we have $a = 2$, $b = 2$, and $c = 1$. Since $c = 1 = \log_2 2$, we are in the second case, so $T(n)$ is $\Theta(n \log n)$.

Now consider something like $T(n) = 3T\left(\frac n 2\right) + kn$. Here, we have $c = 1 \lt \log_2 3 \approx 1.585$, so the first case gives us $T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})$, a noticeable difference just from increasing the number of subproblems by 1.
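To make the case analysis concrete, here is a minimal Python sketch (not part of the notes' pseudocode; the function name and the use of `math.isclose` are my own choices) that classifies a recurrence of the form in Theorem 7.2 from $a$, $b$, and $c$.

    import math

    def master_theorem(a, b, c):
        """Classify T(n) = a*T(n/b) + Theta(n^c) according to Theorem 7.2."""
        crit = math.log(a, b)              # the critical exponent log_b(a)
        if math.isclose(c, crit):          # every level of the tree costs about the same
            return f"Theta(n^{c:g} log n)"
        elif c < crit:                     # the leaves dominate
            return f"Theta(n^{crit:.3f})"
        else:                              # the root (combine step) dominates
            return f"Theta(n^{c:g})"

    # Mergesort / Closest Pair: T(n) = 2T(n/2) + Theta(n)
    print(master_theorem(2, 2, 1))   # Theta(n^1 log n)
    # T(n) = 3T(n/2) + Theta(n)
    print(master_theorem(3, 2, 1))   # Theta(n^1.585)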

Multiplication

Integer multiplication is another one of the classical results that uses this technique of division into subproblems and it's one of the cases where our usual assumptions about unit cost operations won't fly. First, let's consider how we'd add two $n$-bit integers.

Obviously, in order for our analysis to be of any use, we can't treat integer addition as a unit-cost operation. We will instead treat operations at the bit level as unit-cost operations. So bit addition is a basic operation, and adding two $n$-bit numbers requires a constant number of bit operations per position. We can conclude that we need $\Theta(n)$ bit operations to add two $n$-bit integers, since even in the best case we have to process all $n$ bit positions.
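As an illustration of why the count is $\Theta(n)$, here is a small Python sketch of ripple-carry addition on bit lists (my own example, not from the notes); each position costs a constant number of bit operations.

    def add_bits(x, y):
        """Add two equal-length bit lists, least significant bit first."""
        result, carry = [], 0
        for xb, yb in zip(x, y):
            s = xb + yb + carry      # a value between 0 and 3
            result.append(s & 1)     # the sum bit for this position
            carry = s >> 1           # the carry into the next position
        result.append(carry)         # a possible final carry
        return result

    # 6 + 7 = 13, with bits written least significant first
    print(add_bits([0, 1, 1], [1, 1, 1]))   # [1, 0, 1, 1], i.e. 13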

Then how do we apply this to multiplication? Again, we are measuring in terms of bit operations, which include bit additions and bit shifts (multiplication by a power of 2).

Example 7.4. The obvious approach is to use the pen-and-paper algorithm that we learned in school.

A quick look at this gives us an $O(n^2)$ algorithm: we perform $O(n)$ bit operations for each bit of the bottom number, which has $n$ bits itself. This was conjectured to be optimal by Kolmogorov in 1956, and it's definitely hard to see how we could possibly improve on it. After all, it seems like we would need to run through every digit to perform the multiplications.
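In code, the pen-and-paper method is shift-and-add. The sketch below (my own, written over Python integers purely for illustration) makes the $O(n^2)$ count visible: for each of the $n$ bits of the bottom number we do an $O(n)$-bit shift and an $O(n)$-bit addition.

    def gradeschool_multiply(M, N):
        """Pen-and-paper (shift-and-add) multiplication of non-negative integers."""
        product, shift = 0, 0
        while N > 0:
            if N & 1:                     # the current bit of N is 1
                product += M << shift     # add a shifted copy of M: O(n) bit operations
            N >>= 1
            shift += 1
        return product

    print(gradeschool_multiply(13, 11))   # 143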

Fast integer multiplication

However, in 1960, Karatsuba was able to show that we can do better than $O(n^2)$. As he recounts, he encountered the $n^2$ conjecture in Kolmogorov's seminar on mathematical problems in computer science and came up with a better algorithm in about a week.

Consider two $n$-bit numbers $M$ and $N$. We write $M = ab$ and $N = cd$ as concatenations of bit strings, where $a,b,c,d$ are $\frac n 2$-bit strings. In other words, $a$ is the left half of $M$ and $b$ is the right half, and we do the same for $c$ and $d$ with $N$. Then we observe the following. \begin{align*} MN &= (a \cdot 2^{\frac n 2} + b)(c \cdot 2^{\frac n 2} + d) \\ &= ac2^n + (ad+bc) 2^{\frac n 2} +bd \end{align*}

Notice that now, we are performing four multiplications of $\frac n 2$-bit numbers. Let's figure out if this is actually better or not. This gives us the recurrence \[T(n) = 4 T\left(\frac n 2 \right) + O(n),\] since we split our problem into four subproblems of size $\frac n 2$, followed by $O(n)$ bit additions. If we use the master method, we get that $T(n) = O(n^{\log_2 4}) = O(n^2)$, so this isn't actually any better than what we started out with. But this wasn't Karatsuba's result.

The one weird trick ("Kolmogorov was very agitated") that Karatsuba employed was observing that \[ad+bc = ac+bd - (a-b)(c-d),\] which follows from expanding $(a-b)(c-d) = ac - ad - bc + bd$. Now, this looks more complicated, but the crucial observation is that we've already computed $ac$ and $bd$! This means that the only new multiplication we need to perform is $(a-b)(c-d)$. We then get the recurrence \[T(n) = 3 T\left(\frac n 2 \right) + O(n),\] which gets us $O(n^{\log_2 3}) = O(n^{1.585})$, a significant improvement.

The algorithm looks like this.

    \begin{algorithm}
    \caption{Karatsuba's multiplication algorithm}
    \begin{algorithmic}
    \PROCEDURE{multiply}{$M,N,n$}
        \IF{$n = 1$}
            \RETURN{$M \times N$}
        \ELSE
            \STATE $m = \left\lceil \frac n 2 \right\rceil$
            \STATE $a = \left\lfloor \frac M {2^m} \right\rfloor$
            \STATE $b = M \bmod 2^m$
            \STATE $c = \left\lfloor \frac N {2^m} \right\rfloor$
            \STATE $d = N \bmod 2^m$
            \STATE $e = $ multiply($a,c,m$)
            \STATE $f = $ multiply($b,d,m$)
            \STATE $g = $ multiply($|a-b|,|c-d|,m$)
            \STATE $s = \mathrm{sign}(a-b) \cdot \mathrm{sign}(c-d)$
        \RETURN{$2^{2m}e + 2^m (e+f-s \cdot g) + f$}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
    \end{algorithm}

Proposition 7.6. Let $M,N$ be two $n$-bit integers. Then $M \times N$ can be computed using $O(n^{\log_2 3}) = O(n^{1.585})$ bit operations.

Proof. Let $T(n)$ be the number of bit operations performed by the algorithm above. We have $T(1) = 1$. Then for $n \gt 1$, we observe that lines 5 through 9 can each be computed using $m$ bit shifts, which amounts to $O(n)$ operations in total. Each call of multiply in lines 10-12 costs $T\left(\left\lceil \frac n 2 \right\rceil\right)$ bit operations, while computing $|a-b|$ and $|c-d|$ in line 12 requires $O(n)$ bit operations. In total, we arrive at $T(n) = 3T\left(\left\lceil \frac n 2 \right\rceil\right) + O(n)$, which is $O(n^{\log_2 3}) = O(n^{1.585})$ by the Master theorem. $$\tag*{$\Box$}$$
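For the curious, here is a runnable Python transcription of the algorithm above (my own sketch; it uses bit lengths to pick $m$ and handles the sign of $(a-b)(c-d)$ explicitly, as in the pseudocode).

    def karatsuba(M, N):
        """Multiply non-negative integers M and N with O(n^log2(3)) bit operations."""
        if M < 2 or N < 2:                      # base case: a factor fits in one bit
            return M * N
        m = max(M.bit_length(), N.bit_length()) // 2
        a, b = M >> m, M & ((1 << m) - 1)       # M = a*2^m + b
        c, d = N >> m, N & ((1 << m) - 1)       # N = c*2^m + d
        e = karatsuba(a, c)                     # ac
        f = karatsuba(b, d)                     # bd
        g = karatsuba(abs(a - b), abs(c - d))   # |a-b| * |c-d|
        s = 1 if (a >= b) == (c >= d) else -1   # sign of (a-b)(c-d)
        # ad + bc = ac + bd - (a-b)(c-d), so only three recursive multiplications
        return (e << (2 * m)) + ((e + f - s * g) << m) + f

    print(karatsuba(1234567, 7654321))   # 9449772114007
    print(1234567 * 7654321)             # same answer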

Karatsuba's algorithm is the first of a number of fast integer multiplication algorithms that have since been developed, and it is a great example of how questioning a seemingly obvious assumption can open up an entire line of research. Now, granted, people weren't exactly clamouring for a fast multiplication algorithm before computers were invented, so there was no real utility to this kind of thing.

However, once it was clear that $n^2$ isn't the best we can do, the question of just how low we can go was wide open. This kind of question leads to two main types of approaches:

  1. Can we come up with a faster algorithm? This is a question of algorithm design and analysis and is what this course is about.
  2. Can we show that no faster algorithm is possible? This is a question of computational complexity. This is the kind of conjecture that Kolmogorov was making and is what complexity theory is about.

In his 1966 PhD thesis, Cook developed an algorithm, based on an earlier 1963 algorithm of Toom, that is essentially a generalization of Karatsuba's. Very roughly speaking, it pushes the idea of splitting inputs into subproblems further than Karatsuba does and is parameterized by the number of splits. For the next level up from Karatsuba's algorithm, Toom-Cook splits integers into three pieces instead of two. Ordinarily, this leads to nine multiplications, but the algorithm reduces that to five, giving a running time of $O(n^{\log_3 5}) = O(n^{1.465})$. This gets better asymptotically as you split your integers into more pieces, but the cost of splitting and combining grows so large that it becomes impractical for small (that is, most) values of $n$.

In 1971, Schönhage and Strassen gave an algorithm with running time $O(n \log n \cdot \log \log n)$. It's hard to see once you start dealing with these kinds of functions, but Schönhage-Strassen is asymptotically better than Toom-Cook once you get to very large $n$, and it remained the best known for over thirty years. Schönhage and Strassen also conjectured a lower bound for integer multiplication: $\Omega(n \log n)$.

The next development came in 2007, when Fürer gave an algorithm with $O(n \log n \cdot 2^{O(\log^* n)})$ time. At first, this looks bad, but $\log^* n$ is the iterated logarithm and is another one of those very slowly growing functions like $\alpha(n)$. Here, we need something like $n \gt 2^{2^{16}}$ before $\log^* n \gt 5$, which is to say that the $2^{O(\log^* n)}$ factor is essentially constant in practice and this algorithm is an asymptotic improvement over Schönhage-Strassen.
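If $\log^*$ is unfamiliar, here is a tiny Python sketch (my own) of the iterated (floor) logarithm, using bit lengths so that even a number as large as $2^{2^{16}}$ poses no problem; it confirms the threshold mentioned above.

    def log_star(n):
        """Iterated (floor) base-2 logarithm: how many logs until the value drops to 1."""
        count = 0
        while n > 1:
            n = n.bit_length() - 1   # floor(log2 n), exact when n is a power of two
            count += 1
        return count

    print(log_star(65536))        # 4
    print(log_star(2 ** 65536))   # 5, so log* n > 5 needs n > 2^(2^16)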

This kicked off another series of developments, culminating in an algorithm published last year, in March 2019, by Harvey and van der Hoeven that solves integer multiplication in $O(n \log n)$ time. There is still the question of whether this matches the true lower bound, but it's a significant achievement, and a long way from the original conjecture of $\Theta(n^2)$.

For more grounded individuals, there is also the question of practicality. None of the algorithms after Schönhage-Strassen are actually practical, in that they only outperform the others on inputs that are impractically large. This is a common outcome in algorithms research: someone proves a better theoretical bound, but it isn't useful in practice.

That doesn't mean these results aren't important, even practically speaking. If we look at this trail of algorithms and ask how many of them are actually in use, we get a slightly surprising answer: a lot of them.

For this particular problem, the quirk that a lot of these algorithms do better for sufficiently large $n$ means that often the fastest way to multiply two numbers involves choosing the right algorithm based on the size of the numbers. So modern mathematical libraries will implement everything from grade school multiplication up to Schönhage-Strassen and run the right algorithm for the right input.

Fast matrix multiplication

Closely related to integer multiplication is matrix multiplication. Just like integer multiplication, matrix multiplication is a basic operation in many applications. And just like integer multiplication, a lot of the same ideas in cleverly decomposing inputs into smaller subproblems can be applied here.

In case we've forgotten all of our linear algebra, here is a refresher.

Definition 7.7. The dot product of two length $n$ vectors $x,y \in \mathbb R^n$ is \[x \cdot y = x_1 \cdot y_1 + x_2 \cdot y_2 + \cdots + x_n \cdot y_n.\] From the definition, it's clear that computing the dot product of two length $n$ vectors requires $\Theta(n)$ integer operations ($n$ multiplications and $n-1$ additions).

Example 7.8. Let $x = [3,1,5]$ and $y = [7,6,5]$. Then \[x \cdot y = 3 \cdot 7 + 1 \cdot 6 + 5 \cdot 5 = 21 + 6 + 25 = 52.\]

We can then take this idea and generalize it to matrices.

Definition 7.9. Given two $n \times n$ matrices $A,B$, the matrix $C = AB$ is defined by entries \[c_{ij} = a_i \cdot b_j = \sum_{k = 1}^n a_{ik} b_{kj}, \] where $c_{ij}$ is the entry of $C$ in the $i$th row and $j$th column, $a_i$ is the $i$th row vector of $A$, and $b_j$ is the $j$th column vector of $B$.

Example 7.10. Consider the following matrices, \[A = \begin{bmatrix} 3 & 4 & 6 \\ 2 & 8 & 3 \\ 7 & 6 & 5 \end{bmatrix}, B = \begin{bmatrix} 3 & 8 & 9 \\ 1 & 7 & 6 \\ 5 & 6 & 1 \end{bmatrix}.\] Then if $C = AB$, we have \[C = \begin{bmatrix} 3 \cdot 3 + 4 \cdot 1 + 6 \cdot 5 & 3 \cdot 8 + 4 \cdot 7 + 6 \cdot 6 & 3 \cdot 9 + 4 \cdot 6 + 6 \cdot 1 \\ 2 \cdot 3 + 8 \cdot 1 + 3 \cdot 5 & 2 \cdot 8 + 8 \cdot 7 + 3 \cdot 6 & 2 \cdot 9 + 8 \cdot 6 + 3 \cdot 1 \\ 7 \cdot 3 + 6 \cdot 1 + 5 \cdot 5 & 7 \cdot 8 + 6 \cdot 7 + 5 \cdot 6 & 7 \cdot 9 + 6 \cdot 6 + 5 \cdot 1 \\ \end{bmatrix} = \begin{bmatrix} 43 & 88 & 57 \\ 29 & 90 & 69 \\ 52 & 128 & 104 \end{bmatrix}.\]

A quick and dirty analysis nets us $O(n^3)$ operations: for each entry $c_{ij}$, we're performing a dot product of two length $n$ vectors, which we saw was $O(n)$ operations, and there are $n^2$ entries in the matrix. Here, we have a very similar situation to integer multiplication: is the obvious matrix multiplication algorithm really the best we can do? In fact, it's maybe even more plausible that there should be clever ways to decompose matrix products, since decomposing matrices is something we do all the time in linear algebra.
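For reference, the quick and dirty algorithm above is just a triple loop; this Python sketch (mine, using plain lists of lists) reproduces Example 7.10 and makes the count explicit: $n^2$ entries, each an $n$-term dot product.

    def matmul(A, B):
        """Naive n x n matrix multiplication: n^2 entries, each an n-term dot product."""
        n = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    A = [[3, 4, 6], [2, 8, 3], [7, 6, 5]]
    B = [[3, 8, 9], [1, 7, 6], [5, 6, 1]]
    print(matmul(A, B))   # [[43, 88, 57], [29, 90, 69], [52, 128, 104]]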

Example 7.11. The most obvious way of doing this follows from integer multiplication, where we divide each of our $n \times n$ matrices into $\frac n 2 \times \frac n 2$ matrices. Assume we have $n = 2^k$ so we can split evenly. Then $C = AB$ can be expressed as \[\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \] where the $X_{ij}$ are $\frac n 2 \times \frac n 2$ matrices. This gives a natural decomposition where we compute \[C_{ij} = A_{i1} B_{1j} + A_{i2} B_{2j}.\] If you think that's either too good to be true or that it looks suspiciously like we haven't really done anything, your hunch is correct. We have divided our matrix into quadrants, where computing each quadrant requires the sum of two matrix products of $\frac n 2 \times \frac n 2$ matrices. This means we have a total of eight matrix multiplications, along with some additions.

This gives us the recurrence \[T(n) = 8 T\left( \frac n 2 \right) + O(n^2),\] which if we apply the master theorem gives us $O(n^{\log_2 8}) = O(n^3)$. So again, the obvious decomposition is not quite clever enough to make an improvement.

And so, as with Karatsuba, we need something a bit cleverer. This trick is due to Strassen in 1969, whom we've already encountered from Schönhage-Strassen. The idea is similar to Karatsuba's: it eliminates one of the eight matrix multiplications.

Proposition 7.12. The matrix product of two $n \times n$ matrices can be computed in time $O(n^{2.81})$.

Proof. We assume that $n$ is a power of 2; if it is not, we can pad the matrices with 0 entries and proceed. Let \[\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}\] and let \begin{align*} M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}), \\ M_2 &= (A_{21} + A_{22})B_{11}, \\ M_3 &= A_{11}(B_{12} - B_{22}), \\ M_4 &= A_{22}(B_{21} - B_{11}), \\ M_5 &= (A_{11} + A_{12})B_{22}, \\ M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}), \\ M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}). \end{align*}

Then \begin{align*} C_{11} &= M_1 + M_4 - M_5 + M_7, \\ C_{12} &= M_3 + M_5, \\ C_{21} &= M_2 + M_4, \\ C_{22} &= M_1 - M_2 + M_3 + M_6, \end{align*} which can be verified by expanding each product.

Observe that this requires 7 matrix multiplications of matrices of size $\frac n 2 \times \frac n 2$ and a series of matrix additions, with total cost $O(n^2)$. This gives us the recurrence \[T(n) = 7T\left(\frac n 2\right) + O(n^2).\] By the Master theorem, we arrive at $T(n) = O(n^{\log_2 7}) = O(n^{2.81})$. $$\tag*{$\Box$}$$
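Here is a minimal Python sketch of the seven-product scheme in the proof (my own transcription; it assumes $n$ is a power of 2 and uses plain lists, so it is for illustration rather than performance).

    def strassen(A, B):
        """Strassen's algorithm for n x n matrices, where n is a power of 2."""
        def add(X, Y):
            return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
        def sub(X, Y):
            return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
        n = len(A)
        if n == 1:
            return [[A[0][0] * B[0][0]]]
        h = n // 2
        # Split each matrix into four n/2 x n/2 quadrants.
        A11, A12 = [r[:h] for r in A[:h]], [r[h:] for r in A[:h]]
        A21, A22 = [r[:h] for r in A[h:]], [r[h:] for r in A[h:]]
        B11, B12 = [r[:h] for r in B[:h]], [r[h:] for r in B[:h]]
        B21, B22 = [r[:h] for r in B[h:]], [r[h:] for r in B[h:]]
        # The seven products of n/2 x n/2 matrices.
        M1 = strassen(add(A11, A22), add(B11, B22))
        M2 = strassen(add(A21, A22), B11)
        M3 = strassen(A11, sub(B12, B22))
        M4 = strassen(A22, sub(B21, B11))
        M5 = strassen(add(A11, A12), B22)
        M6 = strassen(sub(A21, A11), add(B11, B12))
        M7 = strassen(sub(A12, A22), add(B21, B22))
        # Combine using only additions and subtractions.
        C11 = add(sub(add(M1, M4), M5), M7)
        C12 = add(M3, M5)
        C21 = add(M2, M4)
        C22 = add(add(sub(M1, M2), M3), M6)
        # Reassemble the quadrants into the full product.
        return ([r1 + r2 for r1, r2 in zip(C11, C12)]
                + [r1 + r2 for r1, r2 in zip(C21, C22)])

    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]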

Since Strassen's name appears in both, you could imagine that the history of matrix multiplication algorithms tracks similarly to integer multiplication algorithms. Strassen's result kicked off a run of improvements to matrix multiplication, using far more complicated mathematics than splitting matrices up, which ended in 1989 with Coppersmith and Winograd getting matrix multiplication down to $O(n^{2.376})$.

This algorithm remains at the core of the best known algorithms for matrix multiplication, and recent work has improved its analysis. In 2010, about twenty years later, Stothers gave an improvement to $O(n^{2.3737})$ in his PhD thesis, and this was followed by Williams in 2011 with $O(n^{2.3729})$ and Le Gall in 2014 with $O(n^{2.37287})$.

Unlike integer multiplication, no one has yet achieved a "nice" bound for matrix multiplication, nor are any of the algorithms after Strassen actually practical to implement. The question of whether the trivial $\Omega(n^2)$ lower bound can be matched remains open.