MPCS 55001 — Lecture 7

Closest Points on a Plane

Let's look at a neat problem that is harder than it first appears (for computers, at least; it still might take a bit of work for us too). First, an important definition.

The Euclidean distance between two points $p_1 = (x_1,y_1)$ and $p_2 = (x_2,y_2)$ is defined by \[d(p_1,p_2) = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}.\]

Of course, one can define many different kinds of distances on many kinds of objects. So now we can ask the following question.

The closest points on a plane problem is: Given a list of $n$ points on a plane, find the closest pair of points.

This is a fairly simple problem to state, but it's another one of those things that people really do want to solve efficiently, since it's a very basic computational geometry problem with obvious applications in all sorts of graphics and computer vision problems. It was first studied by Shamos and Hoey in 1975, who gave the algorithm we're about to investigate.

We will assume that squaring and square roots are basic operations (although we don't really need to take square roots). There is a pretty obvious brute force algorithm: just compute and check the distance of every pair of points. Of course, with $n$ points, that's $O(n^2)$ pairs to check.
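
As a baseline, here's a quick sketch of the brute force approach in Python (the function and variable names are just for illustration):

    from math import hypot, inf

    def closest_pair_brute_force(points):
        """Check every pair of (x, y) tuples: O(n^2) distance computations."""
        best, best_pair = inf, None
        n = len(points)
        for i in range(n):
            for j in range(i + 1, n):
                (x1, y1), (x2, y2) = points[i], points[j]
                d = hypot(x1 - x2, y1 - y2)   # Euclidean distance
                if d < best:
                    best, best_pair = d, (points[i], points[j])
        return best_pair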

Designing the algorithm

So we want to try to use divide and conquer on this set, ideally splitting our problem into subproblems of roughly equal size. It's not immediately clear how to do this; a first instinct might be something complicated like splitting the plane into quadrants.

Luckily, we do not need to go that far. Something simpler we can do is just split our problem in two. First, we sort our points by their $x$-coordinate and then divide them accordingly: picture a vertical line $L$ going down the middle of the space. Then we recursively find the closest pair in each half and choose the closer of the two. Once we get down to at most three points in a subproblem, we can just compute which pair is closest directly. Easy! But there is a big complication with this strategy:

What happens if the closest pair is actually right in the middle, with one point to the left and one point to the right?

Surprisingly, all is not lost. Let $\delta$ be the smaller of the distances computed for the left closest pair and the right closest pair. If we run into the edge case that we described, then this crossing pair must be less than $\delta$ apart, which means both of its points lie within distance $\delta$ of the line $L$!

What we can do is consider only those points that are at most $\delta$ away from the line $L$. Sort these by their $y$-coordinates. The next question is: how many comparisons do we have to make when considering these points? It seems like we could be looking at arbitrarily many such comparisons, but we can do a bit more narrowing down.

Let $S = \{p_1, \dots, p_k\}$ be the set of points within distance $\delta$ of line $L$, sorted in order of $y$-coordinate. Then for any two points $p_i$ and $p_j$ with $|i - j| \gt 7$, we have $d(p_i,p_j) \gt \delta$.

Consider a rectangle $R$ of width $2\delta$ and height $\delta$ centered on $L$. We claim that this rectangle can contain at most 8 points. If it contained more, then some two points on the same side of $L$ would be at distance less than $\delta$ from each other, which would contradict the fact that $\delta$ is at most the distance between any two points on the same side of $L$.

To see this, we divide $R$ into eight squares of side length $\frac \delta 2$, four on each side of $L$. We claim that each square can contain at most one point: the furthest apart any two points in the same square can be is the length of its diagonal, $\frac{\delta}{\sqrt 2} \lt \delta$.

Why does the rectangle tell us anything about positions in the sorted order? If $p_j$ comes more than 7 positions after $p_i$ in $S$, then $p_i, \dots, p_j$ are at least 9 points of the strip; they cannot all fit in a rectangle of height $\delta$ whose bottom edge passes through $p_i$, so $p_j$'s $y$-coordinate alone differs from $p_i$'s by more than $\delta$. Therefore, for any point within the $\delta$-region of $L$, it suffices to compare it with at most seven other points, since all other points are guaranteed to have distance greater than $\delta$. The important thing is that we only need to compare each point with a constant number of points (it is possible to improve the constant from seven by employing more complicated geometric arguments). So even if we end up with an unusually large number of points along this strip, we know that at worst we need $O(n)$ comparisons, which crucially doesn't take us out of the realm of $O(n \log n)$.

So a very general description of our algorithm is as follows:

  1. Sort our points by $x$-coordinate.
  2. Divide our points in half.
  3. Recursively find the closest pair on the left side.
  4. Recursively find the closest pair on the right side.
  5. Check the strip around the dividing line for closest pairs that cross it.
  6. Return the closest pair.

This is enough to prove that our algorithm is correct. Just like with Mergesort, this involves first proving that the combining step is correct (which we just did above) and then proving that the algorithm itself is correct via an inductive argument. In this case, the base case is when we need to determine the closest pair among at most 3 points.

Efficiency and Implementation

An efficient implementation almost follows from what we've just described. The only complication occurs in the combining phase, when we try to examine points along the middle strip. As we've described the algorithm, we would require sorting points by their $y$-coordinate each time we want to combine solutions. This leads to an $O(n \log n)$ step at each recursive call, so we need to avoid this.

It turns out the trick is in preprocessing and keeping two copies of the points: one sorted by $x$-coordinate and one sorted by $y$-coordinate. Once we do this, we can just extract the points we need in order by referring to these two lists. This then gets us the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{closest-pair}{$S = \{p_1 = (x_1, y_1), \dots, p_n = (x_n,y_n)\}$}
        \STATE $S_x = $ $S$ sorted by $x$-coordinate
        \STATE $S_y = $ $S$ sorted by $y$-coordinate
        \RETURN{\CALL{closest-pair-rec}{$S_x,S_y$}}
    \ENDPROCEDURE

    \PROCEDURE{closest-pair-rec}{$S_x, S_y$}
        \IF{$n \leq 3$}
            \STATE Compute closest pair
            \RETURN{Closest pair}
        \ENDIF

        \STATE Construct $L_x,L_y,R_x,R_y$
        \STATE $(\ell_1,\ell_2) = $ \CALL{closest-pair-rec}{$L_x,L_y$}
        \STATE $(r_1,r_2) = $ \CALL{closest-pair-rec}{$R_x,R_y$}

        \STATE $\delta = \min(d(\ell_1,\ell_2),d(r_1,r_2))$

        \STATE $L = \max_{(x,y) \in L_x} x$
        \STATE $M = \{(x,y) \in S_y \mid |x - L| \leq \delta\}$, sorted by $y$-coordinate

        \FORALL{$s \in M$}
            \STATE Compute $d(s,s')$ for the next 7 points $s' \in M$ (in $y$-order)
        \ENDFOR
        \STATE Let $s,s'$ be the pair of minimum distance computed among points of $M$
        
        \IF{$d(s,s') \lt \delta$}
            \RETURN{$(s,s')$}
        \ELIF{$d(\ell_1,\ell_2) \lt d(r_1,r_2)$}
            \RETURN{$(\ell_1,\ell_2)$}
        \ELSE
            \RETURN{$(r_1,r_2)$}
        \ENDIF

    \ENDPROCEDURE
    \end{algorithmic}
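
Here is a sketch of how this pseudocode might look in Python, assuming the points are given as distinct $(x,y)$ tuples (the helper names are just for illustration):

    from math import hypot

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    def closest_pair(points):
        Sx = sorted(points)                        # sorted by x-coordinate
        Sy = sorted(points, key=lambda p: p[1])    # sorted by y-coordinate
        return closest_pair_rec(Sx, Sy)

    def closest_pair_rec(Sx, Sy):
        n = len(Sx)
        if n <= 3:                                 # base case: brute force
            pairs = [(Sx[i], Sx[j]) for i in range(n) for j in range(i + 1, n)]
            return min(pairs, key=lambda pq: dist(*pq))

        mid = n // 2
        Lx, Rx = Sx[:mid], Sx[mid:]
        x_line = Lx[-1][0]                         # x-coordinate of the dividing line
        left = set(Lx)
        Ly = [p for p in Sy if p in left]          # keep y-order without re-sorting
        Ry = [p for p in Sy if p not in left]

        best = min(closest_pair_rec(Lx, Ly), closest_pair_rec(Rx, Ry),
                   key=lambda pq: dist(*pq))
        delta = dist(*best)

        M = [p for p in Sy if abs(p[0] - x_line) <= delta]   # the strip, in y-order
        for i, p in enumerate(M):
            for q in M[i + 1:i + 8]:               # compare with at most 7 successors
                if dist(p, q) < dist(*best):
                    best = (p, q)
        return best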

This algorithm is a bit more complicated than the ones we've seen before, so the running time analysis will be a bit more involved.

Closest-Pair has running time $O(n \log n)$.

First, the sorting steps of lines 2 and 3 can be done in $O(n \log n)$ time. We then need to analyze the running time for Closest-Pair-Rec. Let $T(n)$ be the number of operations performed by Closest-Pair-Rec. When $n \leq 3$, we have $T(n) = O(1)$.

Now, consider when $n \gt 3$. Constructing $L_x, L_y, R_x, R_y$ takes $O(n)$ time: we split $S_x$ in half and extract the corresponding points from $S_y$ in a single pass, preserving their order. The two recursive calls cost $T(\lfloor n/2 \rfloor)$ and $T(\lceil n/2 \rceil)$. Computing $\delta$, the line $L$, and the set $M$ takes $O(n)$ time, and the final loop compares each point of $M$ with at most 7 others, so it makes $O(n)$ comparisons.

Therefore, we have the recurrence \[T(n) = \begin{cases} O(1) & \text{if $n \leq 3$}, \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(n) & \text{if $n \gt 3$,} \end{cases}\] and we have that $T(n) = O(n \log n)$.

Then the running time of Closest-Pair is $O(n \log n) + T(n) = O(n \log n)$.

A formal standalone proof of this would involve proving the recurrence for $T(n)$. Of course, this is the same recurrence as Mergesort, so that's not too hard.

Both Mergesort and Closest Pairs nicely follow the same template for divide and conquer algorithms:

  1. Identify a way to divide your problem into subproblems and base cases.
  2. Recursively solve each subproblem.
  3. Combine the solutions to each subproblem.

Proving correctness and efficiency then involves proving the efficiency and correctness of these steps. Proving the efficiency of a divide and conquer algorithm involves solving the recurrence based on the recursive steps and the combination step. Proving correctness involves proving the correctness of the combination step and inductively proving the correctness of the recursive steps.

This is not directly related to divide and conquer algorithms, but it turns out that sorting a bunch of points is a solid first step for many computational geometry algorithms. For instance, the convex hull problem asks for a minimal convex polygon that encloses a set of points. This sounds complicated, but it's no more difficult than sorting! Once you have the convex hull of a set of points, you get the farthest pair of points very easily: just check pairs of points on the convex hull!

Multiplication

Integer multiplication is another one of the classical results that uses this technique of division into subproblems and it's one of the cases where our usual assumptions about unit cost operations won't fly. First, let's consider how we'd add two $n$-bit integers.

Obviously, in order for our analysis to be of any use, we can't use integer addition as a unit cost operation. We will instead consider operations at the bit level to be unit cost operations. So bit addition is a basic operation, and adding two $n$-bit numbers requires about $n$ bit operations. We can conclude that we need $\Theta(n)$ bit operations to add $n$-bit integers, since the worst case requires on the order of $n$ bit operations and even in the best case we have to examine all $n$ bits.
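
To make the bit operation model concrete, here's a small Python sketch of ripple-carry addition on numbers represented as little-endian lists of bits (the representation is just for illustration):

    def add_bits(a, b):
        """Add two little-endian bit lists using one full-adder step per position."""
        n = max(len(a), len(b))
        a = a + [0] * (n - len(a))   # pad to the same length
        b = b + [0] * (n - len(b))
        result, carry = [], 0
        for i in range(n):           # n bit-level steps: Theta(n) bit operations
            s = a[i] + b[i] + carry
            result.append(s & 1)
            carry = s >> 1
        result.append(carry)
        return result

    # Example: 6 + 3 = 9, i.e. add_bits([0, 1, 1], [1, 1]) == [1, 0, 0, 1]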

Then how do we apply this to multiplication? Again, we are measuring in terms of bit operations, which include bit addition and bit shifts (multiplication by a power of 2).

The obvious approach is to use the pen-and-paper algorithm that we learned in school.

A quick look at this gives us an $O(n^2)$ algorithm: we perform $O(n)$ bit operations for each digit of the bottom number, which has $n$ digits itself. This was conjectured to be optimal by Kolmogorov in 1956, and it's definitely hard to see how we could possibly improve this. After all, it seems like we would need to look through every digit to perform the multiplications.

Fast integer multiplication

However, in 1960, Karatsuba was able to show that we can do better than $O(n^2)$. As he recounts, he encountered the $n^2$ conjecture in Kolmogorov's seminar on mathematical problems in computer science and came up with a better algorithm in about a week.

Consider two $n$-bit numbers $M$ and $N$. We write $M = ab$ and $N = cd$, where $a,b,c,d$ are $\frac n 2$-bit strings. In other words, $a$ is the left half of $M$ and $b$ the right half, and we do the same for $c$ and $d$ with $N$. Then we observe the following. \begin{align*} MN &= (a \cdot 2^{\frac n 2} + b)(c \cdot 2^{\frac n 2} + d) \\ &= ac2^n + (ad+bc) 2^{\frac n 2} +bd \end{align*}

Notice that now, we are performing four multiplications of $\frac n 2$-bit numbers. Let's figure out if this is actually better or not. This gives us the recurrence \[T(n) = 4 T\left(\frac n 2 \right) + O(n),\] since we split our problem into four subproblems of size $\frac n 2$, followed by $O(n)$ bit additions. If we use the master method, we get that $T(n) = O(n^{\log_2 4}) = O(n^2)$, so this isn't actually any better than what we started out with. But this wasn't Karatsuba's result.

The one weird trick ("Kolmogorov was very agitated") that Karatsuba employed was observing that \[ad+bc = ac+bd - (a-b)(c-d).\] Now, this looks more complicated, but the crucial observation is that we've already computed $ac$ and $bd$! This means that the only new multiplication we need to perform is $(a-b)(c-d)$. We then get the recurrence \[T(n) = 3 T\left(\frac n 2 \right) + O(n),\] which gets us $O(n^{\log_2 3}) = O(n^{1.585})$, a significant improvement.

The algorithm looks like this.

    \begin{algorithmic}
    \PROCEDURE{multiply}{$M,N,n$}
        \IF{$n = 1$}
            \RETURN{$M \times N$}
        \ELSE
            \STATE $m = \left\lceil \frac n 2 \right\rceil$
            \STATE $a = \left\lfloor \frac M {2^m} \right\rfloor$
            \STATE $b = M \bmod 2^m$
            \STATE $c = \left\lfloor \frac N {2^m} \right\rfloor$
            \STATE $d = N \bmod 2^m$
            \STATE $e = $ \CALL{multiply}{$a,c,m$}
            \STATE $f = $ \CALL{multiply}{$b,d,m$}
            \STATE $g = $ \CALL{multiply}{$|a-b|,|c-d|,m$}
        \RETURN{$2^{2m}e + 2^m (e+f-g) + f$}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
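
Here's a sketch of the same algorithm in Python. Unlike the pseudocode above, it explicitly restores the sign of $(a-b)(c-d)$ after multiplying the absolute values:

    def karatsuba(M, N):
        """Multiply two nonnegative integers with three recursive multiplications."""
        if M < 2 or N < 2:                       # base case: a 0- or 1-bit operand
            return M * N
        n = max(M.bit_length(), N.bit_length())
        m = (n + 1) // 2                         # split position, roughly n/2 bits
        a, b = M >> m, M & ((1 << m) - 1)        # M = a * 2^m + b
        c, d = N >> m, N & ((1 << m) - 1)        # N = c * 2^m + d
        e = karatsuba(a, c)                      # ac
        f = karatsuba(b, d)                      # bd
        g = karatsuba(abs(a - b), abs(c - d))    # |a - b| * |c - d|
        if (a >= b) != (c >= d):                 # (a - b) and (c - d) differ in sign
            g = -g
        # ad + bc = ac + bd - (a - b)(c - d) = e + f - g
        return (e << (2 * m)) + ((e + f - g) << m) + f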

Let $M,N$ be two $n$-bit integers. Then $M \times N$ can be computed using $O(n^{\log_2 3}) = O(n^{1.585})$ bit operations.

Let $T(n)$ be the number of bit operations performed by the algorithm above. We have $T(1) = 1$. Then for $n \gt 1$, we observe that lines 5 through 9 can each be computed with shifts and masks, which amounts to $O(n)$ bit operations in total. Each call of multiply in lines 10-12 costs $T\left(\frac n 2 \right)$ bit operations, while computing $|a-b|$ and $|c-d|$ in line 12 requires $O(n)$ bit operations. In total, we arrive at $T(n) = 3T\left(\frac n 2 \right) + O(n)$, which we have previously shown is $O(n^{\log_2 3}) = O(n^{1.585})$ by the Master theorem.

Karatsuba's algorithm is the first of a number of fast integer multiplication algorithms that have since been developed, and it's a great example of how questioning an "obvious" lower bound can pay off. Now, granted, people weren't exactly clamouring for a fast multiplication algorithm before computers were invented, so there was no real utility to this kind of thing.

However, now that it was clear that $n^2$ isn't the best we can do, the question of just how low we can go was now wide open. This kind of question leads to two main types of approaches:

  1. Can we come up with a faster algorithm? This is a question of algorithm design and analysis and is what this course is about.
  2. Can we show that no faster algorithm is possible? This is a question of computational complexity. This is the kind of conjecture that Kolmogorov was making and is what complexity theory is about.

In his 1966 PhD thesis, Cook developed an algorithm, based on an earlier 1963 algorithm of Toom, that is essentially a generalization of Karatsuba's. Very roughly speaking, it takes the idea of splitting inputs into a bunch of subproblems and combining them even further than Karatsuba does and is parameterized by the number of splits. For the next level up from Karatsuba's algorithm, Toom-Cook splits integers into three pieces instead of two. Ordinarily, this leads to nine multiplications, but their algorithm reduces it to five, giving a running time of $O(n^{\log_3 5}) = O(n^{1.465})$. This gets better asymptotically as you split your integers into more pieces, but the cost of splitting and combining becomes so large that it is impractical for all but very large values of $n$.

In 1971, Schönhage and Strassen gave an algorithm with running time $O(n \log n \cdot \log \log n)$. It's hard to see once you start dealing with these kinds of functions, but Schönhage-Strassen is asymptotically better than Toom-Cook once you get to very large $n$, and this remained the best known bound for over thirty years. Schönhage and Strassen also conjectured a lower bound for integer multiplication: $\Omega(n \log n)$.

The next development came in 2007, when Fürer gave an algorithm with $O(n \log n \cdot 2^{O(\log^* n)})$ time. At first, this looks worse, but $\log^* n$ is the iterated logarithm and is another one of those very slowly growing functions like $\alpha(n)$. Here, we need something like $n \gt 2^{2^{16}}$ before $\log^* n \gt 5$, which is to say that the extra factor is essentially a constant and the algorithm is an asymptotic improvement over Schönhage-Strassen.

This kicked off another series of developments which culminated in an algorithm published last year, in March 2019, by Harvey and van der Hoeven, which solves integer multiplication in $O(n \log n)$ time. There is still the question of whether this is really the lower bound, but it's a significant achievement, and is a long way from the original conjecture of $\Theta(n^2)$.

For more grounded individuals, there is also the question of practicality. None of the algorithms after Schönhage-Strassen are actually practical, in that they only perform better on large enough inputs, and those sizes are impractically large. This is a common outcome in algorithms research: someone proves a better theoretical bound that isn't useful in practice.

That doesn't mean that these results aren't important, even practically speaking. If we look at this trail of algorithms that we have and look at how many of the algorithms are actually in use, we get a slightly surprising answer: a lot of them.

For this particular problem, the quirk that a lot of these algorithms do better for sufficiently large $n$ means that often the fastest way to multiply two numbers involves choosing the right algorithm based on the size of the numbers. So modern mathematical libraries will implement everything from grade school multiplication up to Schönhage-Strassen and run the right algorithm for the right input.

Fast matrix multiplication

Closely related to integer multiplication is matrix multiplication. Just like integer multiplication, matrix multiplication is a basic operation in many applications. And just like integer multiplication, a lot of the same ideas in cleverly decomposing inputs into smaller subproblems can be applied here.

In case we've forgotten all of our linear algebra, here is a refresher.

The dot product of two length $n$ vectors $x,y \in \mathbb R^n$ is \[x \cdot y = x_1 \cdot y_1 + x_2 \cdot y_2 + \cdots + x_n \cdot y_n.\] From the definition, it's clear that computing the dot product of two length $n$ vectors requires $O(n)$ integer operations ($n$ multiplications and $n-1$ additions).

Let $x = [3,1,5]$ and $y = [7,6,5]$. Then \[x \cdot y = 3 \cdot 7 + 1 \cdot 6 + 5 \cdot 5 = 21 + 6 + 25 = 52.\]

We can then take this idea and generalize it to matrices.

Given two $n \times n$ matrices $A,B$, the matrix $C = AB$ is defined by entries \[c_{ij} = a_i \cdot b_j = \sum_{k = 1}^n a_{ik} b_{kj}, \] where $c_{ij}$ is the entry of $C$ in the $i$th row and $j$th column, $a_i$ is the $i$th row vector of $A$, and $b_j$ is the $j$th column vector of $B$.

Consider the following matrices, \[A = \begin{bmatrix} 3 & 4 & 6 \\ 2 & 8 & 3 \\ 7 & 6 & 5 \end{bmatrix}, B = \begin{bmatrix} 3 & 8 & 9 \\ 1 & 7 & 6 \\ 5 & 6 & 1 \end{bmatrix}.\] Then if $C = AB$, we have \[C = \begin{bmatrix} 3 \cdot 3 + 4 \cdot 1 + 6 \cdot 5 & 3 \cdot 8 + 4 \cdot 7 + 6 \cdot 6 & 3 \cdot 9 + 4 \cdot 6 + 6 \cdot 1 \\ 2 \cdot 3 + 8 \cdot 1 + 3 \cdot 5 & 2 \cdot 8 + 8 \cdot 7 + 3 \cdot 6 & 2 \cdot 9 + 8 \cdot 6 + 3 \cdot 1 \\ 7 \cdot 3 + 6 \cdot 1 + 5 \cdot 5 & 7 \cdot 8 + 6 \cdot 7 + 5 \cdot 6 & 7 \cdot 9 + 6 \cdot 6 + 5 \cdot 1 \\ \end{bmatrix} = \begin{bmatrix} 43 & 88 & 57 \\ 29 & 90 & 69 \\ 52 & 128 & 104 \end{bmatrix}.\]

A quick and dirty analysis nets us $O(n^3)$ operations: for each entry $c_{ij}$, we're performing a dot product of two length $n$ vectors, which we saw takes $O(n)$ operations, and there are $n^2$ entries in the matrix. Here, we have a very similar situation to integer multiplication, where we can ask whether the obvious matrix multiplication algorithm is really the best we can do. In fact, it's maybe even more plausible that there are clever ways to decompose matrix products, since decomposing matrices is something we do in linear algebra all the time.
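
Translating the definition directly into Python makes the three nested loops, and hence the $O(n^3)$ operation count, explicit:

    def matmul(A, B):
        """Multiply two n x n matrices straight from the definition: Theta(n^3) operations."""
        n = len(A)
        C = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):   # dot product of row i of A with column j of B
                    C[i][j] += A[i][k] * B[k][j]
        return C

    # The example above: matmul([[3,4,6],[2,8,3],[7,6,5]], [[3,8,9],[1,7,6],[5,6,1]])
    # returns [[43,88,57],[29,90,69],[52,128,104]].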

The most obvious way of doing this follows from integer multiplication, where we divide each of our $n \times n$ matrices into $\frac n 2 \times \frac n 2$ matrices. Assume we have $n = 2^k$ so we can split it evenly. Then $C = AB$ can be expressed as \[\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \] where $X_{ij}$ are $\frac n 2 \times \frac n 2$ matrices. This gives a natural decomposition where we compute \[C_{ij} = A_{i1} B_{1j} + A_{i2} B_{2j}.\] If you think that's either too good to be true or that it looks suspiciously like we haven't really done anything, your hunch is correct. We have divided our matrix into quadrants, where computing each quadrant is the sum of two matrix products of $\frac n 2 \times \frac n 2$ matrices. This means we have a total of eight different matrix multiplications, along with additions.

This gives us the recurrence \[T(n) = 8 T\left( \frac n 2 \right) + O(n^2),\] which if we apply the master theorem gives us $O(n^{\log_2 8}) = O(n^3)$. So again, the obvious decomposition is not quite clever enough to make an improvement.

And so, like Karatsuba, we need something a bit cleverer. This trick is due to Strassen in 1969, whom we've already encountered from Schönhage-Strassen. The idea is similar to Karatsuba's: a clever recombination eliminates one of the eight matrix multiplications.

The matrix product of two $n \times n$ matrices can be computed in time $O(n^{2.81})$.

We assume that $n$ is a power of 2; if it is not, we can pad the matrices with 0 entries and proceed. Let \[\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}\] and let \begin{align*} M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}), \\ M_2 &= (A_{21} + A_{22}) B_{11}, \\ M_3 &= A_{11} (B_{12} - B_{22}), \\ M_4 &= A_{22} (B_{21} - B_{11}), \\ M_5 &= (A_{11} + A_{12}) B_{22}, \\ M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}), \\ M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}). \end{align*}

Then \begin{align*} C_{11} &= M_1 + M_4 - M_5 + M_7, \\ C_{12} &= M_3 + M_5, \\ C_{21} &= M_2 + M_4, \\ C_{22} &= M_1 - M_2 + M_3 + M_6, \end{align*} which can be verified by expanding each product.

Observe that this requires 7 matrix multiplications of matrices of size $\frac n 2 \times \frac n 2$ and a series of matrix additions, with total cost $O(n^2)$. This gives us the recurrence \[T(n) = 7T\left(\frac n 2\right) + O(n^2).\] By the Master theorem, we arrive at $T(n) = O(n^{\log_2 7}) = O(n^{2.81})$.
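
Here's a sketch of Strassen's recursion in Python for matrices whose size is a power of 2 (the helpers madd and msub and the lack of padding are simplifications for illustration):

    def madd(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def msub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def strassen(A, B):
        """Multiply two n x n matrices, n a power of 2, using 7 recursive products."""
        n = len(A)
        if n == 1:
            return [[A[0][0] * B[0][0]]]
        h = n // 2
        # Split each matrix into quadrants.
        A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
        A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
        B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
        B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
        # The seven products from the proof above.
        M1 = strassen(madd(A11, A22), madd(B11, B22))
        M2 = strassen(madd(A21, A22), B11)
        M3 = strassen(A11, msub(B12, B22))
        M4 = strassen(A22, msub(B21, B11))
        M5 = strassen(madd(A11, A12), B22)
        M6 = strassen(msub(A21, A11), madd(B11, B12))
        M7 = strassen(msub(A12, A22), madd(B21, B22))
        # Recombine into the quadrants of C.
        C11 = madd(msub(madd(M1, M4), M5), M7)
        C12 = madd(M3, M5)
        C21 = madd(M2, M4)
        C22 = madd(madd(msub(M1, M2), M3), M6)
        return [C11[i] + C12[i] for i in range(h)] + [C21[i] + C22[i] for i in range(h)]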

Since Strassen's name appears in both, you could imagine that the history of matrix multiplication algorithms tracks similarly to integer multiplication algorithms. Strassen's result kicked off a run of improvements to matrix multiplication, using far more complicated mathematics than splitting matrices up, which culminated in 1990 with Coppersmith and Winograd getting matrix multiplication down to $O(n^{2.376})$.

That bound stood as the best known for about twenty years, but recent work has further refined the method. In 2010, Stothers gave an improvement to $O(n^{2.3737})$ in his PhD thesis, and this was followed by Vassilevska Williams in 2011 with $O(n^{2.3729})$ and Le Gall in 2014 with $O(n^{2.37287})$. Most recently, just earlier this year, Alman and Vassilevska Williams improved this yet again to $O(n^{2.37286})$.

At this point, this seems like a purely academic exercise, but it's important to understand the context of where matrix multiplication sits in terms of fundamental problems in computer science. As mentioned earlier, matrix multiplication is a fundamental operation in applications that involve linear algebra, which includes areas like graphics, computer vision, machine learning, and quantum computation.

But matrix multiplication also serves as a stand-in for lower bounds for many problems that have been shown to be equivalent to it. My personal favourite comes from my area of research: formal languages, which sees applications in text processing and parsing. In particular, the problem of parsing context-free languages is known to be reducible to matrix multiplication. In other words, there is a way to transform parsing into matrix multiplication, which means that context-free parsing can be solved in $O(n^\omega)$ time, where $\omega$ is whatever the matrix multiplication exponent happens to be.

This particular result was shown by Leslie Valiant in 1975. The reduction also goes the other way around: Lillian Lee showed in 2002 that you can solve matrix multiplication by turning it into context-free grammar parsing, so parsing can't be done significantly faster than matrix multiplication; a substantially faster parser would immediately give a faster matrix multiplication algorithm. This is another neat example of how solving a problem isn't just about solving the problem that's in front of you.

Unlike integer multiplication, no one has yet achieved a "nice" bound for matrix multiplication, nor are any of the algorithms after Strassen actually practical to implement. The question of whether $O(n^2)$ is achievable remains open.