MPCS 55001 — Lecture 4

Closest Points on a Plane

To review, solving a problem using a divide and conquer algorithm follows a template:

  1. Identify a way to divide your problem into subproblems and base cases.
  2. Recursively solve each subproblem.
  3. Combine the solutions to each subproblem.

This process results in an algorithm that has the following structure.

  1. Base case.
  2. Recursive case.
    1. Divide the problem into subproblems.
    2. Recursively solve the subproblems.
    3. Combine the solutions of the subproblems.

Proving correctness and efficiency then comes down to analyzing these steps. For efficiency, we solve the recurrence determined by the recursive calls and the combination step. For correctness, we prove the combination step correct and then argue the correctness of the whole algorithm by induction on the recursive steps.
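
As a rough illustration of this template (not tied to any particular problem), a divide and conquer algorithm in Python might be organized as follows; the helpers `is_base_case`, `solve_directly`, `divide`, and `combine` are placeholders standing in for the problem-specific pieces.

    def divide_and_conquer(problem):
        # Base case: the problem is small enough to solve directly.
        if is_base_case(problem):
            return solve_directly(problem)
        # Recursive case: divide, solve each piece, combine.
        subproblems = divide(problem)
        subsolutions = [divide_and_conquer(sub) for sub in subproblems]
        return combine(subsolutions)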

Let's look at a neat problem that turns out to be easier than it first appears (for computers, anyway; it will still take a bit of work for us). First, an important definition.

The Euclidean distance between two points $p_1 = (x_1,y_1)$ and $p_2 = (x_2,y_2)$ is defined by \[d(p_1,p_2) = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}.\]

Of course, one can define many different kinds of distances on many kinds of objects. So now we can ask the following question.

The closest points on a plane problem is: Given a list of $n$ points on a plane, find the closest pair of points.

This is a simply stated problem, and it is another one of those problems that people really do want to solve, and solve efficiently: it is a very basic computational geometry problem with obvious applications in all sorts of graphics and computer vision problems. It was first studied by Shamos and Hoey in 1975, who gave the algorithm we're about to investigate.

We will assume that squaring and square roots are basic operations (although we don't really need to take square roots). There is a pretty obvious brute force algorithm: just compute and check the distance of every pair of points. Of course, if you have $n$ points, that's $O(n^2)$ things you have to check.
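
As a point of comparison, here is a sketch of that brute force algorithm in Python (the helper names `dist` and `closest_pair_brute_force` are just for this illustration); it checks all $\binom n 2$ pairs.

    from math import hypot, inf

    def dist(p, q):
        # Euclidean distance between p = (x1, y1) and q = (x2, y2).
        return hypot(p[0] - q[0], p[1] - q[1])

    def closest_pair_brute_force(points):
        # Check every pair of points: O(n^2) distance computations.
        best = (inf, None)
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                d = dist(points[i], points[j])
                if d < best[0]:
                    best = (d, (points[i], points[j]))
        return best  # (distance, (p, q))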

Designing the algorithm

So we want to try to use divide and conquer on this set, ideally splitting our problem into subproblems of roughly equal size. It's not immediately clear how to do this, and one might be tempted into something complicated like splitting the plane into quadrants.

Luckily, we do not need to go that far. Something simpler we can do is to split our problem into two. First, we sort our points by their $x$-coordinate and then divide our points accordingly. Picture a vertical line $L$ going down the middle of the space. Then we recursively find the closest pair in each half and choose the closer of the two. Once we get down to at most three points, we can just directly compute which pair is closest. Easy! But there is a big complication with this strategy:

What happens if the closest pair is actually right in the middle, with one point to the left and one point to the right?

Surprisingly, all is not lost. Let $\delta$ be the smaller of the distances computed for the left closest pair and the right closest pair. If we run into the edge case that we described, then the crossing pair must be less than $\delta$ apart, and in particular each of its two points must lie within distance $\delta$ of the line $L$.

What we can do is consider only those points that are at most $\delta$ away from the line $L$. Sort these by their $y$-coordinates. The next question is: how many comparisons do we have to make when considering these points? It seems like we could be looking at arbitrarily many such comparisons, but we can do a bit more narrowing down.

Let $S = \{p_1, \dots, p_k\}$ be the set of points within distance $\delta$ of line $L$, sorted in order of $y$-coordinate. Then for any two points $p_i, p_j \in S$ with $|i - j| \gt 7$, we have $d(p_i,p_j) \gt \delta$.

Consider a rectangle $R$ of width $2\delta$ and height $\delta$ centered on $L$. We claim that this rectangle can contain at most 8 points. If it contains more, then there are enough points on one side of $L$ so that there is at least one pair of points with distance closer than $\delta$, which would contradict our assumption that $\delta$ was the distance of the closest pair of points on one side.

To see this, we divide $R$ into eight squares of length and width $\frac \delta 2$. We claim that each square can contain at most one point. Observe that the furthest apart any two points can be in such a square is $\frac{\delta}{\sqrt 2} \lt \delta$.
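
This is because two points in the same square can be at most a diagonal's length apart: \[\sqrt{\left(\tfrac{\delta}{2}\right)^2 + \left(\tfrac{\delta}{2}\right)^2} = \frac{\delta}{\sqrt 2} \lt \delta.\] Since each square lies entirely on one side of $L$, two points in the same square would form a pair on one side at distance less than $\delta$, contradicting the choice of $\delta$.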

Therefore, for any point within the $\delta$-region of $L$, it suffices to compare it with at most seven other points, since all other points are guaranteed to be at distance greater than $\delta$. The important thing is that we only need to compare each point with a constant number of others (it is possible to improve the constant from seven by employing more complicated geometric arguments). So even if we end up with an unusually large number of points along this strip, we need at most $O(n)$ comparisons in total, which crucially doesn't take us out of the realm of $O(n \log n)$.

So a very general description of our algorithm, which we'll refer to as Closest-Pair, is as follows:

  1. Sort our points by $x$-coordinate.
  2. Divide our points in half.
  3. Recursively find the closest pair on the left side.
  4. Recursively find the closest pair on the right side.
  5. Check the middle for possible closest pairs that cross the middle.
  6. Return the closest pair.

This is enough to prove that our algorithm is correct. Just like with Mergesort, this involves first proving that the combining step is correct (which we just did above) and then proving that the algorithm itself is correct via an inductive argument. In this case, the base case is when we need to determine the closest pair among at most three points.

Closest-Pair correctly identifies the closest pair of points.

We will prove this by induction on the number of points, $n$.

Our base case is when $n \leq 3$. In this case, finding the closest pair of points can be done in constant time by simple computation.

For our inductive case, let $n \gt 3$ and assume that Closest-Pair is correct for fewer than $n$ points. We divide our set of points into two, with roughly $\frac n 2$ points on the left and the rest on the right. We recursively find the closest pair on each side, which we may take to be correct by our inductive hypothesis.

This gives us three scenarios. The first is that the closest pair is in the left half of the set. In this case, we have found our pair recursively. Similarly, the second case is that the closest pair is on the right half. The last case is that the closest pair is actually crossing the middle, in which case, our recursive calls will not have considered such pairs.

To consider pairs across the middle, take the closer of the two pairs we acquired from our recursive calls and let its distance be $\delta$. We only need to check points within distance $\delta$ of the middle line $L$, since any crossing pair involving a point farther from $L$ must be at distance greater than $\delta$, and we already have a pair at distance $\delta$. If we find a crossing pair that is closer, we take that to be our pair.

Thus, our algorithm will find the closest pair of points.

Efficiency and Implementation

An efficient implementation almost follows from what we've just described. The only complication occurs in the combining phase, when we try to examine points along the middle strip. As we've described the algorithm, we would require sorting points by their $y$-coordinate each time we want to combine solutions. This leads to an $O(n \log n)$ step at each recursive call, so we need to avoid this.

It turns out the trick is in preprocessing and keeping two copies of the points: one sorted by $x$-coordinate and one sorted by $y$-coordinate. Once we do this, we can just extract the points we need in order by referring to these two lists. This then gets us the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{closest-pair}{$S = \{p_1 = (x_1, y_1), \dots, p_n = (x_n,y_n)\}$}
        \STATE $S_x = $ $S$ sorted by $x$-coordinate
        \STATE $S_y = $ $S$ sorted by $y$-coordinate
        \RETURN{\CALL{closest-pair-rec}{$S_x,S_y$}}
    \ENDPROCEDURE

    \PROCEDURE{closest-pair-rec}{$S_x, S_y$}
        \IF{$n \leq 3$}
            \RETURN{closest pair of the three points}

        \ELSE
            \STATE Construct $L_x,L_y,R_x,R_y$
            \STATE $(\ell_1,\ell_2) = $ \CALL{closest-pair-rec}{$L_x,L_y$}
            \STATE $(r_1,r_2) = $ \CALL{closest-pair-rec}{$R_x,R_y$}

            \STATE $\delta = \min(d(\ell_1,\ell_2),d(r_1,r_2))$

            \STATE $L = \max\limits_{(x,y) \in L_x} x$
            \STATE $M = \{(x,y) \in S_y \mid |x - L| \leq \delta\}$, sorted by $y$-coordinate

            \FORALL{$s \in M$}
                \STATE Compute $d(s,s')$ for the next 7 points $s' \in M$
            \ENDFOR
            \STATE Let $(s,s')$ be the pair of minimum distance computed among points of $M$
            \RETURN{smallest of $(s,s'), (\ell_1,\ell_2), (r_1,r_2)$}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
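
To make this concrete, here is one possible Python rendering of the procedure, reusing the `dist` and `closest_pair_brute_force` helpers sketched earlier; it is an illustrative sketch (assuming the points are distinct) rather than a tuned implementation.

    def closest_pair(points):
        sx = sorted(points)                          # sorted by x-coordinate
        sy = sorted(points, key=lambda p: p[1])      # sorted by y-coordinate
        return closest_pair_rec(sx, sy)

    def closest_pair_rec(sx, sy):
        n = len(sx)
        if n <= 3:
            return closest_pair_brute_force(sx)

        mid = n // 2
        line_x = sx[mid - 1][0]                      # x-coordinate of the dividing line L
        left = set(sx[:mid])
        lx, rx = sx[:mid], sx[mid:]
        ly = [p for p in sy if p in left]            # left half, still in y-order
        ry = [p for p in sy if p not in left]        # right half, still in y-order

        delta, best = min(closest_pair_rec(lx, ly), closest_pair_rec(rx, ry))

        # Points within delta of L, already sorted by y-coordinate.
        strip = [p for p in sy if abs(p[0] - line_x) <= delta]
        for i, p in enumerate(strip):
            for q in strip[i + 1:i + 8]:             # at most 7 candidates per point
                d = dist(p, q)
                if d < delta:
                    delta, best = d, (p, q)
        return delta, best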

This algorithm is a bit more complicated than the ones we've seen before, so the analysis will be a bit more involved.

Closest-Pair has running time $O(n \log n)$.

First, the sorting steps of lines 2 and 3 can be done in $O(n \log n)$ time. We then need to analyze the running time for Closest-Pair-Rec. Let $T(n)$ be the number of operations performed by Closest-Pair-Rec. When $n \leq 3$, we have $T(n) = O(1)$.

Now, consider when $n \gt 3$. Constructing $L_x, L_y, R_x, R_y$ takes $O(n)$ time, since each list can be built with a single pass over $S_x$ or $S_y$. The two recursive calls cost $T(\lfloor n/2 \rfloor)$ and $T(\lceil n/2 \rceil)$. Computing $\delta$ and $L$ takes constant time, constructing $M$ takes another pass over $S_y$, and the loop over $M$ makes at most 7 distance computations per point, so the combination step costs $O(n)$ in total.

Therefore, we have the recurrence \[T(n) = \begin{cases} O(1) & \text{if $n \leq 3$}, \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(n) & \text{if $n \gt 3$.} \end{cases}\] and we have that $T(n) = O(n \log n)$, by the Master Theorem, or by observing that this is the same recurrence that we got from Mergesort.

Then the running time of Closest-Pair is $O(n \log n) + T(n) = O(n \log n)$.

This is not directly related to divide and conquer algorithms, but it turns out that sorting a bunch of points is a solid first step for many computational geometry algorithms. For instance, the convex hull problem asks for a minimal convex polygon that encloses a set of points. This sounds complicated, but it's no more difficult than sorting! Once you have the convex hull of a set of points, you get the farthest pair of points very easily: just check pairs of points on the convex hull!

Multiplication

Integer multiplication is another one of the classical results that uses this technique of division into subproblems and it's one of the cases where our usual assumptions about unit cost operations won't fly. First, let's consider how we'd add two $n$-bit integers.

Obviously, in order for our analysis to be of any use, we can't treat integer addition as a unit cost operation. We will instead consider operations at the bit level to be unit cost operations. So bit addition is a basic operation, and adding two $n$-bit numbers requires $O(n)$ bit operations. We can conclude that we need $\Theta(n)$ bit operations to add two $n$-bit integers, since the worst case requires on the order of $n$ bit operations and even in the best case we have to examine all $n$ bits of each number.
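
As a small illustration of the cost model (not an algorithm we will need later), here is grade-school ripple-carry addition on bit lists in Python; the little-endian list-of-bits representation is just an assumption for this sketch. Each loop iteration does a constant number of bit operations, so the whole thing is $\Theta(n)$.

    def add_bits(a, b):
        # a, b: lists of bits (0 or 1), least significant bit first.
        n = max(len(a), len(b))
        a = a + [0] * (n - len(a))
        b = b + [0] * (n - len(b))
        result, carry = [], 0
        for i in range(n):
            s = a[i] + b[i] + carry      # one column of the grade-school method
            result.append(s % 2)
            carry = s // 2
        if carry:
            result.append(carry)
        return result

    # Example: 6 + 3 = 9, i.e., [0, 1, 1] + [1, 1] -> [1, 0, 0, 1].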

So how do we apply this to multiplication? Again, we are measuring in terms of bit operations, which include bit addition and bit shifts (multiplication by a power of 2).

The obvious approach is to use the pen and paper algorithm that we learned in school.

A quick look at this gives us an $O(n^2)$ algorithm: we perform $O(n)$ bit operations for each digit of the bottom number, which has $n$ digits itself. This was conjectured to be optimal by Kolmogorov in 1956, and it's definitely hard to see how we could possibly improve on it. After all, it seems like we would need to look through every digit to perform the multiplications.

Fast integer multiplication

However, in 1960, Karatsuba was able to show that we can do better than $O(n^2)$. As he recounts, he encountered the $n^2$ conjecture in Kolmogorov's seminar on mathematical problems in computer science and came up with a better algorithm in about a week.

Consider two $n$-bit numbers $M$ and $N$. We write $M = ab$ and $N = cd$, where $a,b,c,d$ are $\frac n 2$-bit strings. In other words, $a$ is the left half of $M$ and $b$ the right half, and similarly for $c$ and $d$ with $N$. Then we observe the following. \begin{align*} MN &= (a \cdot 2^{\frac n 2} + b)(c \cdot 2^{\frac n 2} + d) \\ &= ac2^n + (ad+bc) 2^{\frac n 2} +bd \end{align*}

Notice that now, we are performing four multiplications of $\frac n 2$-bit numbers. Let's figure out if this is actually better or not. This gives us the recurrence \[T(n) = 4 T\left(\frac n 2 \right) + O(n),\] since we split our problem into four subproblems of size $\frac n 2$, followed by $O(n)$ bit additions. If we use the master method, we get that $T(n) = O(n^{\log_2 4}) = O(n^2)$, so this isn't actually any better than what we started out with. But this wasn't Karatsuba's result.

The one weird trick ("Kolmogorov was very agitated") that Karatsuba employed was observing that \[ad+bc = ac+bd - (a-b)(c-d).\] Now, this looks more complicated, but the crucial observation is that we've already computed $ac$ and $bd$! This means that the only new multiplication we need to perform is $(a-b)(c-d)$. We then get the recurrence \[T(n) = 3 T\left(\frac n 2 \right) + O(n),\] which gets us $O(n^{\log_2 3}) = O(n^{1.585})$, a significant improvement.
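
As a quick sanity check of the identity, take $M = 13 = 1101_2$ and $N = 11 = 1011_2$, so that $a = 11_2 = 3$, $b = 01_2 = 1$, $c = 10_2 = 2$, and $d = 11_2 = 3$. The three products are $ac = 6$, $bd = 3$, and $(a-b)(c-d) = 2 \cdot (-1) = -2$, so \[ad + bc = ac + bd - (a-b)(c-d) = 6 + 3 + 2 = 11,\] and indeed \[MN = ac \cdot 2^4 + (ad+bc) \cdot 2^2 + bd = 96 + 44 + 3 = 143 = 13 \cdot 11.\]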

The algorithm looks like this.

    \begin{algorithmic}
    \PROCEDURE{multiply}{$M,N,n$}
        \IF{$n = 1$}
            \RETURN{$M \times N$}
        \ELSE
            \STATE $m = \left\lceil \frac n 2 \right\rceil$
            \STATE $a = \left\lfloor \frac M {2^m} \right\rfloor$
            \STATE $b = M \bmod 2^m$
            \STATE $c = \left\lfloor \frac N {2^m} \right\rfloor$
            \STATE $d = N \bmod 2^m$
            \STATE $e = $ \CALL{multiply}{$a,c,m$}
            \STATE $f = $ \CALL{multiply}{$b,d,m$}
            \STATE $g = $ \CALL{multiply}{$|a-b|,|c-d|,m$}
            \STATE $\sigma = +1$ if $(a-b)$ and $(c-d)$ have the same sign, $-1$ otherwise
        \RETURN{$2^{2m}e + 2^m (e+f-\sigma g) + f$}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}

Let $M,N$ be two $n$-bit integers. Then $M \times N$ can be computed using $O(n^{\log_2 3}) = O(n^{1.585})$ bit operations.

Let $T(n)$ be the number of bit operations performed by the algorithm above. We have $T(1) = 1$. Then for $n \gt 1$, we observe that lines 5 through 9 can each be computed using $O(m)$ bit shifts and bit operations, which amounts to $O(n)$ operations in total. Each call of multiply in lines 10 through 12 costs $T\left(\frac n 2 \right)$ bit operations, while computing $|a-b|$ and $|c-d|$ in line 12 requires $O(n)$ bit operations, as do the additions and shifts that combine the results. In total, we arrive at $T(n) = 3T\left(\frac n 2 \right) + O(n)$, which, as we saw above, is $O(n^{\log_2 3}) = O(n^{1.585})$ by the Master theorem.
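
Here is a rough Python version of the same idea, for illustration only (Python's built-in multiplication already does something like this for large integers). It handles the sign of $(a-b)(c-d)$ by tracking it explicitly.

    def karatsuba(m, n):
        # Multiply nonnegative integers with three recursive multiplications.
        if m < 2 or n < 2:
            return m * n                              # trivial base case
        half = max(m.bit_length(), n.bit_length()) // 2
        a, b = m >> half, m & ((1 << half) - 1)       # m = a * 2^half + b
        c, d = n >> half, n & ((1 << half) - 1)       # n = c * 2^half + d
        e = karatsuba(a, c)                           # ac
        f = karatsuba(b, d)                           # bd
        sign = -1 if (a < b) != (c < d) else 1        # sign of (a - b)(c - d)
        g = sign * karatsuba(abs(a - b), abs(c - d))  # (a - b)(c - d)
        # ad + bc = ac + bd - (a - b)(c - d) = e + f - g
        return (e << (2 * half)) + ((e + f - g) << half) + f

    # Example: karatsuba(13, 11) == 143.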

Karatsuba's algorithm is the first of a number of fast integer multiplication algorithms that have since been developed and is a great example of how an "obviously" optimal algorithm can turn out not to be optimal at all. Now, granted, people weren't exactly clamouring for a fast multiplication algorithm before computers were invented, so there was no real utility to this kind of thing.

However, now that it was clear that $n^2$ isn't the best we can do, the question of just how low we can go was now wide open. This kind of question leads to two main types of approaches:

  1. Can we come up with a faster algorithm? This is a question of algorithm design and analysis and is what this course is about.
  2. Can we show that no faster algorithm is possible? This is a question of computational complexity. This is the kind of conjecture that Kolmogorov was making and is what complexity theory is about.

In his 1966 PhD thesis, Cook developed an algorithm, based on an earlier 1963 algorithm of Toom, that is essentially a generalization of Karatsuba's. Very roughly speaking, it pushes the idea of splitting the inputs into subproblems further than Karatsuba does and is parameterized by the number of splits. For the next level up from Karatsuba's algorithm, Toom-Cook splits each integer into three pieces instead of two. Ordinarily, this would lead to nine multiplications, but the algorithm reduces it to five, giving a running time of $O(n^{\log_3 5}) = O(n^{1.465})$. The exponent gets better as you split your integers into more pieces, but the cost of splitting and combining grows so large that it becomes impractical for small (that is, most) values of $n$.

In 1971, Schönhage and Strassen gave an algorithm with running time $O(n \log n \cdot \log \log n)$. It's hard to see once you start dealing with these kinds of functions, but Schönhage-Strassen is asymptotically better than Toom-Cook once you get to very large $n$, and this remained the best known algorithm for over thirty years. Schönhage and Strassen also conjectured a lower bound for integer multiplication: $\Omega(n \log n)$.

The next development came in 2007, when Fürer gave an algorithm with $O(n \log n \cdot 2^{O(\log^* n)})$ time. At first, this looks bad, but $\log^* n$ is the iterated logarithm and is another one of those very slowly growing functions like $\alpha(n)$. Here, we need something like $n \gt 2^{2^{16}}$ for $\log^* n \gt 5$, which is to say that this algorithm is an improvement.

This kicked off another series of developments which culminated in an algorithm published last year, in March 2019, by Harvey and van der Hoeven, which solves integer multiplication in $O(n \log n)$ time. There is still the question of whether this is really the lower bound, but it's a significant achievement, and is a long way from the original conjecture of $\Theta(n^2)$.

For more grounded individuals, there is also the question of practicality. None of the algorithms after Schönhage-Strassen are actually practical, in that they really only perform asymptotically better on large enough inputs, and those sizes are impractically large. This is a common outcome in algorithms research: someone proves a better theoretical bound, which isn't useful in practice.

That doesn't mean that these results aren't important, even practically speaking. If we look at this trail of algorithms and ask how many of them are actually in use, we get a slightly surprising answer: a lot of them.

For this particular problem, the quirk that a lot of these algorithms do better for sufficiently large $n$ means that often the fastest way to multiply two numbers involves choosing the right algorithm based on the size of the numbers. So modern mathematical libraries will implement everything from grade school multiplication up to Schönhage-Strassen and run the right algorithm for the right input.

Quicksort

Between Mergesort and Heapsort, we know that sorting can be done in $O(n \log n)$ time. However, there are a few questions that remain. The first is how do we know we can't do better than $O(n \log n)$? What if there's an $O(n)$ sorting algorithm out there? That's a question that can be answered, but we won't be doing that in this class.

The other issue is one that we'll deal with by taking a look at another sorting algorithm. One of the problems with Mergesort (and with Heapsort, if the heap is built as a separate structure) is that it involves creating a bunch of new arrays or lists in order to perform the recursive step. This sort of thing is expensive if you're working with arrays. We might rather have a way to sort an array of integers without messing around and creating a million new arrays.

So is there a way to sort an array in $O(n \log n)$ time simply by swapping elements around in an array? Quicksort is such an algorithm, invented by Tony Hoare in 1959, while he was working on machine translation with Kolmogorov (recall his earlier appearance).

The idea behind Quicksort is this: given an array $A$,

  1. Choose an element $p$, which we call a pivot.
  2. Go through the array and move all elements that are less than $p$ to the left of $p$ and all the elements that are greater than $p$ to its right.
  3. Recursively sort the portion of the array to the left of $p$.
  4. Recursively sort the portion of the array to the right of $p$.

Consider the array $A = [2,7,3,6,1,5,9,0,8,4]$. If we choose $3$ as our pivot, one possible array we can obtain is $[2,1,0,3,7,6,5,9,8,4]$. We would then recursively sort $[2,1,0]$ and $[7,6,5,9,8,4]$.

For the purposes of analysis, we will be assuming that no two elements are the same. The correctness of this algorithm is not too difficult to prove in a similar way to Mergesort. The main difference is in showing the correctness of the step where we move all the elements around in step 2.

However, it's not entirely clear that this is actually efficient. There are two things we need to deal with here. First, let's deal with how to do step 2, the swapping. The question is whether we can do this efficiently. At first, it seems almost as bad as sorting, since we need to check, compare, and possibly swap every element in the array. However, there are some important differences that we can take advantage of. First, let's formally define the problem.

The partitioning problem is: Given an array $A$ of $n$ integers $a_1, \dots, a_n$ and a pivot $p = A[1]$, obtain an array $A'$ containing $a_1, \dots, a_n$ partitioned such that every element less than $p$ appears before $p$ and every element greater than $p$ appears after $p$.

There are several algorithms that solve the partitioning problem in $O(n)$ time.

Consider the following algorithm on an array $A$:

  1. Let $p$ be the pivot. Swap $p$ to be at the beginning of the array.
  2. Initialize pointers/indices so that $lt$ is at the beginning of the array, $gt$ is at the end of the array, and $i = lt + 1$. Note that $A[lt] = p$.
  3. As $i$ moves through the array:
    1. If $A[i] \lt p$, swap $A[lt]$ and $A[i]$ and increment $i$ and $lt$.
    2. If $A[i] \gt p$, swap $A[gt]$ and $A[i]$ and decrement $gt$.
    We can stop once $i \gt gt$.

If $A = [8,7,6,2,3,1,4,9,0,5]$ and we choose $4$ to be our pivot, we obtain the following arrays: \begin{align*} [8,\underline 7,6,2,3,1,\fbox{4},9,0,\overline 5] \\ [\fbox{4},\underline 7,6,2,3,1,8,9,0,\overline 5] \\ [\fbox{4},\underline 5,6,2,3,1,8,9,\overline 0, 7] \\ [\fbox{4},\underline 0,6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 9,2,3,1,\overline 8, 6, 5, 7] \\ [0, \fbox{4},\underline 8,2,3,\overline 1, 9, 6, 5, 7] \\ [0, \fbox{4},\underline 1,2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, \fbox{4},\underline 2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, 2, \fbox{4},\underline{\overline 3}, 8, 9, 6, 5, 7] \\ [0, 1, 2, 3, \overline{\underline{\fbox{4}}}, 8, 9, 6, 5, 7] \\ \end{align*}

It's clear that this algorithm makes at most one pass through the array: each step either advances $i$ or decrements $gt$, doing only a constant amount of work (comparisons, index updates, and swaps), so partitioning takes $O(n)$ time.
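
In Python, the partitioning procedure might look something like the following sketch (0-indexed and in place; passing the pivot's index as an argument is a choice made for this illustration):

    def partition(A, lo, hi, p_index):
        # Partition A[lo..hi] in place around the pivot value A[p_index].
        A[lo], A[p_index] = A[p_index], A[lo]    # step 1: move the pivot to the front
        p = A[lo]
        lt, i, gt = lo, lo + 1, hi               # step 2: invariant A[lt] == p
        while i <= gt:                           # step 3: stop once i passes gt
            if A[i] < p:
                A[lt], A[i] = A[i], A[lt]        # smaller element goes left of the pivot
                lt += 1
                i += 1
            elif A[i] > p:
                A[gt], A[i] = A[i], A[gt]        # larger element goes to the far right
                gt -= 1
            else:
                i += 1                           # equal keys (cannot happen with distinct elements)
        return lt, gt                            # final position(s) of the pivot

With $A = [8,7,6,2,3,1,4,9,0,5]$ and the pivot $4$ at index $6$, `partition(A, 0, 9, 6)` rearranges $A$ into $[0,1,2,3,4,8,9,6,5,7]$ and returns $(4, 4)$, matching the trace above.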

Observe that in our algorithm, partitioning is where most of the work happens.

The remaining question is how we should pick our pivot $p$. Intuitively, if we pick $p$ poorly, say too small or too large relative to the rest of the integers in the array, then we end up doing a lot more work in one of the two "halves" of the array (which aren't really halves, since the split is uneven). But how bad can this get?

Quicksort has running time $O(n^2)$ in the worst case.

Let $p$ be the position of the pivot element in the sorted order of the array. Let $T(n)$ be the number of comparisons on an input of size $n$. Then we have the recurrence \[T(n) = T(p-1) + T(n-p) + O(n).\] Here, we get $O(n)$ from the partition step, $T(p-1)$ for the lesser portion of the array, and $T(n-p)$ for the greater portion of the array. In the worst case, $p$ is such that this recurrence is maximized, \[T(n) = \max_{1 \leq p \leq n}\{T(p-1) + T(n-p) + O(n)\}.\] It's easy to see that this is maximized when $p = 1$ or $p = n$, giving us the recurrence \[T(n) = T(n-1) + O(n),\] which, if we roll it out, gives us $O(n^2)$.
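
Rolling the recurrence out makes the bound explicit: with $T(n) \leq T(n-1) + cn$ for some constant $c$, \[T(n) \leq cn + c(n-1) + \cdots + c \cdot 1 = c \cdot \frac{n(n+1)}{2} = O(n^2).\]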

Basically, when the worst case happens, it's analogous to insertion or selection sort, depending on whether you chose the first or last element. It's pretty clear then that the optimal choice of $p$ is to choose the median element of the array, which would give you a recurrence resembling $T(n) = 2 T(n/2) + O(n)$. But this is sort of a chicken and egg problem: if we knew what the median element of the array was, we probably wouldn't need to sort it!

Randomized Quicksort

The key here is to use randomness. Instead of trying to figure out how to find the best $p$, let's just randomly choose it! Using randomness is a very powerful technique that simplifies a lot of algorithms and bestows great efficiency. There is, of course, a philosophical question to be asked about where randomness comes from (especially for a deterministic machine like a computer), but for our purposes, we won't interrogate this too closely.

Our proposed algorithm is the following.

    \begin{algorithmic}
    \PROCEDURE{quicksort}{$A,\ell,r$}
        \IF{$\ell \lt r$}
            \STATE $p = A[\mathtt{rand}(\ell,r)]$
            \STATE $(\ell', r') = \mathtt{partition}(A,p,\ell,r)$
            \STATE $\mathtt{quicksort}(A,\ell,\ell'-1)$
            \STATE $\mathtt{quicksort}(A,r'+1,r)$
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}

Note that here, we specify indices for partitioning and Quicksort, because the idea is that we want to apply these functions to portions of $A$ instead of creating entirely new smaller arrays. In this case, the result of partition(A,p,l,r) is that the section of $A$ between $\ell$ and $r$, $A[\ell...r]$, is partitioned around the pivot $p$, with $(\ell', r')$ giving the final position(s) of the pivot, and the result of quicksort(A,l,r) is that $A[\ell...r]$ is sorted in place.
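
Putting this together with the `partition` sketch from before, an in-place randomized Quicksort in Python might look like this (again, an illustrative sketch rather than a production sort); `random.randint` plays the role of rand(l,r) in the pseudocode.

    import random

    def quicksort(A, lo=0, hi=None):
        # Sort A[lo..hi] in place by partitioning around a random pivot.
        if hi is None:
            hi = len(A) - 1
        if lo < hi:
            lt, gt = partition(A, lo, hi, random.randint(lo, hi))
            quicksort(A, lo, lt - 1)     # recurse on the elements less than the pivot
            quicksort(A, gt + 1, hi)     # recurse on the elements greater than the pivot

    A = [2, 7, 3, 6, 1, 5, 9, 0, 8, 4]
    quicksort(A)
    print(A)                             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]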

So what's the difference between trying to figure out which pivot is the best and just leaving it up to chance? It turns out even if we have a particularly bad split (say, $\frac 2 3$,$\frac 3 4$, or even $\frac 9{10}$), as long as we can fix that split and guarantee it gets no worse at every level of recursion, we can still get $O(n \log n)$ time.

Suppose we can guarantee that any pivot that we choose doesn't do any worse than partitioning into partitions of size $\frac{1}{10}$ and $\frac{9}{10}$. This gives us the recurrence \[T(n) \leq T\left(\frac{n}{10}\right) + T\left(\frac{9n}{10}\right) + O(n).\] But if we consider the recursion tree, we see that we incur a total of at most $O(n)$ cost at each level and a total of $\log_{10/9} n$ levels. If we put this together, we still get a total of $O(n \log n)$ cost.

This seems to suggest that there is a lot of leeway in the choice of pivot. Intuitively, if there are a very small number of bad choices of pivots, then making a random choice should work out most of the time, since we'll have more okay or good pivots than bad on average. But how does one analyze this?

One approach that applies this notion of the guaranteed proportion is taken by Kleinberg and Tardos. If we fix some bound on the partitions of the array, say size between $\frac n 4$ and $\frac{3n} 4$, and reselect a pivot if the partition is poor, this gives something like a $\frac 1 2$ probability of choosing an acceptable pivot. The obvious question in this scenario is how many times we expect to repeat this process until we get an acceptable pivot.

Luckily, the geometric distribution tells us that for a trial with probability of success $p$, we can expect to perform $\frac 1 p$ trials before we succeed. Taking $p = \frac 1 2$, this means that we expect to do this twice each time. Since partitioning is $O(n)$, this means we still end up with $O(n)$ expected time to partition, even with taking failures into account.

Then we get a recurrence of \[T(n) \leq T\left(\frac{n}{4}\right) + T\left(\frac{3n}{4}\right) + O(n),\] giving us $O(n \log n)$ expected time based on the above discussion. This is theoretically correct, but not entirely satisfying if you imagine your favourite programming language's sort function taking the time to partition something and then throwing it out to try again.

Luckily, we are able to prove that this runs in $O(n \log n)$ expected time without having to resort to such chicanery. We'll consider the analysis taken by CLRS, which considers the expected number of comparisons over the entire run of the algorithm. This perspective views Quicksort as an iterative algorithm rather than a recursive one, although a similar analysis can be made from that point of view too. But how does one count this?

To see this, we need to take a closer look at which comparisons can be made by our algorithm. For the purposes of analysis, we will rename our integers to $z_1, \dots, z_n$ so that $z_1 \lt \cdots \lt z_n$.

We define a random variable \[X_{ij} = \begin{cases} 1 & \text{if $z_i$ is compared to $z_j$}, \\ 0 & \text{otherwise.} \end{cases}\] And we define a random variable $X$ for the total number of comparisons, \[X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}.\] So, we want to compute $E(X)$, the expectation of $X$, which will give us our expected number of comparisons for a run of Quicksort. All that remains is to argue about the probability of each comparison.

The probability that $z_i$ and $z_j$ are compared in a run of Quicksort is $\frac{2}{j-i+1}$.

Observe that if some element $x$ with $z_i \lt x \lt z_j$ is chosen as a pivot before either $z_i$ or $z_j$, then $z_i$ and $z_j$ are separated into different subarrays and never get compared. Therefore, for $z_i$ and $z_j$ to be compared, one of them must be the first element of the set $\{z_i, \dots, z_j\}$ to be chosen as a pivot. Since there are $j-i+1$ elements in this set and each is equally likely to be chosen first, the probability that $z_i$ is chosen first is $\frac{1}{j-i+1}$, and likewise for $z_j$, giving a total probability of $\frac{2}{j-i+1}$.

This probability is enough to let us compute the expected number of comparisons, $E(X)$.

The expected number of comparisons over a single run of Quicksort is $O(n \log n)$.

We take the expectation of $X$, as defined above. Then we have \begin{align*} E(X) &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n E(X_{ij}) \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j-i+1} \\ &= 2 \sum_{i=1}^{n-1} \sum_{j=2}^{n-i+1} \frac 1 j \\ &\leq 2n \sum_{j=1}^{n} \frac 1 j \\ &\leq 2n (1 + \log n) \\ &= O(n \log n) \end{align*} Therefore the expected number of comparisons is $O(n \log n)$.

Recall that the worst-case running time for the deterministic version of this algorithm is $O(n^2)$. This algorithm has the same worst-case running time, if it happens to choose the worst pivot every time. However, for randomized algorithms, what we really care about is the expected running time. Since the pivot is chosen by a random process rather than deterministically, no particular input forces the worst case, and the expected running time is $O(n \log n)$ on every input. This is a nice example of how introducing a bit of randomness can help create efficient algorithms.

As mentioned at the beginning of this section, Quicksort is commonly used by many programming languages. So what do real programming languages use to sort? Much like our discussion of integer multiplication algorithms, it turns out the answer is: whichever one is right for the use case. Many programming languages use hybrid sorting algorithms, which run one of several standard sorting algorithms depending on criteria such as the size and structure of the input.

Now, it seems like sorting is pretty well understood: we know that sorting requires $\Omega(n \log n)$ comparisons and we have hybrid algorithms that run the best algorithm depending on the circumstances. But in 2009, a crazy idea about Quicksort was revisited.

Back in 1975, Robert Sedgewick (who has written several books on algorithms that you may have encountered) exhaustively studied Quicksort in his PhD thesis and one of the ideas he had was to try adding more pivots: instead of partitioning your array into two, why not three? Well, it didn't really make much of a difference asymptotically, and that makes some amount of sense.

However, in 2009, Yaroslavskiy proposed a dual-pivot Quicksort algorithm for implementation in Java on the Java mailing lists, which seemed to be faster than classic Quicksort in practice. The results were so convincing that this algorithm has since been used in Java 7 for sorting arrays of primitive types.

The first theoretical analysis of this algorithm was done by Wild and Nebel in 2012, and in 2014, it was shown by Kushagra, López-Ortiz, Qiao, and Munro that the reason for the improvement might be a bit surprising: it has to do with caching in modern hardware.