MPCS 55001 — Lecture 8

Quicksort

Mergesort was the first algorithm that took us to $O(n \log n)$ sorting. However, there are a few questions that remain. The first is how do we know we can't do better than $O(n \log n)$? What if there's an $O(n)$ sorting algorithm out there? That's a question that can be answered, but we won't be doing that in this class.

The other issue is one that we'll deal with via a look at another sorting algorithm. One of the problems with Mergesort is that it involves the creation of a bunch of new arrays or lists in order to perform the recursive step. But this sort of thing is expensive if you're working with arrays. We would much rather have a way to sort an array of integers without messing around and creating a million new arrays.

So is there a way to sort an array simply by swapping elements around in an array? Quicksort is such an algorithm, invented by Tony Hoare in 1959, while he was working on machine translation with Kolmogorov (recall his earlier appearance).

The idea behind Quicksort is this: given an array $A$,

  1. Choose an element $p$, which we call a pivot.
  2. Go through the array and move all elements that are less than $p$ to the left of $p$ and all the elements that are greater than $p$ to its right.
  3. Recursively sort the portion of the array to the left of $p$.
  4. Recursively sort the portion of the array to the right of $p$.

Example 8.1. Consider the array $A = [2,7,3,6,1,5,9,0,8,4]$. If we choose $3$ as our pivot, one possible array we can obtain is $[2,1,0,3,7,6,5,9,8,4]$. We would then recursively sort $[2,1,0]$ and $[7,6,5,9,8,4]$.
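To make the four steps concrete, here is a minimal out-of-place sketch in Python (a direct transcription of the idea, not the in-place algorithm we're after; the function name is just for illustration):

    def quicksort_simple(A):
        """Out-of-place Quicksort: a direct transcription of steps 1-4.
        Builds new lists at every level of recursion, which is exactly
        the overhead we want to avoid."""
        if len(A) <= 1:
            return list(A)
        p = A[0]                              # step 1: choose a pivot (here, the first element)
        less = [x for x in A if x < p]        # step 2: everything that belongs left of p
        equal = [x for x in A if x == p]      #         the pivot (and any duplicates of it)
        greater = [x for x in A if x > p]     # step 2: everything that belongs right of p
        return quicksort_simple(less) + equal + quicksort_simple(greater)   # steps 3-4

    print(quicksort_simple([2, 7, 3, 6, 1, 5, 9, 0, 8, 4]))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Note how every call allocates fresh lists; the rest of the lecture is about avoiding exactly that by partitioning in place.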

For the purposes of analysis, we will be assuming that no two elements are the same. The correctness of this algorithm is not too difficult to prove in a similar way to Mergesort. The main difference is in showing the correctness of the step where we move all the elements around in step 2.

However, it's not entirely clear that this is actually efficient. There are two things we need to deal with here. First, let's deal with how to do step 2, the swapping. The question is whether we can do this efficiently. At first, it seems almost as bad as sorting, since we need to check, compare, and possibly swap every element in the array. However, there are some important differences that we can take advantage of. First, let's formally define the problem.

Problem 8.2. The partitioning problem is: Given an array $A$ of $n$ integers $a_1, \dots, a_n$ and a pivot $p = A[1]$, obtain an array $A'$ containing $a_1, \dots, a_n$ partitioned such that every element less than $p$ appears to the left of $p$ and every element greater than $p$ appears to its right.

There are several algorithms that solve the partitioning problem in $O(n)$ time.

Example 8.3. Consider the following algorithm on an array $A$:

  1. Let $p$ be the pivot. Swap $p$ to be at the beginning of the array.
  2. Initialize pointers/indices so that $lt$ is at the beginning of the array, $gt$ is at the end of the array, and $i = lt + 1$. Note that $A[lt] = p$.
  3. As $i$ moves through the array:
    1. If $A[i] \lt p$, swap $A[lt]$ and $A[i]$ and increment $i$ and $lt$.
    2. If $A[i] \gt p$, swap $A[gt]$ and $A[i]$ and decrement $gt$.
    We can stop once $i \gt gt$.

If $A = [8,7,6,2,3,1,4,9,0,5]$ and we choose $4$ to be our pivot, we obtain the following arrays (the boxed element is the pivot $p$, the underlined element is the one at index $i$, and the overlined element is the one at index $gt$): \begin{align*} [8,\underline 7,6,2,3,1,\fbox{4},9,0,\overline 5] \\ [\fbox{4},\underline 7,6,2,3,1,8,9,0,\overline 5] \\ [\fbox{4},\underline 5,6,2,3,1,8,9,\overline 0, 7] \\ [\fbox{4},\underline 0,6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 9,2,3,1,\overline 8, 6, 5, 7] \\ [0, \fbox{4},\underline 8,2,3,\overline 1, 9, 6, 5, 7] \\ [0, \fbox{4},\underline 1,2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, \fbox{4},\underline 2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, 2, \fbox{4},\underline{\overline 3}, 8, 9, 6, 5, 7] \\ [0, 1, 2, 3, \overline{\underline{\fbox{4}}}, 8, 9, 6, 5, 7] \\ \end{align*}

This algorithm makes a single pass through the array, manipulating pointers/array indices and performing constant-time swaps, so partitioning takes $O(n)$ time.
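Here is one way Example 8.3 might look in Python. The function name and the convention of returning the pair $(lt, gt)$ bounding the pivot's final position are choices of this sketch (the randomized Quicksort below expects such a pair back):

    def partition(A, p, lo, hi):
        """Partition A[lo..hi] in place around the pivot value p.
        Afterwards A[lo..lt-1] < p, A[lt..gt] == p, and A[gt+1..hi] > p;
        the pair (lt, gt) is returned."""
        j = A.index(p, lo, hi + 1)          # locate the pivot within the subarray
        A[lo], A[j] = A[j], A[lo]           # step 1: swap the pivot to the front
        lt, i, gt = lo, lo + 1, hi          # step 2: A[lt] == p, i scans, gt marks the right end
        while i <= gt:                      # step 3: stop once i > gt
            if A[i] < p:
                A[lt], A[i] = A[i], A[lt]   # 3.1: smaller element trades places with the pivot
                lt += 1
                i += 1
            elif A[i] > p:
                A[gt], A[i] = A[i], A[gt]   # 3.2: larger element gets parked at the right end
                gt -= 1
            else:                           # equal to the pivot (only the pivot itself if elements are distinct)
                i += 1
        return lt, gt

Calling it with the array and pivot from the trace above leaves the array as $[0,1,2,3,4,8,9,6,5,7]$ and returns $(4,4)$.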

Observe that in our algorithm, partitioning is where most of the work happens.

The remaining question is how we should pick our pivot $p$. Intuitively, if we pick $p$ poorly, say too small or too large relative to the rest of the integers in the array, then we end up doing a lot more work in one of the two "halves" of the array (they aren't really halves, of course, since the two sides can be far from equal in size). But how bad can this get?

Proposition 8.4. Quicksort has running time $O(n^2)$ in the worst case.

Proof. Let the pivot be the $p$-th smallest element of the array. Let $T(n)$ be the number of comparisons on an input of size $n$. Then we have the recurrence \[T(n) = T(p-1) + T(n-p) + O(n).\] Here, we get $O(n)$ from the partition step, $T(p-1)$ for the lesser portion of the array, and $T(n-p)$ for the greater portion of the array. So we want to take $p$ such that this recurrence is maximized, \[T(n) = \max_{1 \leq p \leq n}\{T(p-1) + T(n-p) + O(n)\}.\] It's easy to see that this is maximized when $p = 1$ or $p = n$, giving us the recurrence \[T(n) = T(n-1) + O(n),\] which, if we roll it out, gives us $O(n^2)$. $$\tag*{$\Box$}$$
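Concretely, writing the $O(n)$ term as at most $cn$ for some constant $c$ and rolling out the recurrence: \[T(n) \leq cn + T(n-1) \leq cn + c(n-1) + T(n-2) \leq \cdots \leq c \sum_{k=1}^{n} k + T(0) = \frac{cn(n+1)}{2} + O(1) = O(n^2).\]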

Basically, when the worst case happens, Quicksort degenerates into something like insertion or selection sort, depending on whether the pivot always ends up being the smallest or the largest element. It's pretty clear then that the optimal choice of $p$ is the median element of the array, which would give a recurrence resembling $T(n) = 2 T(n/2) + O(n)$. But this is sort of a chicken-and-egg problem: if we knew what the median element of the array was, we probably wouldn't need to sort it!

Randomization

The key here is to use randomness. Instead of trying to figure out how to find the best $p$, let's just randomly choose it! Using randomness is a very powerful technique that simplifies a lot of algorithms and bestows great efficiency. There is, of course, a philosophical question to be asked about where randomness comes from (especially for a deterministic machine like a computer), but for our purposes, we won't interrogate this too closely.

Our proposed algorithm is the following.

    \begin{algorithm}
    \caption{Randomized Quicksort}
    \begin{algorithmic}
    \PROCEDURE{quicksort}{$A,\ell,r$}
        \IF{$\ell \lt r$}
            \STATE $p = A[\mathtt{rand(\ell,r)}]$
            \STATE $(\ell', r') = \mathtt{partition}(A,p,\ell,r)$
            \STATE $\mathtt{quicksort}(A,\ell,\ell'-1)$
            \STATE $\mathtt{quicksort}(A,r'+1,r)$
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
    \end{algorithm}

Note that here we pass indices to both partition and Quicksort, because the idea is that we want to apply these procedures to portions of $A$ in place rather than create entirely new, smaller arrays. The call partition(A,p,l,r) partitions the section of $A$ between $\ell$ and $r$, $A[\ell...r]$, around the pivot $p$, and the call quicksort(A,l,r) sorts $A[\ell...r]$.
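Putting the pieces together, here is a minimal runnable version in Python, using random.randint for the pseudocode's rand and the partition sketch from Example 8.3 (whose returned pair plays the role of $(\ell', r')$):

    import random

    def quicksort(A, lo=0, hi=None):
        """Sort A[lo..hi] in place; called as quicksort(A), it sorts the whole list."""
        if hi is None:
            hi = len(A) - 1
        if lo < hi:
            p = A[random.randint(lo, hi)]        # choose the pivot uniformly at random
            lt, gt = partition(A, p, lo, hi)     # pivot region ends up at A[lt..gt]
            quicksort(A, lo, lt - 1)             # recursively sort the lesser portion
            quicksort(A, gt + 1, hi)             # recursively sort the greater portion

    A = [2, 7, 3, 6, 1, 5, 9, 0, 8, 4]
    quicksort(A)
    print(A)                                     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]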

So what's the difference between trying to figure out which pivot is the best and just leaving it up to chance? It turns out that even if we have a particularly bad split (say, the larger side gets $\frac 2 3$, $\frac 3 4$, or even $\frac 9{10}$ of the elements), as long as we can fix that proportion and guarantee the split gets no worse at every level of recursion, we can still get $O(n \log n)$ time.

Example 8.6. Suppose we can guarantee that any pivot that we choose doesn't do any worse than partitioning into partitions of size $\frac{1}{10}$ and $\frac{9}{10}$. This gives us the recurrence \[T(n) \leq T\left(\frac{n}{10}\right) + T\left(\frac{9n}{10}\right) + O(n).\] But if we consider the recursion tree, we see that we incur a total of at most $O(n)$ cost at each level and a total of $\log_{10/9} n$ levels. If we put this together, we still get a total of $O(n \log n)$ cost.
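To spell out the level count: the largest subproblem shrinks by a factor of $\frac{9}{10}$ at each level, so the recursion bottoms out after $d$ levels, where $\left(\frac{9}{10}\right)^d n \leq 1$, i.e. \[d = \log_{10/9} n = \frac{\ln n}{\ln(10/9)} = O(\log n).\] Since each level costs at most $cn$, the total is at most $cn \log_{10/9} n = O(n \log n)$.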

This seems to suggest that there is a lot of leeway in the choice of pivot. Intuitively, if there are a very small number of bad choices of pivots, then making a random choice should work out most of the time, since we'll have more okay or good pivots than bad on average. But how does one analyze this?

One approach that applies this notion of the guaranteed proportion is taken by Kleinberg and Tardos. If we fix some bound on the partitions of the array, say size between $\frac n 4$ and $\frac{3n} 4$, and reselect a pivot if the partition is poor, this gives something like a $\frac 1 2$ probability of choosing an acceptable pivot. Then we get a recurrence of \[T(n) \leq T\left(\frac{n}{4}\right) + T\left(\frac{3n}{4}\right) + O(n),\] giving us $O(n \log n)$ expected time. This is theoretically correct, but not entirely satisfying if you imagine your favourite programming language's sort function taking the time to partition something and then throwing it out to try again.
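As a sketch of this reselection idea (reusing random and the partition function from the sketches above; the acceptance test below is one way of rendering the "size between $\frac n 4$ and $\frac{3n} 4$" condition):

    def quicksort_central(A, lo=0, hi=None):
        """Quicksort that keeps re-drawing the pivot until both sides of the
        split have at least n/4 elements (the Kleinberg-Tardos approach)."""
        if hi is None:
            hi = len(A) - 1
        n = hi - lo + 1
        if n <= 1:
            return
        while True:
            p = A[random.randint(lo, hi)]
            lt, gt = partition(A, p, lo, hi)
            # A rejected attempt has only permuted A[lo..hi], so it is safe to try again.
            if (lt - lo) >= n // 4 and (hi - gt) >= n // 4:
                break
        quicksort_central(A, lo, lt - 1)
        quicksort_central(A, gt + 1, hi)

Each attempt succeeds with probability about $\frac 1 2$, so only a constant number of partitions are thrown away per call in expectation, but that wasted work is exactly what the analysis below lets us avoid.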

Luckily, we are able to prove that this runs in $O(n \log n)$ without having to resort to such chicanery, but we need to dig back into our memories of discrete probability theory from discrete math. This is the approach taken by CLRS, which considers the expected number of comparisons over the entire run of the algorithm. This perspective views Quicksort as an iterative algorithm rather than a recursive one, although a similar analysis can be made from that point of view too. But how does one count this?

To see this, we need to take a closer look at which comparisons can be made by our algorithm. For the purposes of analysis, we will rename our integers to $z_1, \dots, z_n$ so that $z_1 \lt \cdots \lt z_n$.

Definition 8.7. We define a random variable \[X_{ij} = \begin{cases} 1 & \text{if $z_i$ is compared to $z_j$ at any point during the run}, \\ 0 & \text{otherwise.} \end{cases}\] And we consider the sum of all the comparisons, \[X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}.\] Note that elements are only ever compared to the current pivot, and once a call finishes partitioning, its pivot is excluded from all further recursive calls; so any pair of elements is compared at most once, and $X$ really is the total number of comparisons. So, we want to compute $E(X)$, the expectation of $X$, which will give us our expected number of comparisons for a run of Quicksort. All that remains is to argue about the probability of each comparison.

Lemma 8.8. The probability that $z_i$ and $z_j$ are compared in a run of Quicksort is $\frac{2}{j-i+1}$.

Proof. Consider the set $Z_{ij} = \{z_i, z_{i+1}, \dots, z_j\}$. Until some element of $Z_{ij}$ is chosen as a pivot, all of $Z_{ij}$ remains together in the same subarray, so each of its $j-i+1$ elements is equally likely to be the first one chosen. If that first pivot is some $x$ with $z_i \lt x \lt z_j$, then $z_i$ and $z_j$ are separated into different partitions and never get compared. If it is $z_i$ or $z_j$, then the two are compared during that partition step. So $z_i$ and $z_j$ are compared exactly when the first pivot chosen from $Z_{ij}$ is $z_i$ or $z_j$, which happens with probability $\frac{2}{j-i+1}$. $$\tag*{$\Box$}$$

This probability is enough to let us compute the expected number of comparisons, $E(X)$.

Proposition 8.9. The expected number of comparisons over a single run of Quicksort is $O(n \log n)$.

Proof. We take the expectation of $X$, as defined above. Then we have \begin{align*} E(X) &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n E(X_{ij}) \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j-i+1} \\ &= 2 \sum_{i=1}^{n-1} \sum_{j=2}^{n-i+1} \frac 1 j \\ &\leq 2n \sum_{j=1}^{n} \frac 1 j \\ &\leq 2n (1 + \log n) \\ &= O(n \log n) \end{align*} Therefore the expected number of comparisons is $O(n \log n)$. $$\tag*{$\Box$}$$
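As a quick sanity check on this bound, here is a small Python experiment that instruments a randomized Quicksort (a variant of the partition from Example 8.3) to count comparisons between distinct elements, i.e. the quantity $X$ from Definition 8.7, and compares the average over a few runs against $2n(1 + \ln n)$:

    import math
    import random

    def count_comparisons(A):
        """Run randomized Quicksort on a copy of A and return the number of
        pivot-versus-element comparisons made during partitioning."""
        A = list(A)
        count = 0

        def sort(lo, hi):
            nonlocal count
            if lo >= hi:
                return
            p = A[random.randint(lo, hi)]
            lt, i, gt = lo, lo, hi
            while i <= gt:
                if A[i] == p:               # the pivot itself, not a comparison between two distinct elements
                    i += 1
                elif A[i] < p:
                    count += 1
                    A[lt], A[i] = A[i], A[lt]
                    lt += 1
                    i += 1
                else:
                    count += 1
                    A[gt], A[i] = A[i], A[gt]
                    gt -= 1
            sort(lo, lt - 1)
            sort(gt + 1, hi)

        sort(0, len(A) - 1)
        return count

    n, trials = 10_000, 20
    avg = sum(count_comparisons(range(n)) for _ in range(trials)) / trials
    print(f"average comparisons: {avg:.0f}")
    print(f"bound 2n(1 + ln n):  {2 * n * (1 + math.log(n)):.0f}")

For $n = 10{,}000$ the average comes out well below the bound, as the proposition promises.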

Recall that I asked about bad sorting algorithms in the first problem set. Some of you chose to give a randomized algorithm with worst-case running time of infinity. Here, we have something similar, where the worst-case running time is down to $O(n^2)$. However, as I mentioned in the solutions, for randomized algorithms, what we really care about is the expected running time, and this is an example of why.

As mentioned at the beginning of this section, Quicksort is commonly used by many programming languages. So what do real programming languages use to sort? Much like our discussion of integer multiplication algorithms, it turns out the answer is: whichever one is the right one for your use case. Many programming languages use hybrid sorting algorithms, which run one of several standard sorting algorithms depending on criteria such as:

  1. The size of the (sub)array: on very small arrays, the low overhead of insertion sort beats the asymptotically better algorithms.
  2. The recursion depth: C++'s std::sort is typically Introsort, a Quicksort that switches to Heapsort when the recursion gets too deep, guaranteeing $O(n \log n)$ in the worst case.
  3. Whether the input already contains sorted runs: Timsort, a Mergesort variant used by Python and by Java for arrays of objects, exploits such runs.

Now, it seems like sorting is pretty well understood: we know that sorting requires $\Omega(n \log n)$ comparisons and we have hybrid algorithms that run the best algorithm depending on the circumstances. But in 2009, a crazy idea about Quicksort was revisited.

Back in 1975, Sedgewick exhaustively studied Quicksort in his PhD thesis and one of the ideas he had was to try adding more pivots: instead of partitioning your array into two, why not three? Well, it didn't work. And that makes some amount of sense. But what if his approach wasn't clever enough?

In 2009, Yaroslavskiy proposed a dual-pivot Quicksort algorithm for implementation in Java on the Java mailing lists. It has been used since Java 7 to sort arrays of primitive types. The reason for the improvement might be a bit surprising: it has to do with caching in modern hardware.