CMSC 27100 — Lecture 24

Probabilistic analysis of algorithms

Today, we'll be continuing our discussion of expectation by looking at some applications to probabilistic analysis of algorithms.

First, recall that the typical paradigm for analyzing algorithms is worst-case analysis. That is, the complexity of an algorithm is determined by the worst-case input to that algorithm. For example, if we consider something like bubble sort or insertion sort, the worst-case input to the algorithm is a list that is in reverse-sorted order. Such an input forces our algorithm to carry out the greatest number of steps, which for a list of size $n$ is typically about $\binom n 2 = O(n^2)$ operations.

Where does the number $\binom n 2$ come from? If we think about those simple sorting algorithms, we notice that we're always considering pairs of elements to put into order. So the maximum number of comparisons should come from comparing all possible pairs: $\binom n 2$. (Of course, there are worse, less clever algorithms that will exceed this.)
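To make this concrete, here is a small Python sketch of a textbook insertion sort that counts its comparisons (the function name is just for illustration); on a reverse-sorted list of size $n$ it performs exactly $\binom n 2$ comparisons.

```python
def insertion_sort_comparisons(a):
    """Sort a copy of a and return the number of element comparisons made."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        # Walk a[i] left past every larger element, one comparison per step.
        while j > 0:
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return comparisons

n = 10
print(insertion_sort_comparisons(list(range(n, 0, -1))))  # 45, which is C(10, 2)
```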

A common objection to worst-case analysis is that it may be overly pessimistic. In practice, we like to believe that the typical or average input may be okay and that the worst-case inputs are weird, artificial edge cases. But how would we prove something like this? What does it mean for an input to be the "average" input?

To answer those questions, we need to consider the distribution of the inputs we're working with. This means we need to know something about the structure of our inputs and perform some probabilistic analysis.

Recall that the solution space for sorting is all permutations of a list. The solution that we want is the permutation that has the list in sorted order. But generally speaking, the input list can be any one of those permutations. So we need to say something about the structure of these permutations—just how "unsorted" is the average permutation?

Consider a random permutation $\pi$ of $\{1, 2, \dots, n\}$. A pair $(i,j)$ is an inversion in $\pi$ if $i \lt j$ and $j$ precedes, or appears before, $i$ in $\pi$. What is the expected number of inversions in $\pi$?

First, we should note that the occurrence of inversions is not independent: if we reorder one pair of numbers, it is possible that we can create other inversions.

Let $I_{i,j}$, for $1 \leq i \lt j \leq n$, be the indicator random variable with $I_{i,j} = 1$ if $(i,j)$ is an inversion in $\pi$ and $I_{i,j} = 0$ otherwise. Let $X$ be the random variable for the number of inversions in $\pi$. Then \[X = \sum_{1 \leq i \lt j \leq n} I_{i,j},\] and by linearity of expectation, \[E(X) = \sum_{1 \leq i \lt j \leq n} E(I_{i,j}).\] So, what is $E(I_{i,j})$? If we consider all permutations, then half of them have $i$ appearing before $j$ and the other half have $j$ appearing before $i$. This gives us $E(I_{i,j}) = \frac 1 2$. Then, we count the number of pairs $(i,j)$ with $i \lt j$ to get the number of terms in our summation. There are $\binom n 2$ such pairs, since we pick any two distinct numbers and there is only one order we care about. This gives us \[E(X) = \sum_{1 \leq i \lt j \leq n} E(I_{i,j}) = \binom n 2 \cdot \frac 1 2 = \frac{n(n-1)}2 \cdot \frac 1 2 = \frac{n(n-1)}4.\]
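If you want to check this numerically, here is a short Python sketch (helper names are just illustrative) that estimates $E(X)$ by averaging the inversion count over many random permutations and compares it to $\frac{n(n-1)}4$.

```python
import random

def count_inversions(perm):
    """Count pairs of positions (a, b) with a < b but perm[a] > perm[b]."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])

def average_inversions(n, trials=20000):
    """Estimate E(X) by averaging inversion counts over random permutations of {1, ..., n}."""
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        random.shuffle(perm)
        total += count_inversions(perm)
    return total / trials

n = 8
print(average_inversions(n))   # should be close to n(n-1)/4 = 14
print(n * (n - 1) / 4)
```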

This leads us to a nice average-case analysis of bubble sort. As you may recall, bubble sort goes through an array, checking whether each pair of adjacent elements is in the right order and swapping them if they are not. The worst case comes when we have an array that is completely reversed, so all $\binom n 2$ pairs of elements are inverted and we're forced to swap all of them. Our analysis then says that if we have an average or random list, we should expect $\frac{n(n-1)}4$ inverted pairs. But this is still $O(n^2)$.
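Since each adjacent swap in bubble sort fixes exactly one inversion, the number of swaps it performs equals the number of inversions in its input. Here is a rough Python sketch (names are illustrative) comparing the average swap count on random lists with the worst case.

```python
import random

def bubble_sort_swaps(a):
    """Bubble sort a copy of a and return the number of swaps performed."""
    a = list(a)
    n = len(a)
    swaps = 0
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps

n, trials = 8, 20000
average = sum(bubble_sort_swaps(random.sample(range(n), n)) for _ in range(trials)) / trials
print(average)                                   # close to n(n-1)/4 = 14
print(bubble_sort_swaps(list(range(n, 0, -1))))  # worst case: all 28 pairs inverted
```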

Typically, average-case analysis is difficult because we don't really know that much about the distribution of the input—it's not always easy to come up with a way to measure this. The other major use of probability theory in algorithm analysis is to analyze the behaviour of randomized algorithms.

With randomized algorithms, it doesn't make sense to talk about worst-case behaviour because the behaviour of the algorithm can change even on the same input. Instead, we need to analyze the probabilistic behaviour of the algorithm. Since we use randomness in our algorithms in a controlled way, we should be able to analyze it using tools from probability theory. One simple example of this is making a random choice repeatedly until we get a correct answer.

Up until now, we've been considering a sequence of trials of some fixed length. But sometimes, we may want to perform a sequence of trials until we get a success. This means that we are working with an infinite sample space and we could potentially be waiting for a long time. However, we can estimate how long we should expect to wait.

Let $X$ be a Bernoulli trial with probability of success $p$. The expected number of trials needed to obtain the first success is $\frac 1 p$.

First, we note that we are interested in outcomes with exactly one success, which occurs on the final trial: once we see a success, we stop performing trials. Let $Y$ be the random variable equal to the number of trials performed, ending with the first success. Then we have \[\Pr(Y = j) = (1-p)^{j-1} p,\] since we need $j-1$ failures before the success. The expected number of trials is \[E(Y) = \sum_{j=0}^\infty j \cdot \Pr(Y=j) = \sum_{j=1}^\infty j(1-p)^{j-1}p = p \sum_{j=1}^\infty j (1-p)^{j-1}.\] Note here that we drop the term for $j=0$, since that term is 0 and doesn't contribute to the sum. Now, we apply the following very useful identity, valid for $|x| \lt 1$: \[\sum_{k=1}^\infty k x^{k-1} = \frac 1 {(1-x)^2}.\] Substituting $x = 1-p$ (with summation index $j$ in place of $k$) gives us \[\sum_{j=1}^\infty j (1-p)^{j-1} = \frac 1 {(1-(1-p))^2} = \frac 1{p^2}.\] Then we have \[p \sum_{j=1}^\infty j (1-p)^{j-1} = p \cdot \frac 1 {p^2} = \frac 1 p.\]
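Here is a quick Python simulation, purely as an illustrative sanity check on the calculation above, that estimates the expected waiting time for a Bernoulli trial with success probability $p$ and compares it to $\frac 1 p$.

```python
import random

def trials_until_first_success(p):
    """Run independent Bernoulli(p) trials until one succeeds; return how many were run."""
    count = 1
    while random.random() >= p:   # failure with probability 1 - p
        count += 1
    return count

p, runs = 0.25, 100000
average = sum(trials_until_first_success(p) for _ in range(runs)) / runs
print(average)   # should be close to 1/p = 4
```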


Consider the majority problem. Suppose you have a collection of $n$ items, one of which occurs more than $\frac n 2$ times. Suppose we want to find this majority element quickly and using no extra space—perhaps we have an extraordinarily large number of items, so we can't keep track of their counts because it would use too much space.

Here is an algorithm that does this: choose an item uniformly at random. Then go through every element, counting only occurrences of your chosen item. If the count is more than half the items, you know you have the majority element. If not, try again.
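Here is one way this might look in Python; the function name and the example list are just illustrative.

```python
import random

def randomized_majority(items):
    """Guess a random item and verify it with one counting pass; repeat until it works."""
    n = len(items)
    while True:
        candidate = random.choice(items)                   # O(1) random guess
        count = sum(1 for x in items if x == candidate)    # O(n) verification pass
        if 2 * count > n:                                  # occurs more than n/2 times
            return candidate

print(randomized_majority([3, 1, 3, 3, 2, 3, 3]))  # 3
```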

How long does this take? One round of this guessing and checking takes $O(n)$ time: we go through all the items and compare them once. So if we're lucky, we only do this a few times. But if we're not lucky, this could go on forever.

How many rounds of guessing do we expect to need? This is exactly the question of how many trials we expect to perform until we succeed once, so we apply the geometric distribution. Because we assume one of the elements occurs more than $\frac n 2$ times, a uniformly random guess picks that element with probability $p \gt \frac 1 2$. Therefore, the expected number of attempts we make is $\frac 1 p \lt 2$.

This means that the expected running time of the algorithm is $O(n)$: each guess-and-check round takes $O(n)$ time and the expected number of rounds is less than 2, a constant. For randomized algorithms, we use expected running time rather than worst-case running time.