CMSC 27100 — Lecture 19

Expectation

We have moved from talking about the probability of events to talking about numerical properties of outcomes, in the form of random variables. Now, if we know the distribution of a random variable, we can ask which values we should expect to see more often.

Definition 19.1. Let $(\Omega,\Pr)$ be a probability space and let $X$ be a random variable. The expected value of $X$ is $$E(X) = \sum_{\omega \in \Omega} X(\omega) \Pr(\omega).$$

Roughly speaking, the expectation of a random variable is the (weighted) average value for the random variable.

Example 19.2. Consider a fair 6-sided die and let $X$ be the random variable for the number that was rolled. We have $$E(X) = 1 \cdot \frac 1 6 + 2 \cdot \frac 1 6 + 3 \cdot \frac 1 6 + 4 \cdot \frac 1 6 + 5 \cdot \frac 1 6 + 6 \cdot \frac 1 6 = \frac{21}{6} = 3.5.$$ Now, consider a biased 6-sided die with $$\Pr(\omega) = \begin{cases} \frac 3 4 & \text{if $\omega = 1$,} \\ \frac{1}{20} & \text{otherwise}. \end{cases}$$ Let $Y$ be the random variable for the number that was rolled with our biased die. Then we get $$E(Y) = 1 \cdot \frac 3 4 + 2 \cdot \frac{1}{20} + 3 \cdot \frac{1}{20} + 4 \cdot \frac{1}{20} + 5 \cdot \frac{1}{20} + 6 \cdot \frac{1}{20} = \frac 3 4 + \frac{20}{20} = 1.75.$$ This indicates to us that the biased die will give us a 1 more often and therefore the average value that we should expect from a roll is much closer to 1.
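Both computations can be checked with a short Python sketch. This is our own illustration, not part of the notes; the `expectation` helper simply implements the weighted sum from Definition 19.1, using exact rational arithmetic.

```python
from fractions import Fraction

def expectation(pmf):
    """Expected value of a random variable given as {value: probability}."""
    return sum(value * p for value, p in pmf.items())

# Fair die: each face 1..6 with probability 1/6.
fair = {k: Fraction(1, 6) for k in range(1, 7)}

# Biased die from the example: Pr(1) = 3/4, each other face 1/20.
biased = {1: Fraction(3, 4), **{k: Fraction(1, 20) for k in range(2, 7)}}

print(expectation(fair))    # 7/2, i.e. 3.5
print(expectation(biased))  # 7/4, i.e. 1.75
```

Using `Fraction` avoids any floating-point rounding, so the results match the hand computation exactly.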

Recall that the motivation for defining random variables was so we could get away from considering the probabilities of elementary events. The definition of expectation is not very helpful in this regard. Luckily, it is not too difficult to reformulate expectation in terms of the values that the random variable takes on.

Proposition 19.3. Let $X$ be a random variable with finite range $\{r_1, \dots, r_k\}$, and write $\Pr(X = r) = \Pr(\{\omega \in \Omega \mid X(\omega) = r\})$. Then $$E(X) = \sum_{i=1}^k r_i \cdot \Pr(X = r_i).$$

Proof. Recall that the range of a function $X$ is all of the possible values that $X(\omega)$ can take over every $\omega \in \Omega$. Then we have \begin{align*} \sum_{i=1}^k r_i \cdot \Pr(X = r_i) &= \sum_{i=1}^k r_i \cdot \Pr(\{\omega \in \Omega \mid X(\omega) = r_i\}) \\ &= \sum_{i=1}^k r_i \cdot \sum_{\omega \in \Omega, X(\omega) = r_i} \Pr(\omega) \\ &= \sum_{i=1}^k \sum_{\omega \in \Omega, X(\omega) = r_i} X(\omega) \Pr(\omega) \\ &= \sum_{\omega \in \Omega} X(\omega) \Pr(\omega) \\ &= E(X) \end{align*} $\tag*{$\Box$}$
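As a sanity check on Proposition 19.3 (again our own sketch), we can take the sum of two fair dice and compute its expectation both ways: once over elementary outcomes as in Definition 19.1, and once over the range of the random variable.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
pr = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two rolls."""
    return w[0] + w[1]

# Definition 19.1: sum over elementary outcomes.
by_outcomes = sum(X(w) * pr[w] for w in omega)

# Proposition 19.3: group outcomes by value, then sum over the range of X.
pmf = defaultdict(Fraction)
for w in omega:
    pmf[X(w)] += pr[w]
by_range = sum(r * q for r, q in pmf.items())

print(by_outcomes, by_range)  # both 7
```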

Sometimes, we will need to deal with more than one random variable. For instance, suppose we want to consider the expected value of a roll of three dice. We would like to be able to use the expected value for a single die roll, which we already worked out above. The following result gives us a way to do this.

Theorem 19.4 (Linearity of Expectation). Let $(\Omega, \Pr)$ be a probability space, $c_1, c_2, \dots, c_n$ be real numbers, and $X_1, X_2, \dots, X_n$ be random variables over $\Omega$. Then $$E\left( \sum_{i=1}^n c_i X_i \right) = \sum_{i=1}^n c_i E(X_i).$$

Proof. \begin{align*} E\left( \sum_{i=1}^n c_i X_i \right) &= \sum_{\omega \in \Omega} \left( \sum_{i=1}^n c_i X_i(\omega) \right) \Pr(\omega) \\ &= \sum_{\omega \in \Omega} \sum_{i=1}^n c_i X_i(\omega) \Pr(\omega) \\ &= \sum_{i=1}^n \sum_{\omega \in \Omega} c_i X_i(\omega) \Pr(\omega) \\ &= \sum_{i=1}^n c_i \left(\sum_{\omega \in \Omega} X_i(\omega) \Pr(\omega)\right) \\ &= \sum_{i=1}^n c_i E(X_i) \end{align*} $$\tag*{$\Box$}$$

Example 19.5. Suppose we have three of our biased dice from the previous example, each with $\Pr(1) = \frac 3 4$ and probability $\frac{1}{20}$ for the other outcomes. What is the expected value of a roll of these three dice? Without the linearity property, we would need to consider the expected value for every 3-tuple of rolls $(\omega_1, \omega_2, \omega_3)$. However, we can apply the previous theorem to get a more direct answer. Let $X_1, X_2, X_3$ be random variables for the roll of each die. Then we have $$E(X_1 + X_2 + X_3) = E(X_1) + E(X_2) + E(X_3) = 1.75 + 1.75 + 1.75 = 5.25.$$
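We can confirm this numerically (our own sketch): brute-force the expectation over all $6^3$ elementary outcomes, assuming the three rolls are independent so that each 3-tuple's probability is the product of the individual probabilities, and compare with the linearity shortcut.

```python
from fractions import Fraction
from itertools import product

# Biased die pmf from Example 19.2: Pr(1) = 3/4, each other face 1/20.
p = {1: Fraction(3, 4), **{k: Fraction(1, 20) for k in range(2, 7)}}

# Brute force: expectation of the sum over all 6^3 outcomes,
# with independent rolls (tuple probability = product of face probabilities).
brute = sum((a + b + c) * p[a] * p[b] * p[c]
            for a, b, c in product(range(1, 7), repeat=3))

# Linearity of expectation: three times the single-die expectation.
single = sum(k * q for k, q in p.items())
print(brute, 3 * single)  # both 21/4, i.e. 5.25
```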

You may have noticed from our previous discussions that in a roll of three dice, the roll of each die is independent of the others. In fact, we can define independence for random variables in a similar way.

Definition 19.6. We say the random variables $X$ and $Y$ are independent if for all $r,s \in \mathbb R$, $\Pr(X = r \wedge Y = s) = \Pr(X = r) \cdot \Pr(Y = s)$.

It is important to note that although we have a very similar notion of independence for random variables, none of our results so far have involved independence. That is, the linearity of expectation does not depend on the independence of the random variables involved.
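To illustrate Definition 19.6 (a sketch of our own, using two fair dice), the two individual rolls are independent, while a roll and the sum of both rolls are not: knowing the sum constrains the first roll.

```python
from fractions import Fraction
from itertools import product

# Two fair dice; each ordered pair has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
pr = Fraction(1, 36)

def prob(pred):
    """Probability of the event {w in omega : pred(w)}."""
    return sum(pr for w in omega if pred(w))

X = lambda w: w[0]          # first roll
Y = lambda w: w[1]          # second roll
S = lambda w: w[0] + w[1]   # sum of the rolls

# X and Y satisfy Definition 19.6 at every pair of values.
indep_XY = all(prob(lambda w: X(w) == r and Y(w) == s)
               == prob(lambda w: X(w) == r) * prob(lambda w: Y(w) == s)
               for r in range(1, 7) for s in range(1, 7))

# X and S fail it: Pr(X = 1 and S = 12) = 0, but both factors are positive.
dep_XS = prob(lambda w: X(w) == 1 and S(w) == 12)
print(indep_XY, dep_XS)  # True 0
```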

Example 19.7. If you go to Japan, something you may experience is that it starts raining and you duck inside the nearest convenience store to pick up an umbrella, only to come back outside and find that the rain has stopped. You will probably have gotten one of those very cheap and ubiquitous 500 yen clear plastic umbrellas. Now, if you go inside a restaurant or a shop, you will need to leave your umbrella at the front in a bin full of umbrellas. Since cheap umbrellas are so common, people just end up taking a random umbrella when they leave. Suppose $n$ people go inside a shop and leave their umbrellas at the front, and when they exit the shop, they randomly take an umbrella. What is the expected number of people who take the umbrella they had before entering the shop?

Let $X$ be the random variable that denotes the number of people who take their own umbrella when they leave. We can then define random variables $X_i$ corresponding to each person by $$X_i = \begin{cases} 1 & \text{if person $i$ takes their own umbrella when they leave,} \\ 0 & \text{otherwise.} \end{cases}$$ Then we have $X = X_1 + \cdots + X_n$. Since $X_i$ is an indicator variable, $E(X_i) = \Pr(X_i = 1)$, and each person is equally likely to take any of the $n$ umbrellas, so $E(X_i) = \frac 1 n$. Note that the $X_i$ are not independent, but linearity of expectation holds regardless. Then we have $$E(X) = E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n) = \overbrace{\frac 1 n + \cdots + \frac 1 n}^{\text{$n$ times}} = 1.$$
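A quick Monte Carlo check of this answer (our own sketch; taking a random umbrella is modeled as a uniformly random permutation of the bin, and we count fixed points):

```python
import random

def own_umbrella_count(n, rng):
    """Number of people who leave with their own umbrella (fixed points)."""
    bins = list(range(n))
    rng.shuffle(bins)  # person i takes umbrella bins[i]
    return sum(1 for person, umbrella in enumerate(bins) if person == umbrella)

rng = random.Random(0)  # fixed seed for reproducibility
trials = 100_000
n = 10
avg = sum(own_umbrella_count(n, rng) for _ in range(trials)) / trials
print(avg)  # close to 1, regardless of n
```

Note that the empirical average hovers near 1 whether $n$ is 10 or 10,000, matching the surprising fact that the answer does not depend on $n$.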

Example 19.8. Consider a random permutation $\pi$ of $\{1, 2, \dots, n\}$. A pair $(i,j)$ is an inversion in $\pi$ if $i \lt j$ and $j$ precedes, or appears before, $i$ in $\pi$. What is the expected number of inversions in $\pi$?

Let $I_{i,j}$ be the random variable with $I_{i,j} = 1$ if $i \lt j$ and $(i,j)$ is an inversion and $I_{i,j} = 0$ otherwise. Let $X$ be the random variable for the number of inversions in a permutation. Then $$X = \sum_{1 \leq i \lt j \leq n} I_{i,j}.$$ Then we have $$E(X) = \sum_{1 \leq i \lt j \leq n} E(I_{i,j}).$$ So, what is $E(I_{i,j})$? Fix a pair $i \lt j$. By symmetry, $j$ precedes $i$ in exactly half of all permutations, so $\Pr(I_{i,j} = 1) = \frac 1 2$ and hence $E(I_{i,j}) = \frac 1 2$. Then, we count the number of pairs $(i,j)$ with $i \lt j$ to get the number of terms in our summation. There are $\binom n 2$ such pairs, since we pick any pair of numbers and there's only one order we care about. This gives us $$E(X) = \sum_{1 \leq i \lt j \leq n} E(I_{i,j}) = \binom n 2 \cdot \frac 1 2 = \frac{n(n-1)}2 \cdot \frac 1 2 = \frac{n(n-1)}4.$$
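For small $n$ we can verify this exactly by averaging the inversion count over all $n!$ permutations (our own sketch, using $n = 6$ so the enumeration stays tiny):

```python
from fractions import Fraction
from itertools import combinations, permutations

def inversions(pi):
    """Count pairs of positions whose entries appear out of order in pi."""
    return sum(1 for a, b in combinations(range(len(pi)), 2) if pi[a] > pi[b])

n = 6
perms = list(permutations(range(1, n + 1)))
avg = Fraction(sum(inversions(p) for p in perms), len(perms))
print(avg, Fraction(n * (n - 1), 4))  # both 15/2
```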

This leads us to a nice average-case analysis of bubble sort. As you may recall, bubble sort goes through an array, checking whether each pair of adjacent elements is in the right order and swapping them if not. The worst case occurs when the array is in reverse order: we are forced to make $n$ passes through the array, each performing $O(n)$ swaps, giving us a running time of $O(n^2)$.

Of course, a common criticism of worst-case analysis is that typical cases may not be that bad. So we can ask: if we have a random array, how many swaps would we expect to make? Each adjacent swap fixes exactly one out-of-order pair, so the total number of swaps bubble sort performs is exactly the number of inversions in the input. For a random array, this is the expected number of inversions in a random permutation, which we've just computed to be $\frac{n(n-1)}4$. So it turns out that even in the average case, we're still looking at a running time of $O(n^2)$.
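The swaps-equal-inversions claim is easy to check directly (our own sketch): instrument a textbook bubble sort to count its swaps and compare against the inversion count of the same input.

```python
import random

def bubble_sort_swaps(arr):
    """Bubble sort a copy of arr; return the number of swaps performed."""
    a = list(arr)
    swaps = 0
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps

def inversions(arr):
    """Count pairs of positions whose entries are out of order."""
    return sum(1 for i in range(len(arr)) for j in range(i + 1, len(arr))
               if arr[i] > arr[j])

rng = random.Random(42)
arr = rng.sample(range(100), 20)  # a random array of 20 distinct values
print(bubble_sort_swaps(arr) == inversions(arr))  # True
```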

Another question we can ask about random variables is: what is the probability that a random variable takes on a value far from its expected value? Intuitively, this shouldn't happen very often. The following result gives us some idea by bounding the probability that a random variable takes on a value of at least some constant $a$.

Theorem 19.9 (Markov's Inequality). If $X$ is a nonnegative random variable, then for all $a \gt 0$, $$\Pr(X \geq a) \leq \frac{E(X)}{a}.$$

Proof. \begin{align*} E(X) &= \sum_{\omega \in \Omega} X(\omega) \Pr(\omega) \\ &\geq \sum_{\omega \in \Omega, X(\omega) \geq a} X(\omega) \Pr(\omega) \\ &\geq \sum_{\omega \in \Omega, X(\omega) \geq a} a \cdot \Pr(\omega) \\ &= a \cdot \sum_{\omega \in \Omega, X(\omega) \geq a} \Pr(\omega) \\ &= a \cdot \Pr(X \geq a). \end{align*}

This gives us $\Pr(X \geq a) \leq \frac{E(X)}{a}$. $$\tag*{$\Box$}$$
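We can see the bound in action on the biased die from Example 19.2 (our own sketch): for every threshold $a$, the exact tail probability $\Pr(X \geq a)$ stays below $E(X)/a$.

```python
from fractions import Fraction

# Biased die from Example 19.2: a nonnegative random variable.
pmf = {1: Fraction(3, 4), **{k: Fraction(1, 20) for k in range(2, 7)}}
EX = sum(k * p for k, p in pmf.items())  # 7/4

# For each threshold a, the tail probability never exceeds E(X)/a.
ok = all(sum(p for k, p in pmf.items() if k >= a) <= EX / a
         for a in range(1, 7))
print(ok)  # True
```

Markov's bound is often loose — here $\Pr(X \geq 6) = \frac{1}{20}$ while the bound only promises $\frac{7}{24}$ — but it requires nothing beyond nonnegativity and a finite expectation.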

Markov's inequality is named for Andrey Markov, a Russian mathematician from the turn of the 20th century and the same Markov responsible for things like Markov chains. Interestingly enough, this inequality appeared earlier in the work of Pafnuty Chebyshev, who was Markov's doctoral advisor. Chebyshev has another inequality of his own, however, which we'll get to soon.