MPCS 50103 — Lecture 10

Probability

Now that we have spent some time learning how to enumerate outcomes, we can start to think about the likelihood of particular outcomes. This is where probability theory comes into play. In particular, we are interested in discrete probability, which deals with countably many outcomes and can be approached with many of the tools we have already developed. The other branch of probability theory deals with uncountably many outcomes and relies on methods from calculus and analysis. For instance, where we can get away with summation in discrete probability, you will see a lot of integration in continuous probability theory.

At first glance, even though we're primarily concerned with discrete probability, it might seem strange to stuff probability theory into a course that is already chock full of other things to cover. But probability and randomness play an important role in computer science, particularly in theoretical computer science. Probabilistic methods are an important part of algorithm design and analysis. For some problems, we may not know of an efficient deterministic algorithm, but we can come up with a provably efficient randomized algorithm. To tackle these problems, we need a basic understanding of discrete probability.

A probability space $(\Omega,\Pr)$ is a finite set $\Omega \neq \emptyset$ together with a function $\Pr : \Omega \to \mathbb R$ such that

  1. $\forall \omega \in \Omega, \Pr(\omega) > 0$,
  2. $\sum_{\omega \in \Omega} \Pr(\omega) = 1$.

We say that $\Omega$ is the sample space and $\Pr$ is the probability distribution. The above two conditions are known as the probability axioms and are due to Andrey Kolmogorov in the 1930s. Kolmogorov was a Soviet mathematician and was particularly influential in computer science through his development of algorithmic information theory in the 1960s. It might be a bit surprising to learn (as I did) that the formalization of basic probability concepts didn't come about until the 20th century, despite there being analysis of games of chance dating back to 17th century France.

An event is a subset $A \subseteq \Omega$ and $\Pr : \mathcal P(\Omega) \to \mathbb R$ defined by $\Pr(A) = \sum_{\omega \in A} \Pr(\omega)$ is the probability of $A$. The atomic events are singleton sets $\{\omega\}$ with $\Pr(\{\omega\}) = \Pr(\omega)$ and the trivial events are events with probability 0 ($\emptyset$) or 1 ($\Omega$).

A sample space for flipping a coin would be $\{H,T\}$: whether it lands on heads or tails. A sample space for the outcome of a die roll would be $\{1,2,3,4,5,6\}$. An event can be something simple, like flipping a coin and having it land on heads, in which case the event is the set $\{H\}$. It can also be more complicated, like a die roll landing on a perfect square, which would be the set $\{1,4\}$, or on a positive divisor of 6, which would be the set $\{1,2,3,6\}$.

The uniform distribution over a sample space $\Omega$ defines $\Pr(\omega) = \frac 1 {|\Omega|}$ for all $\omega \in \Omega$. Thus, $\Pr(A) = \frac{|A|}{|\Omega|}$.
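
To make these definitions concrete, here is a minimal sketch in Python (not from the lecture) representing a finite probability space as a dictionary from outcomes to probabilities, applied to the die events above; the helper names `pr_outcome` and `pr_event` are my own.

```python
# A sketch of a finite probability space: a dict mapping each outcome
# to its probability. The names pr_outcome and pr_event are illustrative.
from fractions import Fraction

# Uniform distribution over a die roll: Pr(w) = 1/|Omega| for each w.
omega = {1, 2, 3, 4, 5, 6}
pr_outcome = {w: Fraction(1, len(omega)) for w in omega}

def pr_event(event):
    """Pr(A) = sum of Pr(w) over all outcomes w in A."""
    return sum(pr_outcome[w] for w in event)

# The events from the die examples above.
print(pr_event({1, 4}))        # perfect squares: 1/3, i.e. |A|/|Omega|
print(pr_event({1, 2, 3, 6}))  # positive divisors of 6: 2/3
```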

Recall that a standard deck of playing cards consists of four suits, $\spadesuit, \heartsuit, \clubsuit, \diamondsuit$ with cards numbered from 2 through 10, and a Jack, Queen, King, and Ace. This gives us 52 cards in total ($4 \times 13$). What is the probability that a poker hand contains a full house (i.e. a three of a kind together with a pair)? First, let's consider what our sample space is. Since we're concerned with possible poker hands, this should be all of the 5-card hands from our 52 cards. In this case, the size of our sample space is $$|\Omega| = \binom{52}5 = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2598960.$$ Now, let $F$ denote the event for full houses. First, we need to choose two denominations. These will not be the same, since otherwise, we'd need five of the same denomination. We also need to distinguish between the denomination for the three of a kind and the denomination for the pair, so this is a 2-permutation and not a combination, which gives us $P(13,2)$. From this selection, we have $\binom 4 3$ ways to choose a three of a kind for one denomination and $\binom 4 2$ ways to choose a pair. Putting this all together, we get $$|F| = P(13,2) \binom 4 3 \binom 4 2 = (13 \cdot 12) \cdot \frac{4 \cdot 3 \cdot 2}{3 \cdot 2 \cdot 1} \cdot \frac{4 \cdot 3}{2 \cdot 1} = 13 \cdot 12 \cdot 4 \cdot 6 = 3744.$$ Then the probability that we get a full house is $$\Pr(F) = \frac{|F|}{|\Omega|} = \frac{3744}{2598960} \approx 0.00144.$$
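
As a sanity check on this count, here is a short Python sketch, not part of the lecture, that counts full houses both with the closed-form formula and by brute-force enumeration of all $\binom{52}5$ hands (the brute-force pass takes a few seconds):

```python
# Counting full houses two ways: the closed-form formula from above,
# and brute-force enumeration of all C(52, 5) = 2,598,960 hands.
from collections import Counter
from itertools import combinations
from math import comb, perm

deck = [(rank, suit) for rank in range(13) for suit in range(4)]

full_houses = 0
for hand in combinations(deck, 5):
    rank_counts = sorted(Counter(rank for rank, _ in hand).values())
    if rank_counts == [2, 3]:   # a pair together with a three of a kind
        full_houses += 1

formula = perm(13, 2) * comb(4, 3) * comb(4, 2)
print(full_houses, formula)          # 3744 3744
print(full_houses / comb(52, 5))     # ~0.00144
```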

So determining the probability of a single event amounts to counting the possible outcomes, or the sizes of the sets for the event and the sample space. And since we are dealing with sets, we can combine our sets in the same way as we've been doing to consider the probability of multiple events together.

$\Pr(A \cup B) + \Pr(A \cap B) = \Pr(A) + \Pr(B)$.

You might notice that this is really just a rearrangement of the Inclusion-Exclusion Principle, with the intersection term moved to the other side of the equality. This leads us to the familiar notion of disjoint events.

Events $A$ and $B$ are disjoint if $A \cap B = \emptyset$.

$\Pr \left( \bigcup_{i=1}^k A_i \right) \leq \sum_{i=1}^k \Pr(A_i)$. Furthermore, this is an equality if and only if the $A_i$ are pairwise disjoint.

This is just a generalization of Theorem 10.6, the two-event identity above, and the proof comes straight from applying it. Alternatively, you can prove it by induction. When the events are pairwise disjoint, this is essentially the sum rule. This result, known as Boole's inequality or the union bound, is due to George Boole, the same Boole who gave us Boolean algebra.
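
Here is a small illustrative check of the union bound on a die roll under the uniform distribution; the particular events are arbitrary choices of mine, not from the lecture:

```python
# Checking the union bound on a die roll under the uniform distribution.
omega = set(range(1, 7))

def pr(event):
    return len(event) / len(omega)

# Overlapping events: strict inequality.
events = [{1, 2}, {2, 3}, {5}]
print(pr(set().union(*events)), sum(pr(a) for a in events))  # 0.666... < 0.833...

# Pairwise disjoint events: equality.
events = [{1, 2}, {3, 4}, {5}]
print(pr(set().union(*events)), sum(pr(a) for a in events))  # 0.833... == 0.833...
```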

Conditional probability

Something that doesn't quite have a direct analogue from our discussions on set theory is the notion of conditional events.

If $A$ and $B$ are events with $\Pr(B) > 0$, then the conditional probability of $A$ given $B$ is $$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$ Note that by this definition, $\Pr(A \cap B) = \Pr(A \mid B) \cdot \Pr(B)$.

One way of looking at this is to restrict our attention to the event $B$ and ask: what is the probability of $A$ within this smaller space? The outcomes of $A$ that survive the restriction are exactly those in $A \cap B$. Dividing by $\Pr(B)$ then renormalizes, so that relative to $B$ the probabilities once again sum to 1.

Consider a roll of three dice.

  1. What is the probability that the sum of the dice is 9?
  2. What is the probability that the first die shows 5?
  3. What is the probability that the first die shows 5, given that the sum of the dice is 9?

Here, if we want to talk about the "first" die, we need our dice to be distinguishable. This means that our sample space consists of the ordered outcomes of the three dice after a roll. Each of the three dice can take on a value from 1 through 6, giving us $6^3 = 216$ possible outcomes.

  1. What we want here is the number of solutions to $x_1 + x_2 + x_3 = 9$ with $1 \leq x_1, x_2, x_3 \leq 6$. This turns out to be very similar to a problem on this week's problem set (substituting $y_i = x_i - 1$, count solutions to $y_1 + y_2 + y_3 = 6$ with $0 \leq y_i \leq 5$), so I'll just say that the number of such outcomes is $\binom 8 2 - 3 = 25$. Then the probability is $\frac{25}{216} \approx 0.116$.
  2. Here, we fix one of the dice, leaving the other two free. This gives exactly $1 \cdot 6 \cdot 6$ outcomes, so the probability of this event is $\frac{6^2}{6^3} = \frac 1 6$.
  3. This is a conditional probability question. For convenience, let $A$ be the event "the sum of the dice is 9" and let $B$ be the event "the first die is 5". We already know that there are 25 outcomes where the dice sum to 9. What's left is to compute the number of outcomes where the dice add up to 9 and the first die is 5. Fixing the first die to 5, the other two dice must add up to 4, giving us the following outcomes: $(5,1,3),(5,2,2),(5,3,1)$. Since the distribution is uniform, the probability is computed by the following: $$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)} = \frac{|A \cap B|}{|A|} = \frac{3}{25}.$$
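
Since the sample space has only 216 outcomes, all three answers can be verified by brute force. Here is a short Python sketch, mine rather than the lecture's:

```python
# Brute-force verification over all 6^3 = 216 ordered outcomes.
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=3))

A = [r for r in rolls if sum(r) == 9]   # sum of the dice is 9
B = [r for r in rolls if r[0] == 5]     # first die shows 5
A_and_B = [r for r in A if r[0] == 5]

print(Fraction(len(A), len(rolls)))      # 25/216
print(Fraction(len(B), len(rolls)))      # 1/6
print(Fraction(len(A_and_B), len(A)))    # Pr(B | A) = 3/25
```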

Suppose I know $\Pr(A \mid B)$ and $\Pr(A \mid \overline B)$. Together with $\Pr(B)$, these should give me $\Pr(A)$: since $B$ and $\overline B$ split the sample space, $\Pr(A) = \Pr(A \mid B) \cdot \Pr(B) + \Pr(A \mid \overline B) \cdot \Pr(\overline B)$. The Law of Total Probability generalizes this idea.

A partition of $\Omega$ is a family of nonempty, pairwise disjoint events $H_1, \dots, H_m$ covering $\Omega$. That is, $\bigcup_{i=1}^m H_i = \Omega$ and $H_i \cap H_j = \emptyset$ for all $i \neq j$.

Given a partition $H_1, \dots, H_m$ of $\Omega$ and an event $A$, $$\Pr(A) = \sum_{i=1}^m \Pr(A \mid H_i) \cdot \Pr(H_i).$$

Since we have a partition $H_1, \dots, H_m$, we have $A = A \cap \Omega = A \cap \bigcup_{i = 1}^m H_i$. Then, \begin{align*} \Pr(A) &= \Pr\left( A \cap \bigcup_{i=1}^m H_i \right) \\ &= \Pr \left( \bigcup_{i=1}^m (A \cap H_i) \right) &\text{distributing $\cap$ over $\cup$} \\ &= \sum_{i=1}^m \Pr(A \cap H_i) &\text{since the $H_i$ are pairwise disjoint} \\ &= \sum_{i=1}^m \Pr(A \mid H_i) \Pr(H_i) &\text{by definition of conditional probability} \end{align*}


Suppose that two delivery services $S$ and $T$ hold a duopoly and perform all deliveries in the same city. If $S$ has an on-time rate of 94% and carries out 70% of deliveries, while $T$ has an on-time rate of 98% and carries out 30% of deliveries, what is the overall on-time rate for deliveries in the city?

First, we need to determine our events. Let $O$ be the event that a delivery is on time, and let $D_S$ and $D_T$ be the events that a delivery is carried out by $S$ and $T$, respectively. Since $S$ and $T$ perform all deliveries in the city, $D_S$ and $D_T$ form a partition of the sample space.

Based on our information above, we know the on-time rates for each delivery company: $\Pr(O \mid D_S) = 0.94$ and $\Pr(O \mid D_T) = 0.98$. We also know the share of deliveries each carries out: $\Pr(D_S) = 0.7$ and $\Pr(D_T) = 0.3$.

Then the Law of Total Probability gives us the following: \begin{align*} \Pr(O) &= \Pr(O \mid D_S) \cdot \Pr(D_S) + \Pr(O \mid D_T) \cdot \Pr(D_T) \\ &= \frac{94}{100} \cdot \frac{7}{10} + \frac{98}{100} \cdot \frac{3}{10} \\ &= 0.658 + 0.294 = 0.952 = 95.2\% \end{align*}
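
As a check on this answer, here is a small Python sketch (my own) that computes the exact value and also estimates it by Monte Carlo simulation, sampling a carrier and then an on-time outcome:

```python
# The delivery example, computed exactly and estimated by simulation.
import random

p_carrier = {"S": 0.7, "T": 0.3}     # Pr(D_S), Pr(D_T)
p_on_time = {"S": 0.94, "T": 0.98}   # Pr(O | D_S), Pr(O | D_T)

# Law of Total Probability: sum Pr(O | D) * Pr(D) over the partition.
exact = sum(p_on_time[c] * p_carrier[c] for c in p_carrier)
print(exact)                         # 0.952 (up to float rounding)

trials = 1_000_000
hits = 0
for _ in range(trials):
    carrier = "S" if random.random() < p_carrier["S"] else "T"
    if random.random() < p_on_time[carrier]:
        hits += 1
print(hits / trials)                 # ~0.952
```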