CMSC 27100 — Lecture 17

Probability

Now that we have spent some time learning how to enumerate outcomes, we can start to think about the likelihood of particular outcomes. This is where probability theory comes into play. In particular, we are interested in discrete probability, which deals with countably many outcomes and can be approached using many of the tools we've already developed. The other branch of probability theory deals with uncountably many outcomes and requires methods from calculus and analysis. For instance, where we can get away with summation in the discrete setting, you will see a lot of integration in continuous probability theory.

At first glance, even though we're primarily concerned with discrete probability, it may seem a bit strange to squeeze probability theory into a course already chock-full of other things to cover. But probability and randomness play an important role in computer science, particularly in theoretical computer science. Probabilistic methods are an important part of algorithm design and analysis. There are some problems for which we may not know of an efficient deterministic algorithm, but for which we can come up with a provably efficient randomized algorithm. To tackle these problems, we need a basic understanding of discrete probability.

Definition 17.1. A probability space $(\Omega,\Pr)$ is a finite set $\Omega \neq \emptyset$ together with a function $\Pr : \Omega \to \mathbb R$ such that

  1. $\forall \omega \in \Omega, \Pr(\omega) > 0$,
  2. $\sum_{\omega \in \Omega} \Pr(\omega) = 1$.

We say that $\Omega$ is the sample space and $\Pr$ is the probability distribution. The above two conditions are known as the probability axioms and are due to Andrey Kolmogorov in the 1930s. Kolmogorov was a Soviet mathematician who was particularly influential in computer science through his development of algorithmic information theory in the 1960s. It might be a bit surprising to learn (as I did) that the formalization of basic probability concepts didn't come about until the 20th century, despite there being analyses of games of chance dating back to 17th-century France.

An event is a subset $A \subseteq \Omega$, and the function $\Pr : \mathcal P(\Omega) \to \mathbb R$ defined by $\Pr(A) = \sum_{\omega \in A} \Pr(\omega)$ is the probability of $A$. The atomic events are the singleton sets $\{\omega\}$, with $\Pr(\{\omega\}) = \Pr(\omega)$, and the trivial events are the events with probability 0 ($\emptyset$) or 1 ($\Omega$).
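
To make these definitions concrete, here is a minimal Python sketch (the names `space`, `is_probability_space`, and `pr_event` are illustrative, not anything from the lecture) that stores a finite probability space as a dictionary and computes the probability of an event by summing over its outcomes:

```python
from fractions import Fraction

# A finite probability space as a dictionary from outcomes to probabilities.
# A biased coin, purely for illustration.
space = {"H": Fraction(2, 3), "T": Fraction(1, 3)}

def is_probability_space(pr):
    """Check the two axioms of Definition 17.1."""
    positive = all(p > 0 for p in pr.values())   # every outcome has Pr(w) > 0
    total_one = sum(pr.values()) == 1            # probabilities sum to 1
    return positive and total_one

def pr_event(pr, event):
    """Pr(A) = sum of Pr(w) over the outcomes w in the event A."""
    return sum(pr[w] for w in event)

assert is_probability_space(space)
print(pr_event(space, {"H"}))        # 2/3
print(pr_event(space, {"H", "T"}))   # 1, the trivial event Omega
```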

Example 17.2. A sample space for flipping a coin would be the set of its two outcomes, heads or tails: $\{H,T\}$. A sample space for the outcome of a die roll would be $\{1,2,3,4,5,6\}$. An event can be something simple, like flipping a coin and it landing on heads, in which case the event would be the set $\{H\}$. It can also be more complicated, like a die roll landing on a perfect square, which would be the set $\{1,4\}$, or on a positive divisor of 6, which would be the set $\{1,2,3,6\}$.

Definition 17.3. The uniform distribution over a sample space $\Omega$ defines $\Pr(\omega) = \frac 1 {|\Omega|}$ for all $\omega \in \Omega$. Thus, $\Pr(A) = \frac{|A|}{|\Omega|}$.
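
For instance, a fair die roll is modeled by the uniform distribution over the sample space from Example 17.2. A short sketch confirming that the two ways of computing $\Pr(A)$ agree:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}    # sample space for a die roll
squares = {1, 4}              # event: the roll is a perfect square

# Summing Pr(w) = 1/|Omega| over the event agrees with |A| / |Omega|.
by_sum = sum(Fraction(1, len(omega)) for _ in squares)
by_ratio = Fraction(len(squares), len(omega))
assert by_sum == by_ratio
print(by_ratio)               # 1/3
```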

Example 17.4. Recall that a standard deck of playing cards consists of four suits, $\spadesuit, \heartsuit, \clubsuit, \diamondsuit$, each with cards numbered from 2 through 10, plus a Jack, Queen, King, and Ace. This gives us 52 cards in total ($4 \times 13$). What is the probability that a poker hand contains a full house (i.e., a three of a kind together with a pair)? First, let's consider what our sample space is. Since we're concerned with possible poker hands, this should be all of the 5-card hands from our 52 cards. In this case, the size of our sample space is $$|\Omega| = \binom{52}5 = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2598960.$$ Now, let $F$ denote the event consisting of all full houses. First, we need to choose two denominations. These cannot be the same, since otherwise we'd need five cards of a single denomination, and only four exist. We also need to distinguish between the denomination for the three of a kind and the denomination for the pair, so this is a 2-permutation and not a combination, which gives us $P(13,2)$. From this selection, we have $\binom 4 3$ ways to choose a three of a kind for one denomination and $\binom 4 2$ ways to choose a pair for the other. Putting this all together, we get $$|F| = P(13,2) \binom 4 3 \binom 4 2 = (13 \cdot 12) \cdot \frac{4 \cdot 3 \cdot 2}{3 \cdot 2 \cdot 1} \cdot \frac{4 \cdot 3}{2 \cdot 1} = 13 \cdot 12 \cdot 4 \cdot 6 = 3744.$$ Then the probability that we get a full house is $$\Pr(F) = \frac{|F|}{|\Omega|} = \frac{3744}{2598960} \approx 0.00144.$$
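
The counts in this example are easy to double-check with Python's `math.comb` and `math.perm` (available in Python 3.8+); the brute-force enumeration below is just a sketch to confirm the counting argument:

```python
from itertools import combinations
from collections import Counter
from math import comb, perm

# Closed-form count from Example 17.4: P(13,2) * C(4,3) * C(4,2).
full_houses = perm(13, 2) * comb(4, 3) * comb(4, 2)
hands = comb(52, 5)
print(full_houses, hands, full_houses / hands)   # 3744 2598960 ~0.00144

# Brute-force confirmation (takes a little while): enumerate every 5-card
# hand and count those whose denomination counts are exactly a 3 and a 2.
deck = [(denom, suit) for denom in range(13) for suit in range(4)]
count = sum(
    1
    for hand in combinations(deck, 5)
    if sorted(Counter(d for d, _ in hand).values()) == [2, 3]
)
assert count == full_houses
```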

So, under a uniform distribution, determining the probability of a single event amounts to counting possible outcomes: we compare the size of the event's set to the size of the sample space. And since events are sets, we can combine them in the same ways we've been doing to consider the probability of multiple events together.

Theorem 17.5. $\Pr(A \cup B) + \Pr(A \cap B) = \Pr(A) + \Pr(B)$.
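
As a quick sanity check, here is a sketch using the die-roll events from Example 17.2 under the uniform distribution:

```python
from fractions import Fraction

def pr(event):
    """Probability of an event under the uniform distribution on a die roll."""
    return Fraction(len(event), 6)

A = {1, 4}          # the roll is a perfect square
B = {1, 2, 3, 6}    # the roll divides 6

# Theorem 17.5: Pr(A u B) + Pr(A n B) = Pr(A) + Pr(B).
assert pr(A | B) + pr(A & B) == pr(A) + pr(B)
```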

You might notice that this is really just a rearrangement of the Inclusion-Exclusion Principle, with the intersection term moved to the other side of the equality. This leads us to the familiar notion of disjoint events.

Definition 17.6. Events $A$ and $B$ are disjoint if $A \cap B = \emptyset$.

Theorem 17.7 (Boole's Inequality). $\Pr \left( \bigcup_{i=1}^k A_i \right) \leq \sum_{i=1}^k \Pr(A_i)$. Furthermore, this is an equality if and only if the $A_i$ are pairwise disjoint.

This is just a generalization of Theorem 17.5, and the proof comes straight from applying it. Alternatively, you can prove it by induction. When the events are pairwise disjoint, this is essentially the sum rule. As you might have noticed, this result is due to George Boole, the same Boole who gave us Boolean algebra.
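
Continuing the same die-roll sketch, here is the inequality in action: strict for overlapping events and an equality for disjoint ones.

```python
from fractions import Fraction

def pr(event):
    return Fraction(len(event), 6)   # uniform distribution on a die roll

A, B = {1, 4}, {1, 2, 3, 6}
assert pr(A | B) < pr(A) + pr(B)     # strict: A and B share the outcome 1

C, D = {1, 4}, {2, 3, 6}
assert pr(C | D) == pr(C) + pr(D)    # equality: C and D are disjoint
```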

Something that doesn't quite have a direct analogue in our discussions of set theory is the notion of conditional events.

Definition 17.8. If $A$ and $B$ are events with $\Pr(B) \neq 0$, then the conditional probability of $A$ relative to $B$ is $$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$ Note that by this definition, $\Pr(A \cap B) = \Pr(A \mid B) \cdot \Pr(B)$.

One way of looking at this is to restrict our attention to the event $B$ and ask: what is the probability of $A$ within this smaller space? The outcomes of $A$ that remain after the restriction are exactly those in $A \cap B$. Then, since $B$ is now playing the role of the sample space, we divide by the probability of $B$ to rescale, so that the probabilities within $B$ sum to 1.
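
For a concrete instance (again a sketch with the die-roll events from Example 17.2): given that a roll divides 6, the probability that it is a perfect square is $\Pr(A \mid B) = \frac{1/6}{4/6} = \frac 1 4$.

```python
from fractions import Fraction

def pr(event):
    return Fraction(len(event), 6)   # uniform distribution on a die roll

A = {1, 4}          # the roll is a perfect square
B = {1, 2, 3, 6}    # the roll divides 6

# Of the four equally likely outcomes in B, only the outcome 1 lies in A.
print(pr(A & B) / pr(B))             # 1/4
```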