MPCS 50103 — Lecture 9

Permutations and Combinations with Repetition

Sometimes, rather than picking a bunch of things from a fixed set, we may want to choose some objects from a set of types of things.

Suppose you've been tasked with gathering six donuts/muffins/scones/other pastries and there is a choice of four types, say chocolate, plain, strawberry, and green. How many different combinations can you make, with repetition? For an ordinary combination, we would only choose one of each type, but because we're concerned about classes of objects rather than single objects, we're able to choose multiples of a particular type.

Let's begin by considering one possible selection, $C,P,G,C,C,G$ (three chocolate, one plain, two green), assuming this is the order in which we chose our goods. However, since this is a combination and some of the elements are indistinguishable anyway, the order doesn't really matter, so let's group them together into $CCCPGG$. Now, let's separate these so they don't touch each other and cross contaminate the flavours or something, and we have something that looks like $CCC|P|GG$.

We can play with this analogy further and suppose that the box we have has a compartment for each type, regardless of the number that we end up choosing, so we have something like $CCC|P||GG$. Finally, we note that since each compartment contains a specific type, we don't need to specifically denote the type, and we can represent our choice by $***|*||**$.

Let's consider another possible choice: $*||*|****$, which is one chocolate, one strawberry, and four green. What we observe is that each choice of six items from four classes can be represented by an arrangement of six stars representing the items and three bars representing the division of classes of items.

But this is something we've already seen before: it's just a string problem over the alphabet $\{*,|\}$. Since we have six objects and four classes, we can view our possible selections as a string of length 9 with 6 $*$s and 3 $|$s and ask how many such strings there are. There are $$\binom 9 6 = \binom 9 3 = \frac{9!}{6!3!} = \frac{9 \cdot 8 \cdot 7}{3 \cdot 2 \cdot 1} = 3 \cdot 4 \cdot 7 = 84$$ such strings.
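As a quick sanity check, we can enumerate these selections directly in Python. This is a brute-force verification, not part of the argument itself:

```python
from itertools import combinations_with_replacement
from math import comb

# Choose 6 pastries from 4 types, with repetition allowed.
types = ["chocolate", "plain", "strawberry", "green"]
selections = list(combinations_with_replacement(types, 6))

# Stars and bars predicts C(6 + 4 - 1, 6) = C(9, 6) = 84 selections.
assert len(selections) == comb(9, 6)
print(len(selections))  # 84
```

The library function does exactly what our stars-and-bars count describes: it generates multisets, i.e. unordered selections with repetition.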

This method of using stars and bars to denote the objects and categories was popularized by William Feller's An Introduction to Probability Theory and its Applications in 1950.

There are $\binom{r+n-1} r = \binom{r+n-1}{n-1} = \frac{(r+n-1)!}{r!(n-1)!}$ $r$-combinations from a set $A$ of $n$ elements with repetition.

Proof. We can view each possible selection as a string of length $r+n-1$ over the alphabet $\{\star,|\}$. We know that each string contains exactly $r$ $\star$s and $n-1$ $|$s. Then there are $\binom{r+n-1}{r}$ possible ways to choose spots for the $r$ $\star$s. Since all remaining spots must be occupied by $|$s, this is the same as choosing spots for $n-1$ $|$s, and there are $\binom{r+n-1}{n-1}$ ways to do so. $\tag*{$\Box$}$

How many solutions to $x+y+z = 13$ are there, where $x,y,z$ are non-negative integers? Here, we can think of $x,y,z$ as our types of objects, of which we want to choose 13 in some combination. For instance, one solution would be to choose 6 $x$s, 4 $y$s, and 3 $z$s, which would give us $6+4+3 = 13$. Then the number of solutions is just $$\binom{13+3-1}{13} = \frac{15!}{13!2!} = \frac{15 \cdot 14}{2} = 105.$$
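We can double-check this count by brute force: once $x$ and $y$ are chosen, $z = 13 - x - y$ is forced, so we only need to enumerate valid pairs $(x,y)$.

```python
from math import comb

# Count non-negative integer solutions to x + y + z = 13.
# For each x in 0..13 and y in 0..(13 - x), z = 13 - x - y is determined.
count = sum(1 for x in range(14) for y in range(14 - x))
assert count == comb(15, 13)
print(count)  # 105
```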

Just like with generalizing combinations, we can also think about how to generalize permutations.

How many distinguishable permutations of the word $ACGACGA$ are there? There are two approaches we can take. The first is something we've seen already, by counting the number of ways to choose positions in a word. Here, we have three $A$s, two $C$s, and two $G$s. So we can choose three of seven positions for the $A$s and there are $\binom 7 3$ ways to do so. This leaves four spots. We choose two for the $C$s and there are $\binom 4 2$ ways to do so, leaving two spots for the two $G$s with $\binom 2 2$ ways to place them. In total, we have $$\binom 7 3 \binom 4 2 \binom 2 2 = \frac{7!}{3!4!} \cdot \frac{4!}{2!2!} \cdot \frac{2!}{2!0!} = \frac{7!}{3!2!2!} = 210.$$

The other way to consider this problem is to suppose that each symbol of our word is distinguishable, by adding marks to each symbol in some way. So suppose we have $A_1 C_1 G_1 A_2 C_2 G_2 A_3$. It is clear that there are $7!$ permutations of this word. However, we know that these symbols aren't really distinguishable, so we treat each such permutation as equivalent. For instance, there are $P(3,3) = 3!$ permutations of $A_1 A_2 A_3$ and we know these are equivalent once we make these indistinguishable. By removing all the indistinguishable permutations for each symbol, we get $$\frac{7!}{3!2!2!}$$ as before.

The number of different permutations of $n$ objects, where there are $n_i$ indistinguishable objects of type $i$ for $1 \leq i \leq k$, is $$\frac{n!}{n_1! n_2! \cdots n_k!}.$$
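Since $7!$ is small, we can verify the $ACGACGA$ count by generating every arrangement and letting a set collapse the indistinguishable ones:

```python
from itertools import permutations
from math import factorial

# All 7! = 5040 arrangements of the letters; duplicates (arrangements that
# only swap identical letters) are merged by the set.
distinct = set(permutations("ACGACGA"))

# The formula predicts 7! / (3! 2! 2!) = 210 distinguishable permutations.
assert len(distinct) == factorial(7) // (factorial(3) * factorial(2) * factorial(2))
print(len(distinct))  # 210
```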

The Binomial Theorem

Let's think about binomials. This may seem like it's coming out of left field, but there is a connection, I promise. Recall that a binomial is an expression of two terms.

Hopefully, we are all familiar with how to expand a binomial. For instance, we have $$(x+y)^2 = x^2 + xy + xy + y^2 = x^2 + 2xy + y^2.$$ Of course, we can do the same thing with higher degrees, like \begin{align*} (x+y)^3 &= (x+y)^2 (x+y) \\ &= (x^2 + 2xy + y^2)(x+y) \\ &= x^3 + 2x^2y + xy^2 + x^2y + 2xy^2 + y^3 \\ &= x^3 + 3x^2y + 3xy^2 + y^3 \end{align*}

Now, this process is straightforward but a bit tedious. Perhaps there is an alternative way to think about the resulting coefficients. This becomes a bit more clear if we step away from our algebraic impulses a bit. We can write \begin{align*} (x+y)^4 &= (x+y)(x+y)(x+y)(x+y) \\ &= xxxx + xxxy + xxyx + xxyy + xyxx + xyxy + xyyx + xyyy + \\ &\quad\: yxxx + yxxy + yxyx + yxyy + yyxx + yyxy + yyyx + yyyy \end{align*} When written in this way, each term looks much more like the kinds of objects that we've been working with over the past week. That is, we can think of each term as a combination of the terms from each binomial. In other words, how many terms of the form $x^3y$ are there? Exactly as many as there are ways to choose a $y$ from one of the four binomial factors and an $x$ from the other three. This leads us to the following result.

For all $n \in \mathbb N$, $$(x+y)^n = \sum^n_{j=0} \binom n j x^{n-j} y^j.$$

One can prove this via induction and it makes a nice exercise. We will prove this via a combinatorial argument similar to the above. Each term of our polynomial is of the form $x^{n-j}y^j$. Then the coefficient of $x^{n-j}y^j$ is exactly the number of ways to choose $n-j$ $x$'s from the $n$ binomial factors. There are $\binom n {n-j}$ ways to do this and $\binom n {n-j} = \binom n j$.

Suppose you have to prove the inductive case in some proof and you're faced with expanding $(k+1)^5$. Luckily, you don't need to futz around with expanding this polynomial and can directly apply the binomial theorem: \begin{align*} (k+1)^5 &= \binom 5 0 k^5 1^0 + \binom 5 1 k^4 1^1 + \binom 5 2 k^3 1^2 + \binom 5 3 k^2 1^3 + \binom 5 4 k 1^4 + \binom 5 5 k^0 1^5 \\ &= k^5 + 5k^4 + 10k^3 + 10k^2 + 5k + 1 \end{align*}
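Both the general theorem and this particular expansion are easy to verify numerically, by comparing $(x+y)^n$ against the sum at sample integer points:

```python
from math import comb

# Verify the binomial theorem at a few sample points: (x + y)^n should
# equal sum_j C(n, j) x^(n-j) y^j for any x and y.
for n in range(8):
    for x, y in [(2, 3), (5, -1), (1, 1)]:
        assert (x + y) ** n == sum(
            comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1)
        )

# The coefficients of (k + 1)^5 from the example above.
coeffs = [comb(5, j) for j in range(6)]
print(coeffs)  # [1, 5, 10, 10, 5, 1]
```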

This example suggests the following result, which follows from the Binomial Theorem by setting $y = 1$.

For all $n \in \mathbb N$, $$(x+1)^n = \sum_{i=0}^n \binom n i x^i.$$

The following is a neat consequence of the binomial theorem that should match with our intuition about what the binomial coefficients mean.

For all $n \in \mathbb N$, $$\sum_{i=0}^n \binom n i = 2^n.$$

We have $$2^n = (1+1)^n = \sum_{i=0}^n \binom n i 1^{n-i} 1^i = \sum_{i=0}^n \binom n i.$$


Of course, this makes total sense: if $\binom n i$ is the number of subsets of $n$ elements of size $i$, then adding the number of these subsets together is the total number of subsets of a set of size $n$, which is $2^n$.
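The subset interpretation can itself be checked for a small set: count the subsets of each size, compare against the binomial coefficients, and add them up.

```python
from itertools import combinations
from math import comb

n = 6
# Number of subsets of a 6-element set, grouped by size.
sizes = [len(list(combinations(range(n), i))) for i in range(n + 1)]

# Each count matches C(n, i), and together they account for all 2^n subsets.
assert sizes == [comb(n, i) for i in range(n + 1)]
assert sum(sizes) == 2 ** n
print(sum(sizes))  # 64
```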

There are a lot of other neat results that fall out of the Binomial Theorem. Perhaps the most famous one is due to Blaise Pascal in the 1600s.

For all natural numbers $n \geq k \gt 0$, $$\binom{n+1}{k} = \binom{n}{k-1} + \binom n k.$$

As has been the case throughout this part of the course, there is a straightforward algebraic proof of this fact as well as a combinatorial proof. We will go through the combinatorial argument.

Let $A$ be a set of $n+1$ elements. We want to count the number of subsets of $A$ of size $k$, which is $\binom{n+1}{k}$. Fix an element $a \in A$ and consider any subset $B \subseteq A$ of size $k$. Then either $a \in B$ or $a \not \in B$; these two cases are disjoint. If $a \in B$, then $B$ contains $k-1$ elements in $A \setminus \{a\}$ and there are $\binom{n}{k-1}$ such sets. On the other hand, if $a \not \in B$, then $B$ contains $k$ elements in $A \setminus \{a\}$ and there are $\binom{n}{k}$ such sets. Together, this gives us $\binom{n}{k-1} + \binom n k$ subsets of $A$ of size $k$.

This identity leads us to the famous Pascal's Triangle: $$ \begin{matrix} &&&&&& \binom 0 0 &&&&&& \\ &&&&& \binom 1 0 && \binom 1 1 &&&&& \\ &&&& \binom 2 0 && \binom 2 1 && \binom 2 2 &&&& \\ &&& \binom 3 0 && \binom 3 1 && \binom 3 2 && \binom 3 3 &&& \\ && \binom 4 0 && \binom 4 1 && \binom 4 2 && \binom 4 3 && \binom 4 4 && \\ & \binom 5 0 && \binom 5 1 && \binom 5 2 && \binom 5 3 && \binom 5 4 && \binom 5 5 & \\ \binom 6 0 && \binom 6 1 && \binom 6 2 && \binom 6 3 && \binom 6 4 && \binom 6 5 && \binom 6 6 \end{matrix} $$ Filling in the values for the coefficients, the triangle looks like this: $$ \begin{matrix} &&&&&& 1 &&&&&& \\ &&&&& 1 && 1 &&&&& \\ &&&& 1 && 2 && 1 &&&& \\ &&& 1 && 3 && 3 && 1 &&& \\ && 1 && 4 && 6 && 4 && 1 && \\ & 1 && 5 && 10 && 10 && 5 && 1 & \\ 1 && 6 && 15 && 20 && 15 && 6 && 1 \end{matrix} $$ When viewed like this, it's easy to see how Pascal's identity is used to construct the following row.
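This construction translates directly into code: each row follows from the previous one by Pascal's identity, with $1$s on the edges.

```python
# Build Pascal's triangle row by row using Pascal's identity:
# C(n+1, k) = C(n, k-1) + C(n, k), with C(n, 0) = C(n, n) = 1 on the edges.
rows = [[1]]
for _ in range(6):
    prev = rows[-1]
    rows.append([1] + [prev[k - 1] + prev[k] for k in range(1, len(prev))] + [1])

for row in rows:
    print(row)
# The final row is [1, 6, 15, 20, 15, 6, 1], matching the triangle above.
```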

This arrangement leads us to even more interesting identities or interpretations of identities with respect to the triangle.

For $n \geq r \gt 0$, we have $$\binom{n+1}{r+1} = \sum_{i=r}^n \binom i r.$$

Let $A$ be a set of $n+1$ elements. We will order the elements such that $a_1 \lt a_2 \lt \cdots \lt a_{n+1}$. Let's consider the subsets of $A$ of size $r+1$. Because we defined an order on the elements of $A$, each subset $B$ of $r+1$ elements has a largest element, say $a_k$. Furthermore, $a_k$ has to be in the set $\{a_{r+1}, a_{r+2}, \dots, a_{n+1}\}$, since otherwise $B$ would contain fewer than $r+1$ elements.

This means that the set $B \setminus \{a_k\}$ of size $r$ is a subset of $\{a_1, a_2, \dots, a_{k-1}\}$. The total number of such sets is $\binom{k-1}{r}$. Then to get the total number of subsets of $A$ of size $r+1$, we sum over the possible $k$s (which, from above, range from $r+1$ to $n+1$) to get $$\sum_{k = r+1}^{n+1} \binom{k-1}{r} = \sum_{i = r}^n \binom i r$$ by setting $i = k-1$.
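This identity (often called the hockey stick identity, from the shape it traces in the triangle) is cheap to check over a range of small values:

```python
from math import comb

# Verify C(n+1, r+1) == sum_{i=r}^{n} C(i, r) for all n >= r > 0 in a range.
checked = 0
for n in range(1, 13):
    for r in range(1, n + 1):
        assert comb(n + 1, r + 1) == sum(comb(i, r) for i in range(r, n + 1))
        checked += 1
print(checked)  # number of (n, r) pairs verified
```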

The Pigeonhole Principle

On the second floor of Crerar, just outside my office and the TA office hour rooms, you'll find the grad student mailboxes. These mail slots are also called pigeonholes. This came from earlier usage, when people really did keep birds in large arrangements of small compartments.

The pigeonhole principle was first stated by Dirichlet in the 1800s and the story goes that it was named as such because Dirichlet's father was a postmaster. However, Dirichlet stated the principle in French and German, and his term is rendered as "pigeonhole" in English. We don't really call these things pigeonholes anymore, and it is probably just as well that we think of birds and holes, since that illustrates the basic idea quite nicely: if you have more birds than holes, then one of those holes is going to contain more than one bird.

Let $A_1, A_2, \dots, A_k$ be disjoint sets such that $\left| \bigcup_{i=1}^k A_i \right| \gt k$. Then there exists a set $A_i$ such that $|A_i| \geq 2$.

Suppose we have $\left| \bigcup_{i=1}^k A_i \right| \gt k$. Let us assume that $|A_i| \leq 1$ for all $1 \leq i \leq k$. Then by the sum rule, we have $$\left| \bigcup_{i=1}^k A_i \right| = \sum_{i=1}^k |A_i| \leq \sum_{i=1}^k 1 = k,$$ which contradicts our initial hypothesis that $\left| \bigcup_{i=1}^k A_i \right| \gt k$.

This idea seems very obvious, but it leads to some useful results.

Let $A$ and $B$ be finite sets with $|A| \gt |B|$ and let $f:A \to B$ be a function. Then $f$ is not one-to-one.

Let $B = \{b_1, b_2, \dots, b_n\}$. Consider the set $A_i = f^{-1}(b_i)$ for $1 \leq i \leq n$. These $n$ sets are disjoint and their union is all of $A$, which has more than $n$ elements. By the pigeonhole principle, there must be one set $A_i$ such that $|A_i| \geq 2$; that is, two elements of $A$ map to the same $b_i$. Therefore, $f$ is not one-to-one.

Suppose I'm a charlatan and I claim that I have a perfect lossless compression algorithm that I'd like to scam some sweet VC funding for. Recall that all data is really just a binary string. So, what I'm essentially claiming is that for every string $x \in \{0,1\}^*$, I can compress it into some other string $x'$ that's shorter and I can decompress it to recover $x$. Let's consider all the binary strings of length $n$. From our previous discussions, we know there are exactly $2^n$ such binary strings. How many strings of length $\lt n$ are there? There are $2^0 + 2^1 + \cdots + 2^{n-1} = 2^n - 1$ such strings. But $2^n-1 \lt 2^n$, so there is at least one string of length $\lt n$ that corresponds to two strings of length $n$ after compression.
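We can see this collision concretely with a toy "compressor". The particular map below (dropping the last bit) is just an arbitrary choice for illustration; the pigeonhole argument says *any* map into the shorter strings must collide.

```python
from itertools import product

n = 4
# All 2^n binary strings of length n, and all 2^n - 1 strings of length < n
# (including the empty string).
inputs = ["".join(bits) for bits in product("01", repeat=n)]
shorter = ["".join(bits) for k in range(n) for bits in product("01", repeat=k)]
assert len(inputs) == 2 ** n and len(shorter) == 2 ** n - 1

# A hypothetical "compressor": truncate the last bit.
compress = {s: s[:-1] for s in inputs}
images = list(compress.values())

# Pigeonhole: some short string is the image of at least two inputs,
# so no decompressor can recover both of them.
assert any(images.count(t) >= 2 for t in set(images))
```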

We can make the pigeonhole principle even more general.

Let $A_1, A_2, \dots, A_k$ be disjoint sets such that $\left| \bigcup_{i=1}^k A_i \right| = n$. Then there exists a set $A_i$ such that $|A_i| \geq \left\lceil \frac n k \right\rceil$.

Suppose that $\left| \bigcup_{i=1}^k A_i \right| = n$. Let us assume that $|A_i| \lt \left\lceil \frac n k \right\rceil$ for all $1 \leq i \leq k$. Then we have $$\left| \bigcup_{i=1}^k A_i \right| = \sum_{i=1}^k |A_i| \lt \sum_{i=1}^k \frac n k = k \cdot \frac n k = n.$$ Here, we make use of the fact that if $|A_i| \lt \left\lceil \frac n k \right\rceil$, then $|A_i| \lt \frac n k$: since $|A_i|$ is an integer, $|A_i| \geq \frac n k$ would imply $|A_i| \geq \left\lceil \frac n k \right\rceil$, by the definition of the ceiling function. In the end, the above argument contradicts our initial hypothesis that $\left| \bigcup_{i=1}^k A_i \right| = n$.

Suppose you have large numbers of three colours of socks in a drawer and you're pulling socks out without looking. How many socks do you need to pull out before you're guaranteed to have a pair of the same colour? The answer is four: if we view each colour as a set, all we need is at least one more sock than there are sets.

Now, suppose you are a crab. Crabs have two claws and eight legs so they obviously need eight socks (this is different from octopuses, who have eight limbs, but it is not immediately clear if these should be considered arms or legs). How many socks do you need to pull out before you're guaranteed eight socks of the same colour? The pigeonhole principle says we need $n$ with $\left\lceil \frac n 3 \right\rceil \geq 8$, and the smallest such $n$ is $22$. Approaching it from another perspective: in the worst case, our crab ends up with 7 socks of each colour, or 21 socks in total, so the 22nd sock must complete a set of eight of one colour.
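A one-line search confirms the arithmetic:

```python
from math import ceil

# Smallest n for which ceil(n / 3) >= 8: the number of socks the crab
# must draw to guarantee eight of one of the three colours.
n = next(n for n in range(1, 100) if ceil(n / 3) >= 8)
print(n)  # 22: with 21 socks it may hold exactly 7 of each colour
```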

We can, of course, ask the same question of spiders, caterpillars, centipedes, and so on.

The following is a result from Ramsey theory, which is concerned with how large certain structures have to be before they are guaranteed to have some property. Ramsey was a British mathematician from the early 20th century who passed away very young, at the age of 26 in 1930.

Rosen states the following as a group where people are either friends or enemies. This is not quite accurate: just because you're not friends doesn't mean you're necessarily enemies (i.e. the Sid Meier's Civilization I model of diplomacy). The more standard analogy is about people at a party and whether they know each other or not.

Given any group of six people at a party, either three are mutual friends or three are mutual strangers. Consider Alice. She either knows or doesn't know each of the five others, so by the pigeonhole principle she either knows at least three of the others or doesn't know at least three of the others. We'll consider the case where she knows three other people, since the case where she doesn't know three other people is similar. So she knows Bob, Charlie, and Eve. We can consider two possibilities: either some two of Bob, Charlie, and Eve know each other, in which case those two together with Alice form three mutual friends; or no two of them know each other, in which case Bob, Charlie, and Eve are three mutual strangers.

All of this can be stated more formally using graphs and cliques and colourings, but the party setting allows us to defer defining all of these things until we actually get to graph theory. But note that this argument hinges on the fact that there are at least five other people connected to Alice in order for us to apply the pigeonhole principle. If we had fewer (i.e. if we took groups of five or fewer), we wouldn't be able to guarantee either a mutual friend or mutual stranger triplet.

We denote by $R(m,n)$ the Ramsey number, which is the minimum number of guests needed at our party such that at least $m$ people all know each other or at least $n$ people don't know each other. Our example above leads to the following theorem.

$R(3,3) = 6$.
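Once we get to graph theory we'll have cleaner tools, but $R(3,3) = 6$ is small enough to confirm by brute force: colour every pair of guests "friends" or "strangers" in all possible ways and look for a monochromatic triple. With six guests there are only $2^{15}$ colourings to check.

```python
from itertools import combinations, product

def has_mono_triangle(n, colouring):
    """colouring maps each pair (i, j) with i < j to 0 (strangers) or 1 (friends)."""
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_colouring_has_triangle(n):
    """Check all 2-colourings of the pairs among n guests."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, bits)))
        for bits in product((0, 1), repeat=len(edges))
    )

# Six guests always force a mutual-friend or mutual-stranger triple;
# five guests do not (e.g. a pentagon of friendships has no such triple).
six_suffices = every_colouring_has_triangle(6)
five_fails = not every_colouring_has_triangle(5)
assert six_suffices and five_fails
```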