CMSC 27100 — Lecture 16

The Binomial Theorem

Let's think about binomials. This will seem like out of left field, but there is a connection, I promise. Recall that a binomial is an expression of two terms.

Hopefully, we are all familiar with how to expand a binomial. For instance, we have $$(x+y)^2 = x^2 + xy + xy + y^2 = x^2 + 2xy + y^2.$$ Of course, we can do the same thing with higher degrees, like \begin{align*} (x+y)^3 &= (x+y)^2 (x+y) \\ &= (x^2 + 2xy + y^2)(x+y) \\ &= x^3 + 2x^2y + xy^2 + x^2y + 2xy^2 + y^3 \\ &= x^3 + 3x^2y + 3xy^2 + y^3 \end{align*}

Now, this process is straightforward but a bit tedious. Perhaps there is an alternative way to think about the resulting coefficients. This becomes a bit more clear if we step away from our algebraic impulses a bit. We can write \begin{align*} (x+y)^4 &= (x+y)(x+y)(x+y)(x+y) \\ &= xxxx + xxxy + xxyx + xxyy + xyxx + xyxy + xyyx + xyyy + \\ &\quad\: yxxx + yxxy + yxyx + yxyy + yyxx + yyxy + yyyx + yyyy \end{align*} When written in this way, each term looks much more like the kinds of objects that we've been working with over the past week. That is, we can think of each term as a combination of the terms from each binomial. In other words, how many terms of the form $x^3y$ are there? Exactly as there are many ways to choose three $x$'s and one $y$ from each binomial. This leads us to the following result.

For all $n \in \mathbb N$, $$(x+y)^n = \sum^n_{j=0} \binom n j x^{n-j} y^j.$$

One can prove this via induction and it makes a nice exercise. We will prove this via a combinatorial argument similar to the above. Each term of our polynomial is of the form $x^{n-j}y^j$. Then the coefficients are exactly the number of ways to choose $n-j$ $x$'s from each binomial. There are $\binom n {n-j}$ ways to do this and $\binom n {n-j} = \binom n j$.

An interesting consequence of this theorem is how it gives us a connection between combinatorics and algebra. Being able to shift problems from one area of math to another is a powerful move because it allows us to use tools in different area of mathematics. In a class with more time to spend on combinatorics, we would have seen that this idea leads to generating functions, which are encodings of sequences as power series.

Suppose you have to prove the inductive case in some proof and you're faced with expanding $(k+1)^5$. Luckily, you don't need to futz around with expanding this polynomial and can directly apply the binomial theorem: \begin{align*} (k+1)^5 &= \binom 5 0 k^5 1^0 + \binom 5 1 k^4 1^1 + \binom 5 2 k^3 1^2 + \binom 5 3 k^2 1^3 + \binom 5 4 k 1^4 + \binom 5 5 k^0 1^5 \\ &= k^5 + 5k^4 + 10k^3 + 10k^2 + 5^k + 1 \end{align*}

This example suggests the following result, which follows from the Binomial Theorem by setting $y = 1$.

For all $n \in \mathbb N$, $$(x+1)^n = \sum_{i=0}^n \binom n i x^i.$$

The following is a neat consequence of the binomial theorem that should match with our intution about what the binomial coefficients mean.

For all $n \in \mathbb N$, $$\sum_{i=0}^n \binom n i = 2^n.$$

We have $$2^n = (1+1)^n = \sum_{i=0}^n \binom n i 1^{n-i} 1^i = \sum_{i=0} \binom n i.$$

 

Of course, this makes total sense: if $\binom n i$ is the number of subsets of $n$ elements of size $i$, then adding the number of these subsets together is the total number of subsets of a set of size $n$, which is $2^n$.

There are a lot of other neat results that fall out of the Binomial Theorem. Perhaps the most famous one is due to Blaise Pascal in the 1600s.

For all natural numbers $n \geq k \gt 0$, $$\binom{n+1}{k} = \binom{n}{k-1} + \binom n k.$$

As has been the case throughout this part of the course, there is a straightforward algebraic proof of this fact as well as a combinatorial proof. We will go through the combinatorial argument.

Let $A$ be a set of $n+1$ elements. We want to count the number of subsets of $A$ of size $k$, which is $\binom{n+1}{k}$. Choose an element $a \in A$ and a subset $B \subseteq A$ of size $k$. Then either $a \in B$ or $a \not \in B$; these two cases are disjoint. If $a \in B$, then $B$ contains $k-1$ elements in $A \setminus \{a\}$ and there are $\binom{n}{k-1}$ such sets. On the other hand, if $a \not \in B$, then $B$ contains $k$ elements in $A \setminus \{a\}$ and there are $\binom{n}{k}$ such sets. Together, this gives us $\binom{n}{k-1} + \binom n k$ subsets of $A$ of size $k$.

This identity leads us to the famous Pascal's Triangle: $$ \begin{matrix} &&&&&& \binom 0 0 &&&&&& \\ &&&&& \binom 1 0 && \binom 1 1 &&&&& \\ &&&& \binom 2 0 && \binom 2 1 && \binom 2 2 &&&& \\ &&& \binom 3 0 && \binom 3 1 && \binom 3 2 && \binom 3 3 &&& \\ && \binom 4 0 && \binom 4 1 && \binom 4 2 && \binom 4 3 && \binom 4 4 && \\ & \binom 5 0 && \binom 5 1 && \binom 5 2 && \binom 5 3 && \binom 5 4 && \binom 5 5 & \\ \binom 6 0 && \binom 6 1 && \binom 6 2 && \binom 6 3 && \binom 6 4 && \binom 6 5 && \binom 6 6 \end{matrix} $$ Filling in the values for the coefficients, the triangle looks like this: $$ \begin{matrix} &&&&&& 1 &&&&&& \\ &&&&& 1 && 1 &&&&& \\ &&&& 1 && 2 && 1 &&&& \\ &&& 1 && 3 && 3 && 1 &&& \\ && 1 && 4 && 6 && 4 && 1 && \\ & 1 && 5 && 10 && 10 && 5 && 1 & \\ 1 && 6 && 15 && 20 && 15 && 6 && 1 \end{matrix} $$ When viewed like this, it's easy to see how Pascal's identity is used to construct the following row.

You might wonder whether we can generalize the idea of binomial coefficients to trinomials and other larger nomials. The answer is: yes! These are called multinomial coefficients and are denoted $\binom{n}{n_1,n_2, \dots, n_k}$. In particular, $$\binom{n}{n_1, n_2, \dots, n_k} = \frac{n!}{n_1! n_2! \cdots n_k!}$$ is the coefficient for the term $(x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k})$ in $(x_1 + x_2 + \cdots + x_k)^n$. Just like with the Binomial Theorem, one can prove that this is true combinatorially—notice a suspicious similarity in this definition.

The Pigeonhole Principle

On the second floor of Crerar, just outside the TA office hour rooms, you'll find the grad student mailboxes. These mail slots are also called pigeonholes. This came from earlier usage, when people really did keep birds in large arrangements of small compartments.

The pigeonhole principle was first stated by Dirichlet in the 1800s and the story goes that it was named as such because Dirichlet's father was a postmaster. However, Dirichlet's naming and statement was in French and German and is rendered as pigeonhole in English. We don't really call these things pigeonholes anymore, and it is probably just as well that we think of birds and holes since it illustrates the basic idea quite nicely: if you have more birds than holes, then one of those holes is going to contain more than one bird.

Let $A_1, A_2, \dots, A_k$ be disjoint sets such that $\left| \bigcup\limits_{i=1}^k A_i \right| \gt k$. Then there exists a set $A_i$ such that $|A_i| \geq 2$.

Suppose we have $\left| \bigcup\limits_{i=1}^k A_i \right| \gt k$. Let us assume that $|A_i| \leq 1$ for all $1 \leq i \leq k$. Then by the sum rule, we have $$\left| \bigcup_{i=1}^k A_i \right| = \sum_{i=1}^k |A_i| \leq \sum_{i=1}^k 1 = k,$$ which contradicts our initial hypothesis that $\left| \bigcup\limits_{i=1}^k A_i \right| \gt k$.

This idea seems very obvious, but it leads to some useful results.

Let $A$ and $B$ be finite sets with $|A| \gt |B|$ and let $f:A \to B$ be a function. Then $f$ is not one-to-one.

Let $B = \{b_1, b_2, \dots, b_n\}$. Consider the set $A_i = f^{-1}(b_i)$ for $1 \leq i \leq n$. Then there are $n$ sets $A_i$—one for each element of $B$. By the pigeonhole principle, since there are more elements of $A$ than there are elements of $B$, and there are as many sets $A_i$ as there are elements of $B$, there must be one set $A_i$ such that $|A_i| \geq 2$. Therefore, $f$ is not one-to-one.

Suppose I'm a charlatan and I claim that I have a perfect lossless compression algorithm that I'd like to scam some sweet VC funding for. Recall that all data is really just a binary string. So, what I'm essentially claiming is that for every string $x \in \{0,1\}^*$, I can compress it into some other string $x'$ that's shorter and I can decompress it to recover $x$. Let's consider all the binary strings of length $n$. From our previous discussions, we know there are exactly $2^n$ such binary strings. How many strings of length $\lt n$ are there? There are $2^0 + 2^1 + \cdots 2^{n-1} = 2^n - 1$ such strings. But $2^n-1 \lt 2^n$, so there is at least one string of length $\lt n$ that corresponds to two strings of length $n$ after compression.