MPCS 50103 — Lecture 4

Prime numbers

Recall that a prime number is defined as follows.

Definition 4.8. An integer $p$ greater than 1 is called prime if the only positive divisors of $p$ are 1 and $p$. Otherwise, $p$ is called composite.

Now, let's prove something about primes. Recall that our objective is to prove the following.

Theorem 6.1 (Fundamental Theorem of Arithmetic). Every natural number $n \gt 1$ can be written uniquely as a product of primes.

First, we need the following very useful theorem, a result from Euclid's Elements.

Theorem 6.2 (Euclid's lemma). If $p$ is prime, and $a,b$ are integers such that $p \mid a \cdot b$, then $p \mid a$ or $p \mid b$.

Proof. If $p \mid a$, we are done, so suppose $p \nmid a$. Then $\gcd(p,a) = 1$ and there exist integers $x$ and $y$ such that $xa + yp = 1$. Multiplying both sides by $b$, we have $$b = xab + ypb.$$ Since $p \mid ab$, let $n$ be an integer such that $ab = pn$. Then we have $$b = xab + ypb = xpn + ypb = p \cdot (xn+yb).$$ Therefore, by the definition of divisibility, $p \mid b$. $$\tag*{$\Box$}$$

Notice that this proof makes use of Bézout's lemma. Since Bézout's lemma appears about 2000 years after Euclid, it is safe to say that this is not the proof that appears in the Elements.
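To see Bézout's lemma doing the work, here is a minimal sketch in Python (the values $p = 7$, $a = 4$, $b = 21$ are our own example): the extended Euclidean algorithm produces the coefficients $x$ and $y$, and the identity $b = xab + ypb$ makes $p \mid b$ visible.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

# p = 7 is prime, p | a*b = 84, and p does not divide a = 4.
p, a, b = 7, 4, 21
g, x, y = extended_gcd(a, p)
assert g == 1                      # gcd(p, a) = 1, so Bezout gives xa + yp = 1
assert x * a * b + y * p * b == b  # multiplying xa + yp = 1 through by b
assert b % p == 0                  # and indeed p | b
```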

Now, we will prove the following lemma by induction; we will make use of it later on.

Lemma 6.3. If $p, p_1, \dots, p_k$ are prime and $p \mid p_1 \cdots p_k$, then $p = p_i$ for some $i$.

Proof. We will prove this by induction on $k$, the number of primes.

Base case. $k = 1$. In this case, $p \mid p_1$. Since $p$ and $p_1$ are both prime, it must be the case that $p = p_1$.

Inductive case. Suppose our lemma is true for $k$ and consider $p \mid p_1 \cdots p_k \cdot p_{k+1}$. By Theorem 6.2 (Euclid's lemma), if $p \mid a \cdot b$, then $p \mid a$ or $p \mid b$. Let $a = p_1 \cdots p_k$ and $b = p_{k+1}$. If $p \mid a$, then $p = p_i$ for some $i$ with $1 \leq i \leq k$ by our inductive hypothesis and our result holds. If $p \mid b$, then $p = p_{k+1}$ and our result holds. $$\tag*{$\Box$}$$

The Fundamental Theorem of Arithmetic

Now, we're almost ready to prove the Fundamental Theorem of Arithmetic. Since the theorem claims both existence and uniqueness of prime factorizations, we'll prove it in two parts: first we show that every natural number greater than 1 can be factored into primes, and then we show that this factorization is unique.

Theorem 6.4. Every natural number $n \gt 1$ can be written as a product of primes.

Proof. We will prove that for $n \gt 1$, $n$ can be written as a product of primes via strong induction on $n$.

Base case. $n = 2$. Clearly, 2 is a product of primes because 2 itself is a prime number.

Inductive case. Assume that for $2 \leq m \lt k$, $m$ can be written as a product of primes. We want to show that $k$ can be written as a product of primes. There are two cases to consider. If $k$ is prime, then $k$ is trivially a product of primes. If $k$ is composite, then $k = a \cdot b$ for some integers $a$ and $b$ with $2 \leq a, b \lt k$. By our inductive hypothesis, both $a$ and $b$ can be written as products of primes, and therefore so can $k$.

$\tag*{$\Box$}$

Now, we will finish the proof of Theorem 6.1 by showing that prime factorizations are unique.

Proof of Theorem 6.1. By Theorem 6.4, every natural number $n \gt 1$ has a prime factorization. Suppose that prime factorizations are not unique. Let $n$ be the smallest number that doesn't have a unique prime factorization, so $$n = p_1 \cdots p_k = q_1 \cdots q_\ell,$$ where $p_i$ is prime for $1 \leq i \leq k$ and $q_j$ is prime for $1 \leq j \leq \ell$. Clearly, $p_1 \mid n$. Then by Lemma 6.3, $p_1 = q_j$ for some $j$. Now, we divide each factorization by $p_1$. This gives us $$\frac n {p_1} = p_2 p_3 \cdots p_k = q_1 q_2 \cdots q_{j-1} q_{j+1} \cdots q_\ell.$$ But this is a smaller number with more than one prime factorization, contradicting the minimality of $n$. $\tag*{$\Box$}$
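The existence half of the theorem is effectively an algorithm. Here is a minimal sketch in Python using trial division (the test value 360 is our own choice):

```python
def prime_factors(n):
    """Return the prime factorization of n > 1, smallest factor first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # peel off every copy of the factor d
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # the remaining cofactor is itself prime
        factors.append(n)
    return factors

print(prime_factors(360))   # [2, 2, 2, 3, 3, 5], and 2*2*2*3*3*5 == 360
```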

The following result is another of Euclid's from the Elements.

Theorem 6.5 (Euclid's Theorem). There exist infinitely many primes.

Proof. Assume for a contradiction that there are only finitely many primes $p_1, p_2, \dots, p_k$. Consider $n = p_1 \cdot p_2 \cdots p_k + 1$. Since $n \gt p_i$ for all $i$, it does not appear in our list, so it cannot be prime. Observe that for each prime $p_i$, we can write $n = q \cdot p_i + 1$ for some integer $q$.

By the division theorem, this means that we have a remainder of 1 when dividing $n$ by any of the primes $p_i$. Since $1 \neq 0$ and $1 \lt p_i$, it must be the case that $p_i$ does not divide $n$ for all $i$. In other words, $n$ cannot be factored into primes. But this is a contradiction of the Fundamental Theorem of Arithmetic. Therefore, our assumption that there were only finitely many primes was incorrect and there must exist infinitely many primes. $\tag*{$\Box$}$
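The argument is concrete enough to watch in action: given any finite list of primes, the product of the list plus one must have a prime factor missing from the list. A small Python sketch (the starting list of six primes is our own choice):

```python
def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]
n = 2 * 3 * 5 * 7 * 11 * 13 + 1    # 30031, which is 59 * 509
q = smallest_prime_factor(n)
assert q not in primes             # a prime our "complete" list missed
print(q)                           # 59
```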

Fermat's Little Theorem

Recall that if we consider the structure $\mathbb Z_p$ for prime $p$, we are guaranteed that every nonzero element of $\mathbb Z_p$ has a multiplicative inverse. We will work towards a result of Pierre de Fermat about prime numbers and $\mathbb Z_p$.

First, the following result is called Wilson's Theorem because it was stated by John Wilson in the 1700s. It turns out that it had been stated around 700 years earlier by the Arab mathematician Hasan Ibn al-Haytham.

Theorem 6.6 (Wilson's Theorem). For all primes $p$, $(p-1)! = -1 \pmod p$.

Proof. Consider the equation $x^2 = 1 \pmod p$. We can rewrite this as $x^2 - 1 = 0 \pmod p$ and factor it to get $(x+1)(x-1) = 0 \pmod p$. Then $p \mid (x+1)(x-1)$ and therefore, by Euclid's lemma, $p \mid (x+1)$ or $p \mid (x-1)$. Thus, $x+1 = 0 \pmod p$ or $x-1 = 0 \pmod p$ and therefore $x = \pm 1 \pmod p$.

This means that for $2 \leq x \leq p-2$, $x \neq x^{-1} \pmod p$. Since each element has a multiplicative inverse and the inverses are unique, each of $2, 3, \dots, p-2$ can be paired with its inverse to cancel out to 1. Thus, we have $$1 \cdot 2 \cdots (p-2) \cdot (p-1) = 1 \cdot (p-1) = -1 \pmod p.$$ $\tag*{$\Box$}$
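A quick numerical check of Wilson's Theorem for the first few primes (a sanity check, not a proof):

```python
from math import factorial

for p in [2, 3, 5, 7, 11, 13]:
    # (p-1)! = -1 (mod p), and -1 is congruent to p - 1
    assert factorial(p - 1) % p == p - 1
```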

The following theorem is due to Pierre de Fermat and is called Fermat's Little Theorem, to distinguish it from the more famous Fermat theorem, Fermat's Last Theorem (there are no positive integers $a,b,c$ such that $a^n + b^n = c^n$ for $n \gt 2$). Much like with Fermat's Last Theorem, Fermat stated the result in the mid 1600s but declined to give a proof of it because the proof was "too long".

While it would be about 350 years before Fermat's Last Theorem was proven (by Andrew Wiles in 1995), Fermat's Little Theorem was proved only about 100 years later, by Leonhard Euler. The following proof, using modular arithmetic, is due to James Ivory in the early 1800s and, independently, to Dirichlet a bit later.

Theorem 6.7 (Fermat's Little Theorem). For all prime numbers $p$ and integers $a$ such that $\gcd(a,p) = 1$, $a^{p-1} \equiv 1 \pmod p$.

Proof. Consider $n = (a \cdot 1) \cdot (a \cdot 2) \cdot \cdots \cdot (a \cdot (p-1)) \pmod p$. We can view this product in two ways.

First, we claim that $$\{[a]_p, [2a]_p, \dots, [(p-1)a]_p\} = \{[1]_p,[2]_p,\dots,[p-1]_p\}.$$ There are two things we must verify. First, we must verify that there is no $i$ with $1 \leq i \leq p-1$ such that $a \cdot i = 0 \pmod p$. Second, we must verify that for $1 \leq i \lt j \leq p-1$, $a \cdot i \neq a \cdot j \pmod p$.

Suppose that there is an $i$ with $1 \leq i \leq p-1$ such that $a \cdot i = 0 \pmod p$. Since $\gcd(a,p) = 1$, there is a multiplicative inverse for $a$ in $\mathbb Z_p$. Then we have \begin{align*} a \cdot i &= 0 &\pmod p \\ a^{-1} \cdot (a \cdot i) &= a^{-1} \cdot 0 &\pmod p \\ 1 \cdot i &= 1 \cdot 0 &\pmod p \\ i &= 0 &\pmod p \end{align*} contradicting $1 \leq i \leq p-1$.

Next, we verify that the elements of the set are distinct. Suppose that there are two elements with $a \cdot i = a \cdot j \pmod p$ and $1 \leq i \lt j \leq p-1$. Then we have \begin{align*} a \cdot i &= a \cdot j &\pmod p \\ a^{-1} \cdot (a \cdot i) &= a^{-1} \cdot (a \cdot j) &\pmod p \\ 1 \cdot i &= 1 \cdot j &\pmod p \\ i &= j &\pmod p \end{align*} which contradicts $i \neq j$. Therefore, we can conclude $$n = (a \cdot 1) \cdot (a \cdot 2) \cdots (a \cdot (p-1)) = (p-1)! = -1 \pmod p$$ by Wilson's Theorem.

Next, we can rearrange the terms of the product to give us $n = a^{p-1} \cdot (p-1)! \pmod p$. By Wilson's Theorem, we have $n = a^{p-1} \cdot (-1) \pmod p$.

Together, we get that $n = a^{p-1} \cdot (-1) = -1 \pmod p$. Therefore, $a^{p-1} = 1 \pmod p$. $$\tag*{$\Box$}$$

Example 6.8. Let's compute $7^{315} \bmod 11$. Instead of performing division on $7^{315}$, we manipulate the exponent. Observing that $10 = 11-1$, we see that $315 = 10 \cdot 31 + 5$ by division. This gives us \begin{align*} 7^{315} &= 7^{31 \cdot 10 + 5} &\pmod{11} \\ &= (7^{10})^{31} \cdot 7^5 &\pmod{11} \\ &= 1^{31} \cdot 49^2 \cdot 7 &\pmod{11} \\ &= 5^2 \cdot 7 &\pmod{11} \\ &= 25 \cdot 7 &\pmod{11} \\ &= 3 \cdot 7 &\pmod{11} \\ &= 21 &\pmod{11} \\ &= 10 &\pmod{11} \\ \end{align*}
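The same computation can be checked in Python: the built-in three-argument `pow` performs fast modular exponentiation, and Fermat's Little Theorem justifies first reducing the exponent modulo $p - 1$.

```python
p = 11
# 7^315 mod 11 computed directly, and with the exponent reduced mod p - 1 = 10.
assert pow(7, 315, p) == pow(7, 315 % (p - 1), p) == 10
```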

Cryptography

Cryptography is the study of how to transform information and communication so that it's concealed and secure. The most basic form of cryptography dates back to the classical era, via the use of substitution ciphers. These involve shifts of letters: if your cipher is shifted by 13, then you map $a \rightarrow n, b \rightarrow o, c \rightarrow p, \dots$ and so forth. Such ciphers are still in use today, but not for secure communications. If you want to talk about your favourite currently screening movies and TV shows on Twitter or something, current internet etiquette is to conceal such spoilers by running them through a ROT13 encoding, which is just the substitution cipher with shift 13 that was just described.
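For the curious, here is one way ROT13 might be written in Python (the spoiler text is our own invention); applying it twice recovers the original message, since $13 + 13 = 26$:

```python
def rot13(text):
    out = []
    for c in text:
        if c.isalpha():
            base = ord('a') if c.islower() else ord('A')
            out.append(chr((ord(c) - base + 13) % 26 + base))  # shift by 13, wrapping
        else:
            out.append(c)                                      # leave non-letters alone
    return ''.join(out)

spoiler = "Vader is Luke's father"
print(rot13(spoiler))                 # Inqre vf Yhxr'f sngure
assert rot13(rot13(spoiler)) == spoiler
```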

The key to making cryptography work is ensuring that the encoding of the information remains secret. This doesn't work very well for substitution ciphers because the encoding is very simple. In fact, if you read enough ROT13 spoilers on Twitter, you can begin to see patterns of how words form. So one aspect of cryptography is ensuring that the key is complex and remains secret. However, this is only half the battle. The other problem that remains is how to communicate the key.

In Canada, the banks collectively run a system called Interac, which, among other things, makes it very handy to transfer money via email and is why services like Venmo haven't really caught on. Suppose Alice and Bob, the two famous recurring characters of cryptography, want to transfer some Canadian money. So Alice logs onto her bank's website and initiates an email transfer that can be deposited by Bob into any bank account he wants once he gets the email. How does this stay secure? Alice will set a password that she communicates to Bob, who will then use the password to verify that he is allowed to accept the email transfer.

Hypothetically, what Alice needs to do is communicate the password to Bob in a secure way. What usually ends up happening is that Alice just emails Bob the password to the same account. This is bad, because if an eavesdropper, like Eve, gets access to Bob's email, then Bob is, as they say in Canada, hosed, because both the password and the email containing the transfer are sitting in Bob's email account. All Eve needs to do is use the password and then she can deposit the Canadian dollars into any bank account she wishes.

This highlights a common weakness: even though we may devise an unbreakable cryptographic scheme, if the key for that system is acquired, then we're screwed. This shifts the problem of secure communication from the message that we originally wanted to send to the problem of how to securely distribute our key.

One way to do this is to make it not a cryptographic problem anymore and do something like send the keys physically. Of course, this is what actually happened prior to communication with electronic computers. Note, however, that even if such a system is cryptographically secure, that doesn't mean I couldn't just hire someone to steal your mail or break your kneecaps to get your key. Beyond those caveats, the big problem with such a scheme is that no one wants to wait for some mail just so they can log onto a website securely.

The use of a secret key for encryption and decryption is called private-key cryptography. This is also called symmetric-key cryptography. This name hints at the root of our problem and a possible solution: what if we separate the encryption and decryption keys?

This leads us to the idea of public-key cryptography, or asymmetric-key cryptography. Here, only the decryption keys are secret, while encryption keys are made public. This way, if Alice wants to send Bob a message that only Bob can read (perhaps containing Alice's credit card number or something), then Alice encrypts her message with Bob's public key, sends the encrypted message, and Bob would then decrypt the message.

The key to making this work is that the public and private keys need to be related in such a way that decryption is easy when the private key is known, but difficult when only the public key is known.

RSA

The most famous public-key cryptosystem (and definitely the most popular to teach in elementary number theory classes) is RSA, named for Rivest, Shamir, and Adleman who designed it in 1977 at MIT. RSA is commercially implemented in many Internet communication protocols currently in use and won Rivest, Shamir, and Adleman the Turing Award in 2002.

There are three parts to the system.

  1. Setup. Bob wants to be able to receive encrypted messages. He does the following.
    1. Choose two large, distinct primes $p$ and $q$, and let $n = pq$.
    2. Choose an arbitrary integer $e$ so that $\gcd(e,(p-1)(q-1)) = 1$ and $1 \lt e \lt (p-1)(q-1)$.
    3. Solve $ed = 1 \bmod{(p-1)(q-1)}$ for $d$.
    4. The public key is $(e,n)$.
    5. The private key is $(d,n)$, and the prime numbers $p$ and $q$.
  2. Encryption. Alice does the following to encrypt a message $M$.
    1. Get Bob's public key $(e,n)$.
    2. Encrypt the plaintext message $M$ as the ciphertext $C$, $$C = M^e \bmod n.$$
    Then Alice sends $C$ to Bob.
  3. Decryption. Bob receives $C$ from Alice. To decrypt it, he does the following.
    1. Use the private key $(d,n)$ to decrypt $C$ into $R$, $$R = C^d \bmod n.$$
    We claim that $R = M$.

To see that this is true, we first recall that we have $$R = C^d = (M^e)^d = M^{ed} \bmod n.$$ Since $ed = 1 \bmod{(p-1)(q-1)}$, by the definitions of congruence and divisibility, we have $$ed = 1+k(p-1)(q-1)$$ for some integer $k$. This gives us $$R = M^{1+k(p-1)(q-1)} \bmod n.$$

Now, we show that $R = M \bmod p$. There are two cases: either $p \mid M$ or $p \nmid M$. In the first case, we have $M = 0 \bmod p$, which gives $$R = 0^{1+k(p-1)(q-1)} = 0 \bmod p$$ and therefore $R = 0 = M \bmod p$. In the second case, $p \nmid M$ implies $\gcd(p,M) = 1$ and therefore, by Fermat's Little Theorem, $M^{p-1} = 1 \bmod p$. Then we get $$R = M(M^{p-1})^{k(q-1)} = M\cdot 1^{k(q-1)} = M \bmod p.$$ Therefore, $R = M \bmod p$. A similar argument holds for $q$ and we can show $R = M \bmod q$. Since $p$ and $q$ are distinct primes, we must have $\gcd(p,q) = 1$ and therefore, by the Chinese Remainder Theorem, we can put together the two congruences to get $R = M \bmod{n}$.
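To make the whole protocol concrete, here is a toy end-to-end run in Python. The values ($p = 61$, $q = 53$, $e = 17$, $M = 65$) are illustrative only; real RSA keys use primes hundreds of digits long.

```python
p, q = 61, 53
n = p * q                   # 3233, part of both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent; gcd(17, 3120) = 1
d = pow(e, -1, phi)         # 2753, since 17 * 2753 = 1 (mod 3120)

M = 65                      # plaintext, encoded as a number less than n
C = pow(M, e, n)            # Alice encrypts: C = M^e mod n
R = pow(C, d, n)            # Bob decrypts:   R = C^d mod n
assert R == M               # the claim R = M holds
```

(The modular inverse `pow(e, -1, phi)` requires Python 3.8 or later; it is exactly the "solve $ed = 1$" step from the setup.)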

Diffie-Hellman and discrete logarithms

RSA was the first practical public-key cryptosystem, but it was inspired by an earlier system proposed by Diffie and Hellman in 1976. They give a protocol for secure key exchange, which, like RSA, is used all over the Internet today. Rather than the difficulty of factoring, their system is based on a related problem, the discrete logarithm problem.

Definition 6.9. A primitive root modulo a prime $p$ is an integer $r \in \mathbb Z_p$ such that every nonzero element of $\mathbb Z_p$ is a power of $r$.

Something that we won't prove is the fact that if $p$ is prime, then $\mathbb Z_p$ contains a primitive root. We can think of this root as a generator, since all we need to do to generate every nonzero element of $\mathbb Z_p$ is multiply $r$ by itself some number of times.

Now, if we can perform exponentiation, we can also think of doing the "inverse" of exponentiation: taking a logarithm. Recall that a logarithm for base $b$ is $\log_b x = e$ if $x = b^e$. We want to "discretize" this notion.

Definition 6.10. Let $p$ be a prime, $r$ a primitive root modulo $p$, and $a$ an integer with $1 \leq a \leq p-1$. If $r^e = a \pmod p$ and $0 \leq e \leq p-1$, then $e$ is the discrete logarithm of $a$ modulo $p$ to the base $r$.

Then the discrete logarithm problem is: given a prime $p$, primitive root $r$, and integer $a \in \mathbb Z_p$, find the discrete logarithm of $a$ to the base $r$. In other words, we want to find the $e$ such that $a = r^e \pmod p$.

The discrete logarithm problem turns out to be hard, for roughly the same reasons as factoring. Let's see how Diffie and Hellman make use of this.

As before, Alice and Bob want to share a private key using public pieces of information. They do the following.

  1. Alice and Bob agree to a prime $p$ and a primitive root $a$ of $p$.
  2. Alice chooses a secret integer $k_1$ and sends $a^{k_1} \pmod p$ to Bob.
  3. Bob chooses a secret integer $k_2$ and sends $a^{k_2} \pmod p$ to Alice.
  4. Alice computes $(a^{k_2})^{k_1} \pmod p$.
  5. Bob computes $(a^{k_1})^{k_2} \pmod p$.
  6. The shared key is $a^{k_1 \cdot k_2} \pmod p$.
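Here is a toy run of the protocol in Python (the values $p = 23$, $a = 5$, $k_1 = 6$, $k_2 = 15$ are illustrative; real implementations use primes of thousands of bits):

```python
p, a = 23, 5                # public: the prime p and a primitive root a of p

k1, k2 = 6, 15              # Alice's and Bob's secret integers

A = pow(a, k1, p)           # Alice transmits a^k1 mod p = 8
B = pow(a, k2, p)           # Bob transmits a^k2 mod p = 19

# Each party raises what it received to its own secret exponent;
# both arrive at the shared key a^(k1*k2) mod p.
assert pow(B, k1, p) == pow(A, k2, p) == pow(a, k1 * k2, p) == 2
```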

Here, we distinguish between information that is made "public" (i.e., sent over an insecure channel and likely to be exposed) and "private" (information that is kept secure). The public pieces of information are $p$, $a$, $a^{k_1} \pmod p$, and $a^{k_2} \pmod p$. The private pieces of information are $k_1$, $k_2$, and $a^{k_1 k_2} \pmod p$.

If our eavesdropper Eve wanted to figure out the secret key $a^{k_1 k_2} \pmod p$, then something she could do is get $a^{k_1} \pmod p$ and $a^{k_2} \pmod p$, both of which have been transmitted insecurely, and compute $a^{k_1 k_2} \pmod p$ if she can figure out what $k_1$ and $k_2$ are. But figuring out $k_1$ and $k_2$ is exactly the discrete logarithm problem, since she would also know $a$ and $p$.

Why is this hard? Well, the obvious way to compute this would be to just try computing all the $a^i$s, which gives you up to $p$ things to check. A simple modification lets us go a bit quicker.

Take $r = \left\lceil \sqrt p \right\rceil$. We compute the following lists:

  1. $a^r \pmod p, a^{2r} \pmod p, \dots, a^{r^2} \pmod p$
  2. $c \cdot a^0 \pmod p, c \cdot a^1 \pmod p, \dots, c \cdot a^{r-1} \pmod p$

If we find the same number in both lists, say $a^{i \cdot r} = c \cdot a^j \pmod p$, then this gives us $a^{ir-j} = c \pmod p$, and $i\cdot r - j$ is the discrete logarithm of $c$.

Computing these takes $O(\sqrt p \log p)$ time. Then to do the search, we can sort both lists, which takes $O(\sqrt p \log p)$ time, and for each of the $\sqrt p$ elements in one list, perform binary search on the other list, for a total of $O(\sqrt p \log p)$ time.

This approach is called baby-step, giant-step, because one list consists of baby steps ($0,1,2,\dots,r-1$) and the other consists of giant steps ($r,2r, \dots, r^2$).
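A sketch of the algorithm in Python (the function name and test values are our own; `isqrt` computes the integer square root):

```python
from math import isqrt

def discrete_log(a, c, p):
    """Find e with pow(a, e, p) == c, for prime p and primitive root a."""
    r = isqrt(p) + 1
    # Baby steps: remember c * a^j mod p for j = 0, ..., r-1.
    baby = {(c * pow(a, j, p)) % p: j for j in range(r)}
    # Giant steps: scan a^r, a^2r, ..., a^(r^2) for a match.
    step = pow(a, r, p)
    g = step
    for i in range(1, r + 1):
        if g in baby:
            return i * r - baby[g]   # a^(ir) = c * a^j, so e = ir - j
        g = (g * step) % p
    return None

assert pow(5, discrete_log(5, 19, 23), 23) == 19   # finds e = 15
```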

As with the analysis of Euclid's algorithm, we have to be careful about what $O(\sqrt p \log p)$ means. Here, we're counting operations like multiplications and comparisons. However, you'll learn in algorithms that these computations should really be viewed at the bit level, like we did with Euclid's algorithm. This means that we should be considering the complexity of this algorithm in terms of $\log p$, not $p$. So if we let $n = \log p$, we actually have something like $O(n 2^{n/2})$.

The future

The viability of the public-key cryptosystems that we've looked at relies on the ease of multiplying numbers together and the hardness of factoring them. This tension makes it very difficult to design new cryptosystems: computing the secret key has to be difficult, but that would be easy to arrange if we didn't also need to make using the public key easy.

However, even the assumption that factoring is difficult is a significant one for a variety of reasons. First of all, it is not quite proven that factoring really is "hard". Currently, the best we can say is that there are no known efficient algorithms for factoring, but no one has been able to prove that it's impossible to come up with one. If someone were to come up with one, then we would be in a lot of trouble.

However, the other big problem is that someone has come up with an efficient factoring algorithm—assuming that you have access to a quantum computer. Peter Shor's quantum factoring algorithm from 1994 was one of the first algorithms to demonstrate that quantum computers were provably more powerful than classical computers.

While there are no viable large-scale quantum computers yet, the possibility that they will get built is driving current cryptography research, particularly since secure systems that are built now need to remain secure for the next decade or so. NIST has recently solicited proposals for new quantum-secure cryptographic methods, and many of the proposed systems were weeded out in the first round after being broken using classical methods.

This speaks to the difficulty of constructing new cryptosystems. One advantage of the elementary number-theoretic systems we're used to is that their underlying problems have been studied for a very long time: factoring is difficult, but we have spent hundreds of years studying it. Newer systems are built on much newer mathematics, often only a couple of decades old. We simply haven't had time to digest and completely understand them.

However, many current secure systems have already moved on from systems based on elementary number theory, like RSA, to systems based on problems from algebraic number theory, like elliptic curve cryptography. These are fairly well understood, but they are not resistant to quantum attacks.

Combinatorics

Recall that our brief look at arrangements of words was from a relatively new and growing field called combinatorics on words. Combinatorics is the study of counting objects and their arrangements. We'll begin looking at combinatorics more generally.

Example 7.1. Suppose that you've been invited to an elaborate dinner party like for a wedding or something. Assuming you have no further dietary restrictions, you are presented with a choice of four starters, three mains, and three desserts. How many different three-course meals can you construct? If we first choose one of the four starters, then for each choice, we have another three choices of mains, and for each of those choices, we have a choice of three desserts. So this gives us $4 \times 3 \times 3 = 36$ different ways to construct our desired meal.

We can think of each course as a set: $S$, $M$, $D$, and so what we're really doing is asking for the number of tuples that we can get out of these sets, where the first element of the tuple is a starter, the second is a main, and the third is a dessert. In other words, we want to know $|S \times M \times D|$. This gives us a general formula for the size of a Cartesian product.

Theorem 7.2 (Product Rule). For finite sets $A_1, A_2, \dots, A_n$, $$|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \cdot |A_2| \cdots |A_n|.$$

Informally, the product rule describes the number of things you get when you're given several sets of things to choose from and you can pick one from each.
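We can sanity-check Example 7.1 with `itertools.product`, which enumerates exactly the tuples the product rule counts (the menu items below are invented):

```python
from itertools import product

starters = ['soup', 'salad', 'pate', 'shrimp']
mains = ['beef', 'chicken', 'risotto']
desserts = ['cake', 'fruit', 'ice cream']

# One tuple per three-course meal: an element of S x M x D.
meals = list(product(starters, mains, desserts))
assert len(meals) == 4 * 3 * 3 == 36
```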

Strings are a fundamental object in computer science, not only in the programs we write, but also in theoretical computer science. After all, everything that happens in computers really just boils down to manipulating words of bits. We tend to think of words as sequential objects, but words turn out to be another example of a recursively defined object.

Definition 7.3. Let $\Sigma$ be an alphabet. Then the set of strings, or words, over the alphabet $\Sigma$ is denoted by $\Sigma^*$. Strings can be defined inductively by

  1. Base case: the empty word $\varepsilon$, the word containing no symbols, is in $\Sigma^*$.
  2. Inductive case: if $w \in \Sigma^*$ and $a \in \Sigma$, then $wa \in \Sigma^*$.

The concatenation of two words $u = a_1 a_2 \cdots a_m$ and $v = b_1 b_2 \cdots b_n$ is $u \cdot v = a_1 a_2 \cdots a_m b_1 b_2 \cdots b_n$. Usually, we simply write $uv = u \cdot v$. The length of a word $w$, denoted $|w|$, is the number of symbols in $w$.

Note that "string" and "word" are used interchangeably.

Example 7.4. Let $\Sigma = \{0,1\}$. Then $\Sigma^*$ is the set of all binary strings. This set contains the words $\{\varepsilon, 0, 1, 00, 01, 10, 11, \dots\}$. The concatenation of the words $000$ and $101$ is $000101$. The length of $000101$ is 6.

Example 7.5. Let $\Sigma$ be an alphabet of size $k$. How many strings over $\Sigma$ of length $\ell$ are there? Here, we can think of strings in the C sense, as an array of characters, or a tuple. This means that an $\ell$-length string is an $\ell$-tuple belonging to the set $\Sigma \times \Sigma \times \cdots \times \Sigma = \Sigma^\ell$. Applying the product rule, we get that there are $k^\ell$ $\ell$-length strings over $\Sigma$.

Example 7.6. Suppose that I have a set $A = \{x_1, x_2, \dots, x_n\}$ with $n$ objects in it. How many subsets of $A$ are there? From our previous discussions, we know that a set of $n$ objects has exactly $2^n$ subsets. Let's consider the following approach. For each subset $S \subseteq A$, let $w_S$ be a string over the binary alphabet $\{0,1\}$. Then we define $w_S$ by $w_S = a_1 a_2 \cdots a_n$, where for $1 \leq i \leq n$, \begin{equation*} a_i = \begin{cases} 0 & \text{if $x_i \not \in S$,} \\ 1 &\text{if $x_i \in S$.} \end{cases} \end{equation*} For instance, $w_\emptyset = 00 \cdots 0 = 0^n$, while $w_A = 11\cdots 1 = 1^n$. How many strings of the form $w_S$ are there? Our prior discussion about the number of strings makes this easy: over the binary alphabet, there are exactly $2^n$ strings of length $n$. The string $w_S$ is sometimes called the characteristic string of $S$.
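A short sketch enumerating the characteristic strings of a small set (the three-element set is our own example): each of the $2^n$ binary strings of length $n$ names exactly one subset of $A$.

```python
A = ['x1', 'x2', 'x3']
n = len(A)

for k in range(2 ** n):
    w = format(k, f'0{n}b')                         # k as an n-bit string
    S = [A[i] for i in range(n) if w[i] == '1']     # the subset w names
    print(w, S)
# 000 [] through 111 ['x1', 'x2', 'x3']: 2^3 = 8 subsets in all
```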

Example 7.7. Suppose you're my (unfortunately real) bank and your online banking system only supports passwords that are 6-digit PINs. How many different possible passwords are there? Well, we consider the set of digits $\{0,1,2,3,4,5,6,7,8,9\}$, which is of size 10. Then applying the product rule, we get $10^6 = 1000000$, which is uncomfortably small.

Example 7.8. Now consider the University of Chicago's CNetID password policy:

A standard password with a length of 12-18 characters can be used and must contain characters from three of the following categories: uppercase letters, lowercase letters, numbers, and special characters.

Here, we have four distinct categories of symbols: the set $U$ of 26 uppercase letters, the set $L$ of 26 lowercase letters, the set $N$ of 10 digits, and the set $S$ of 26 permitted special characters.

In other words, we want to make strings over the alphabet $U \cup L \cup N \cup S$. Then how many symbols do we have to make our password? Or, what is $|U \cup L \cup N \cup S|$? Luckily, our sets are disjoint, so we can immediately apply a lemma about the size of unions of disjoint sets back from our discussion about set theory:

Lemma. If $A$ and $B$ are disjoint sets, then $|A \cup B| = |A| + |B|$.

This gives us $26+26+10+26 = 88$ symbols with which we can make passwords.

We can state this lemma more generally.

Theorem 7.9 (Sum Rule). For finite disjoint sets $A_1, A_2, \dots, A_n$, $$|A_1 \cup A_2 \cup \cdots \cup A_n| = |A_1| + |A_2| + \cdots + |A_n|.$$

Informally, the sum rule describes a situation where you're given disjoint sets of things to choose from, but you may only make one choice in total.

Example 7.10. Let's start small and consider only passwords of length 12. Taking our discussion above, we know that we can make $$88^{12} = 215,671,155,821,681,003,462,656$$ possible passwords of length 12. This is about $2.1567 \times 10^{23}$, or somewhere around a third of a mole's worth of particles. However, this doesn't take into account any of the restrictions on what kinds of symbols must be included.
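A quick check of this arithmetic in Python:

```python
symbols = 26 + 26 + 10 + 26      # uppercase, lowercase, digits, specials
count = symbols ** 12            # passwords of length exactly 12
print(count)                     # 215671155821681003462656, about 2.1567e23
```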