MPCS 50103 — Lecture 2

Set theory

Now that we've defined the language of first-order logic, we can move on to talking about set theory. Modern mathematics happens to be built on the idea that everything is a set. Everything. Functions? Sets. Numbers? Sets. Problems? Sets. Sets? Sets. So, it's important to familiarize ourselves with the basics of sets and how they work.

Definition 2.1. A set is an unordered collection of objects. If $S$ is a set and $a$ is a member of $S$, then we write $a \in S$. Similarly, if $a$ is not a member of $S$, then we write $a \not \in S$.

Here, "objects" is used loosely. Often, we'll just stick to numbers, but these can really be anything: books, trees, other sets, or anything else.

Note that since sets are unordered, this means that $\{1,2,3\}$ and $\{3,2,1\}$ are the same set. Furthermore, since an element is either a member or not a member of a set, each element occurs in a set at most once. This means that $\{2,1,3,2\}$ is really just $\{1,2,3\}$.

Definition 2.2. The set $\{\}$ containing no elements is the empty set and is denoted by $\emptyset = \{\}$.

In light of the above remark about objects in sets, note that $\{\emptyset\}$ is not the same as $\emptyset = \{\}$. The set $\{\emptyset\}$ is actually the set containing the empty set, which means it is not empty; it contains one element: the empty set.

There are two common ways to define sets. First, we can just list the elements of the set, like $\{1, 2, 3\}$. This is easy to do for finite sets. For infinite sets, we can do something like $\{1, 2, 3, \dots\}$ and usually, that's enough for us to understand that this is the set containing all natural numbers.

For more complicated definitions, we define sets using comprehensions, also called set-builder notation.

Example 2.3. Similar to what we did above, if we wanted to define the set of even natural numbers, we can write something like $\{2,4,6,\dots\}$. Using comprehensions, we can also write $$\{n \in \mathbb N \mid \exists m \in \mathbb N, n = 2 \cdot m \},$$ which says that we want the set of all natural numbers $n$ such that $n$ can be expressed as $2 \cdot m$ for some natural number $m$. That, of course, turns out to be the even natural numbers.

We can do more complicated things with this, since there aren't very many hard rules about what kinds of conditions we can or cannot place on our sets. For instance, we can define the set $$\{n \in \mathbb N \mid \exists m \in \mathbb N, (2 \cdot m = n) \wedge (\forall k \in \mathbb N, n \neq k^2) \}.$$ Sometimes, we don't even bother with that level of formality and just write something like $$\{n \in \mathbb N \mid \text{$n$ is even and not a square}\}.$$ This means that we have a lot of power in how we want to define our sets. Without any further restrictions on what we can say when defining sets, this puts us in the realm of naive set theory. This system was devised by Georg Cantor and later taken up by Gottlob Frege as part of his quest to formalize a foundation for mathematics.
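As an aside, this notation survives nearly verbatim in programming languages with set comprehensions. Here's a quick Python sketch; since we can't enumerate all of $\mathbb N$, I've restricted to the fragment range(50), which is my own choice for illustration:

    # Even, non-square natural numbers, restricted to the finite fragment 0..49.
    evens_not_squares = {
        n for n in range(50)
        if n % 2 == 0 and all(k * k != n for k in range(n + 1))
    }
    print(sorted(evens_not_squares))  # [2, 6, 8, 10, 12, 14, 18, ...]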

However, we can get into trouble if we're not careful about how we define sets. Bertrand Russell noticed this problem, leading to the following famous example: $$S = \{T \mid T \not \in T\}.$$ If we let something like this happen, then we get to conclude $S \in S \iff S \not \in S$, which is very bad if we want mathematics to be consistent (when Russell communicated this to Frege, Frege was quite spooked). This led to the development of axiomatic set theory. For our purposes, we won't need to delve into axiomatic set theory, but it is important to note that when we talk about set theory being the basis of all mathematics, what we really mean is axiomatic set theory.

Definition 2.4. Two sets $A$ and $B$ are equal if and only if they have the same elements. That is, $$A = B \iff (\forall x, x \in A \iff x \in B).$$

Definition 2.5. A set $S$ is a subset of $T$, written $S \subseteq T$, when every element of $S$ belongs to $T$. A set $S$ is a proper subset of a set $T$, written $S \subsetneq T$, when $S$ is a subset of $T$ and there exists an element of $T$ which does not belong to $S$.

You may notice that sometimes $\subset$ is used for proper subset. This works quite nicely with $\subseteq$ meaning subset. However, we'll avoid this particular notation because there are many other mathematicians who use $\subset$ to mean a (not necessarily proper) subset. Instead, we will use $\subseteq$ and $\subsetneq$ to keep things clear.

From the above, we note that by definition, we have $S \subseteq S$, and if $S = T$, we have $S \subseteq T$ and $T \subseteq S$. This gives us an alternative characterization of set equality: $S = T$ if and only if $S \subseteq T$ and $T \subseteq S$.

Definition 2.6. The cardinality of a set $S$ is the number of elements in $S$ and is denoted $|S|$. If $S$ is finite, then this will be a natural number. So, the size of the set $\{1,2,4,8\}$ would be $|\{1,2,4,8\}| = 4$.

If $S$ is an infinite set, then things are a bit trickier. The cardinality of the natural numbers is defined to be $|\mathbb N| = \aleph_0$, while the cardinality of the real numbers is $|\mathbb R| = 2^{\aleph_0}$. Here, we reach into the Hebrew alphabet for $\aleph$ (aleph). Anyhow, the cardinalities of $\mathbb N$ and $\mathbb R$ are not the same, a fact famously proved by Cantor in 1891. There are many infinite sets that have cardinality $\aleph_0$, such as the set of even natural numbers $\{n \mid \exists m \in \mathbb N, n = 2\cdot m\}$ or the set of rational numbers $\mathbb Q = \{\frac m n \mid m \in \mathbb Z, n \in \mathbb N, n \neq 0\}$. There are also many sets with cardinality $2^{\aleph_0}$. This brings us to a fun problem for you to think about in your spare time: are there any infinite sets that have cardinality strictly between $|\mathbb N| = \aleph_0$ and $|\mathbb R| = 2^{\aleph_0}$?

Definition 2.7. The power set of a set $A$ is the set containing all of the subsets of $A$, $$\mathcal P(A) = \{S \mid S \subseteq A\}.$$

Theorem 2.8. If $A$ is a finite set, then $|\mathcal P(A)| = 2^{|A|}$.

We will prove this later, in a few ways. However, because of this fact, you'll sometimes see the power set of a set $A$ denoted by $2^A$. This also gives you some insight about why it might be that $|\mathbb R| = 2^{\aleph_0}$.
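To make Theorem 2.8 concrete before we prove it, here's a small Python sketch (my own illustration) that enumerates $\mathcal P(A)$ for a finite $A$ and checks its size:

    from itertools import chain, combinations

    def power_set(A):
        """Return the set of all subsets of A, as frozensets."""
        s = list(A)
        return {frozenset(c) for c in
                chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

    A = {1, 2, 4, 8}
    print(len(power_set(A)) == 2 ** len(A))  # True: |P(A)| = 2^4 = 16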

Recall that sets are unordered collections and don't contain multiples of elements. But what if order does matter? We need to define another structure to work with.

Definition 2.9. An $n$-tuple $(a_1, a_2, \dots, a_n)$ is an ordered collection that has $a_1$ as its first element, $a_2$ as its second element, $\dots$, and $a_n$ as its $n$th element. An ordered pair is a 2-tuple.

Observe that since tuples are ordered, we have $(a_1, \dots, a_n) = (b_1, \dots, b_n)$ if and only if $a_i = b_i$ for $i = 1, \dots, n$.

As a side note, we claimed earlier that everything in mathematics is just some kind of set, so how would we define tuples using sets? To keep things simple, we'll consider the definition of the pair first. We can define $(x,y)$ to be the set $\{\{x\}, \{x,y\}\}$. In this way, we can distinguish $x$ and $y$ and we have a way to determine the order in which they appear. This definition is due to Kuratowski in 1921. We can then generalize this definition for arity $n \gt 2$.
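Python's frozenset gives us hashable, nestable sets, so we can play with Kuratowski's encoding directly and watch set equality recover the order of the components (a toy illustration, nothing we'll rely on later):

    def kpair(x, y):
        """Kuratowski's encoding: (x, y) := {{x}, {x, y}}."""
        return frozenset({frozenset({x}), frozenset({x, y})})

    print(kpair(1, 2) == kpair(1, 2))  # True
    print(kpair(1, 2) == kpair(2, 1))  # False: the order is recoverable
    print(kpair(3, 3))                 # frozenset({frozenset({3})}), the x = y case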

Theorem 2.10. $(a,b) = (c,d)$ if and only if $a = c$ and $b = d$.

Proof. We must show that

  1. if $(a,b) = (c,d)$ then $a = c$ and $b = d$, and
  2. if $a = c$ and $b = d$, then $(a,b) = (c,d)$.
We will begin with (2). Suppose that $a = c$ and $b = d$. Then $\{a\} = \{c\}$ and $\{a,b\} = \{c,d\}$ by definition of set equality. Then we have \begin{align*} (a,b) &= \{\{a\},\{a,b\}\} & \text{definition of pairs} \\ &= \{\{c\},\{c,d\}\} & \text{set equality} \\ &= (c,d) & \text{definition of pairs} \end{align*} Therefore, $(a,b) = (c,d)$.

Now, we will show (1). Suppose that $(a,b) = (c,d)$. By definition, this means that $$\{\{a\},\{a,b\}\} = \{\{c\},\{c,d\}\}.$$ There are two cases to consider: either $a = b$ or $a \neq b$.

First, we'll consider $a = b$. In this case, we have $$\{\{a\},\{a,b\}\} = \{\{a\},\{a,a\}\} = \{\{a\},\{a\}\} = \{\{a\}\}.$$ Since $(a,b) = (c,d)$ and $\{\{a\},\{a,b\}\} = \{\{a\}\}$, we have $$\{\{a\},\{a,b\}\} = \{\{a\}\} = \{\{c\},\{c,d\}\}.$$ Therefore, it must be the case that $\{\{c\},\{c,d\}\}$ also has one element. This means that $\{c\} = \{c,d\}$ and therefore $c = d$. Then $$\{\{a\}\} = \{\{c\},\{c,d\}\} = \{\{c\}\}$$ and we have $a = c$ and, since $a = b$ and $c = d$, also $b = d$.

Now, we consider the remaining case, $a \neq b$. We have that $\{a\} \neq \{a,b\}$. If $c = d$, then we would have $\{c\} = \{c,d\}$ and $\{a\} = \{c\} = \{a,b\}$, which contradicts our assumption that $a \neq b$. Therefore, $c \neq d$. Since $c \neq d$, we have $\{a\} = \{c\}$, since otherwise, $\{a\} = \{c,d\}$, which cannot be the case since $\{a\}$ contains 1 element and $\{c,d\}$ contains 2 elements. Thus, from our assumption that $\{\{a\},\{a,b\}\} = \{\{c\},\{c,d\}\}$, we have $\{a\} = \{c\}$ and $\{a,b\} = \{c,d\}$. From $\{a\} = \{c\}$, we conclude that $a = c$. Since $a = c$, from $\{a,b\} = \{c,d\}$, we conclude that $b = d$. Thus, in both cases, we have $a = c$ and $b = d$. $\tag*{$\Box$}$

Set operations

We'll begin going through operations that we can perform on sets with one that is related to our discussion of tuples from last class.

Definition 2.11. The cartesian product of two sets $A$ and $B$ is $$A \times B = \{(a,b) \mid a \in A, b \in B\}.$$

We can generalize this to products of $n$ sets.

Definition 2.12. The cartesian product of $n$ sets $A_1, A_2, \dots, A_n$, denoted $A_1 \times A_2 \times \cdots \times A_n$ is defined $$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \dots, a_n) \mid a_i \in A_i, i = 1, 2, \dots, n\}.$$
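In Python, itertools.product computes exactly this, yielding tuples; a quick sketch with two made-up sets:

    from itertools import product

    A = {1, 2}
    B = {'x', 'y'}
    print(set(product(A, B)))  # {(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')}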

Example 2.13. Everything is a set, and it turns out that the predicates I vaguely danced around earlier can be defined much more concretely now that we know some set theory. Suppose I have an interpretation $\mathcal I$, which is what we'll call a choice of domain together with the meaning of our predicates. We want to distinguish the symbols from how we're interpreting them, since we can take the same formula made up of the same symbols and apply a different interpretation to it. Don't worry too much about this because we won't be using it again.

First, the domain is a set; let's call it $\mathcal D$. Then we can define a $k$-ary predicate $P$ under our interpretation $\mathcal I$ as a set of $k$-tuples $(a_1, \dots, a_k)$. In other words, $P^\mathcal I \subseteq \mathcal D^k$. Then we can define the following: $$P(x_1, \dots, x_k) = \begin{cases}T & \text{if $(x_1^\mathcal I, \dots, x_k^\mathcal I) \in P^\mathcal I$,} \\ F & \text{if $(x_1^\mathcal I, \dots, x_k^\mathcal I) \not \in P^\mathcal I$.} \end{cases} $$

If we recall our predicates $E$ and $L$ from the other day, then we can define (assuming the domain for our interpretation $\mathcal I$ is $\mathbb N$): \begin{align*} E^\mathcal I &= \{n \in \mathbb N \mid \exists m \in \mathbb N, n = 2m\} \\ L^\mathcal I &= \{(x,y) \in \mathbb N^2 \mid x \lt y\}. \end{align*}
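We can't write down all of $\mathbb N^2$, but over a finite fragment of the domain, the predicates-as-sets view is easy to play with in Python (the fragment range(10) is my own choice):

    D = range(10)  # a finite fragment of the domain N

    E = {n for n in D if n % 2 == 0}             # unary predicate: a set of elements
    L = {(x, y) for x in D for y in D if x < y}  # binary predicate: a set of pairs

    print(4 in E)       # True:  E(4) holds
    print((3, 7) in L)  # True:  L(3, 7) holds
    print((7, 3) in L)  # False: L(7, 3) does not hold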

This definition actually points to a correspondence between what we did in logic and the set operations we'll now define. In some cases, the correspondence is very obvious because the definitions themselves are taken right from logic.

Definition 2.14. The union of two sets $S$ and $T$, denoted $S \cup T$ is defined $$S \cup T = \{x \mid x \in S \vee x \in T\}.$$ The intersection of two sets $S$ and $T$, denoted $S \cap T$, is defined $$S \cap T = \{x \mid x \in S \wedge x \in T\}.$$

Definition 2.15. The complement of a set $S$, written $\overline S$, is the set of all elements not in $S$, $$\overline S = \{x \mid x \not \in S\}.$$

We can also define $\overline S$ relative to a universal set $U$ by $\overline S = U \setminus S = \{x \in U \mid x \not \in S\}$. If we wanted to get much more formal and delve into the world of axiomatic set theory, we would have to place some conditions on what $U$ can or can't be. However, for our purposes, this is fine and the notion of the universal set corresponds nicely with the notion of the domain from first-order logic.
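Python's built-in set type implements these operations directly, with complement necessarily taken relative to an explicit universe; a small sketch with a universe of my own choosing:

    U = set(range(10))  # a universal set for this example
    S = {1, 2, 3, 4}
    T = {3, 4, 5, 6}

    print(S | T)  # union:        {1, 2, 3, 4, 5, 6}
    print(S & T)  # intersection: {3, 4}
    print(U - S)  # complement of S relative to U: {0, 5, 6, 7, 8, 9}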

So far, we have $\vee, \wedge, \neg$ covered, but what about $\rightarrow$? This turns out to be something we defined last class: subset. Recall that $A \subseteq B$ if $\forall x (x \in A \rightarrow x \in B)$. The correspondence is right there in the definition, hiding in plain sight.

Theorem 2.16 (De Morgan's laws). For two sets $A$ and $B$, \begin{align} \overline{A \cup B} &= \overline A \cap \overline B, \\ \overline{A \cap B} &= \overline A \cup \overline B. \end{align}

There are two approaches to proving that two sets $S$ and $T$ are equal. The first and more general method is to show that $S \subseteq T$ and $T \subseteq S$. Then, by definition $S = T$. This mirrors how you would prove an "if and only if" statement. Let's jump back to logic world and recall that an "if and only if" statement is of the form $S$ if and only if $T$ and means that $S \rightarrow T$ and $T \rightarrow S$. The correspondence between sets world and logic world then becomes very obvious.

Now, the second way to show that $S$ and $T$ are equal is to take advantage of the definitions of the operations being performed and to "lift" the corresponding propositional logic equivalences. This is more straightforward than proving the two subset inclusions, but it really only works when your sets are defined by basic set operations. Once you start dealing with more complicated sets, the logical analysis is not so clear-cut, and it's often necessary to prove set equality using the first method; this is why the first method is the more general of the two.

Proof. First, we will show that $\overline{A \cap B} = \overline A \cup \overline B$.

We will begin by showing $\overline{A \cap B} \subseteq \overline A \cup \overline B$. Let $x \in \overline{A \cap B}$. Then $x \not \in A \cap B$ by definition. Expressed as a logical statement, this is $\neg(x \in A \cap B)$. By definition of intersection, we have $\neg(x \in A \wedge x \in B)$. Applying De Morgan's laws, we get $\neg(x \in A) \vee \neg(x \in B)$. By definition, this gives us $x \not \in A \vee x \not \in B$. Then by definition of set complement, we have $x \in \overline A \vee x \in \overline B$. Applying the definition of union gives us $x \in \overline A \cup \overline B$. Therefore, $\overline{A \cap B} \subseteq \overline A \cup \overline B$.

Next, we will show $\overline A \cup \overline B \subseteq \overline{A \cap B}$. Let $x \in \overline A \cup \overline B$. By definition of union, this means $x \in \overline A \vee x \in \overline B$. By definition of complement, this means $x \not \in A \vee x \not \in B$. But this is just the negation of membership, so this means $\neg (x \in A) \vee \neg (x \in B)$ and by applying De Morgan's laws, we have $\neg(x \in A \wedge x \in B)$. By definition of intersection, we get $\neg(x \in A \cap B)$ and by definition of negation, we have $x \not \in A \cap B$. But this is $x \in \overline{A \cap B}$ by definition of complement. Thus, we have shown $\overline A \cup \overline B \subseteq \overline{A \cap B}$.

We have shown $\overline{A \cap B} = \overline A \cup \overline B$.

Now, we will show that $\overline{A \cup B} = \overline A \cap \overline B$. We have \begin{align*} \overline{A \cup B} &= \{x \mid x \not\in A \cup B\} &\text{definition of complement} \\ &= \{x \mid \neg(x \in A \cup B)\} &\text{definition of set membership} \\ &= \{x \mid \neg(x \in A \vee x \in B)\} &\text{definition of union} \\ &= \{x \mid \neg(x \in A) \wedge \neg(x \in B)\} &\text{De Morgan's laws} \\ &= \{x \mid x \not\in A \wedge x \not\in B\} &\text{definition of set membership} \\ &= \{x \mid x \in \overline A \wedge x \in \overline B\} &\text{definition of complement} \\ &= \{x \mid x \in \overline A \cap \overline B\} &\text{definition of intersection} \\ &= \overline A \cap \overline B &\text{set definition} \\ \end{align*} $$\tag*{$\Box$}$$

Many of the logical equivalences from propositional logic have analogs in set theory and can be proved similarly. These would make good exercises.
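Since these equivalences lift from propositional logic, we can also sanity-check them by brute force over all subsets of a small universe; here's a throwaway Python check (the universe range(6) is an arbitrary choice of mine):

    from itertools import chain, combinations

    U = set(range(6))
    subsets = [set(c) for c in
               chain.from_iterable(combinations(U, r) for r in range(len(U) + 1))]

    # Check both De Morgan laws for every pair of subsets of U.
    assert all(U - (A | B) == (U - A) & (U - B) for A in subsets for B in subsets)
    assert all(U - (A & B) == (U - A) | (U - B) for A in subsets for B in subsets)
    print("De Morgan's laws hold for all", len(subsets) ** 2, "pairs")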

Definition 2.17. The difference of two sets $S$ and $T$, denoted $S \setminus T$, is defined $$S \setminus T = \{x \mid x \in S \wedge x \not \in T\}.$$

Theorem 2.18 (Inclusion-Exclusion). For all finite sets $A$ and $B$, $|A \cup B| = |A| + |B| - |A \cap B|$.

A simple proof of this goes like this: we count all of the elements in $A$ and then count all of the elements of $B$. However, if there are elements in both $A$ and $B$, we've counted them twice, so we have to take out $|A \cap B|$ from our count.
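A quick numeric check of this count in Python (any two small sets will do; these are mine):

    A = {1, 2, 3, 4}
    B = {3, 4, 5}

    # 3 and 4 are counted twice by |A| + |B|, so we subtract |A & B|.
    print(len(A | B) == len(A) + len(B) - len(A & B))  # True: 5 == 4 + 3 - 2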

Definition 2.19. We say two sets $S$ and $T$ are disjoint if $S \cap T = \emptyset$.

Lemma 2.20. If $A$ and $B$ are disjoint sets, then $|A \cup B| = |A| + |B|$.

This is easy to see, since the intersection of $A$ and $B$ is empty.

Definition 2.21. Let $A_i$ be a family of sets, with $1 \leq i \leq n$. We define \begin{align*} \bigcup_{i=1}^n A_i &= A_1 \cup A_2 \cup \cdots \cup A_n, \\ \bigcap_{i=1}^n A_i &= A_1 \cap A_2 \cap \cdots \cap A_n. \\ \end{align*}
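In Python, these indexed operations are available directly on a family of sets (the family below is made up for illustration):

    A = [{1, 2, 3}, {2, 3, 4}, {2, 5}]  # a made-up family A_1, A_2, A_3

    print(set().union(*A))       # {1, 2, 3, 4, 5}
    print(set.intersection(*A))  # {2}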

Relations and functions

You may have learned various definitions of what is or isn't a function. Typically, we think of functions in a very calculus kind of way, where we map a value $x$ to some other value based on some expression like $x^2$. Then, we come across more complicated functions that are defined piecewise, or maybe weird functions like the Heaviside function or the parity function. Here, we'll make use of the set theory concepts we've just defined to work towards a general definition for functions.

Definition 2.22. A relation $R$ with domain $X$ and co-domain $Y$ is a subset of $X \times Y$.

We can see from the definition that a relation really is just a subset of the cartesian product of some sets. In other words, it's a set of tuples. This also resembles the set-theoretic definition of predicates from above and it's not entirely a coincidence that we think of $k$-ary predicates as relations.

Definition 2.23. A relation $R \subseteq A \times B$ is total if $\forall a \in A, \exists b \in B, R(a,b)$. A relation $R$ is single-valued if $\forall a \in A, \forall b_1, b_2 \in B, R(a, b_1) \wedge R(a,b_2) \rightarrow b_1 = b_2$.

These properties lead us right to a definition for functions.

Definition 2.24. A function $f : A \to B$ is a total, single-valued relation with domain $A$ and co-domain $B$. The set $B$ is also called the range of $f$.

Again, based on the definition, a function is really just a relation that satisfies the properties we just defined: totality and single-valuedness. And if we go further, a function is just a set, like everything else in mathematics. The restrictions on the domain and co-domain of the function track with our intuitive understanding of functions that we may have learned earlier. For instance, totality guarantees that a function gives us something for every value we throw at it, while single-valuedness guarantees that our function will only give us one possible output for every input that we give it.
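For finite sets, totality and single-valuedness can be checked directly from Definition 2.23; here's a Python sketch (the helper is_function and the example relations are mine) that decides whether a set of pairs is a function from A to B:

    def is_function(R, A, B):
        """Decide whether the relation R, a subset of A x B, is a function from A to B."""
        total = all(any((a, b) in R for b in B) for a in A)
        single_valued = all(b1 == b2
                            for (a1, b1) in R for (a2, b2) in R if a1 == a2)
        return total and single_valued

    A, B = {1, 2, 3}, {'x', 'y'}
    print(is_function({(1, 'x'), (2, 'y'), (3, 'x')}, A, B))  # True
    print(is_function({(1, 'x'), (1, 'y'), (3, 'x')}, A, B))  # False: 2 has no image, 1 has two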

We also have different kinds of functions that are important and worth distinguishing.

Definition 2.25. A function $f:A \to B$ is one-to-one or injective if $$\forall x, y \in A, f(x) = f(y) \rightarrow x = y.$$

Example 2.26. The successor function $f:\mathbb N \to \mathbb N$ defined by $f(x) = x+1$ is one-to-one. On the other hand, the function $g:\mathbb N \to \mathbb N$ defined by $g(x) = \left\lfloor \frac x 2 \right\rfloor$ is not one-to-one, since for any even natural number $m$, we have $g(m) = g(m+1)$ but $m \neq m+1$.

Definition 2.27. The image of a function $f : A \to B$ is the set $\operatorname{im} f = \{f(x) \mid x \in A\}$.

Clearly, $\operatorname{im} f \subseteq \operatorname{rng} f$. However, it is not always the case that $\operatorname{im} f = \operatorname{rng} f$. If it is, we get another special kind of function.

Definition 2.28. A function $f:A \to B$ is onto or surjective if $\forall y \in B, \exists x \in A, f(x) = y$.

Example 2.29. The function $f:\mathbb N \to \mathbb N$ defined by $f(x) = x^2$ is not surjective (for instance, there is no natural number $n$ such that $n^2 = 2$). On the other hand, the function $g(x) = \left\lfloor \frac x 2 \right\rfloor$ from the above example is surjective.
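For finite fragments of the domain and co-domain, both properties can be checked by brute force; a quick Python sketch of the examples above (restricting to range(10), my own choice):

    def is_injective(f, A):
        """f(x) = f(y) implies x = y, checked over a finite domain A."""
        return all(f(x) != f(y) for x in A for y in A if x != y)

    def is_surjective(f, A, B):
        """Every y in B is f(x) for some x in A."""
        return all(any(f(x) == y for x in A) for y in B)

    D = range(10)  # a finite fragment of N
    print(is_injective(lambda x: x + 1, D))              # True: the successor function
    print(is_injective(lambda x: x // 2, D))             # False: g(0) = g(1)
    print(is_surjective(lambda x: x // 2, D, range(5)))  # True, on this fragment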

Let's consider the idea of inverses of functions. At first glance, this seems pretty easy, since all we need to do is "work backwards". If we look at a relation $R \subseteq A \times B$, we can do this pretty easily by defining $$R^{-1} = \{(b,a) \in B \times A \mid (a,b) \in R\}.$$ We run into a problem very quickly if we want to do this with functions: the relation that we get by applying this trick to functions doesn't need to be a function! Instead, we define what are called one-sided inverses. To do this, we return to the world of relations to define an operation.

Definition 2.30. If $R \subseteq A \times B$ and $S \subseteq B \times C$, then the join of $R$ and $S$ is a relation on $A \times C$, $$S \circ R = \{(a,c) \in A \times C \mid \exists b \in B, (a,b) \in R \wedge (b,c) \in S\}.$$ Since functions are relations, we can define joins on functions as well. It turns out that this is really just function composition: if you have functions $f:A \to B$ and $g: B \to C$, then the join of the two functions when viewed as relations gives us the composition of the functions $g \circ f: A \to C$.
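Here's a quick Python sketch of the join, along with a check that on functions it agrees with ordinary composition (the particular relations are my own toy examples):

    def join(R, S):
        """S o R = {(a, c) : there exists b with (a, b) in R and (b, c) in S}."""
        return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

    # Two functions viewed as relations: f(x) = x + 1 and g(x) = 2x, on a fragment of N.
    f = {(x, x + 1) for x in range(5)}
    g = {(x, 2 * x) for x in range(6)}

    print(join(f, g) == {(x, 2 * (x + 1)) for x in range(5)})  # True: (g o f)(x) = 2(x + 1)

That brings us to the following definition for inverses.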

Definition 2.31. Consider a function $f:A \to B$. Then the function $g:B \to A$ is a left inverse of $f$ if $\forall x \in A, g(f(x)) = x$. Similarly, $g$ is a right inverse of $f$ if $\forall y \in B, f(g(y)) = y$.

Typically, the left inverse is the one we think of as "the" inverse of a function, so we drop the "left" when the context is clear.

Definition 2.32. A function $f:A \to B$ is a correspondence or bijection if $f$ is one-to-one and onto.

A correspondence between two sets $A$ and $B$ uniquely pairs each element of $A$ with an element of $B$. This turns out to be how we can determine something like $|\mathbb N| = |2 \mathbb N|$, where $2 \mathbb N$ is the set of even natural numbers. The correspondence from $\mathbb N$ to $2 \mathbb N$ is very simple: it's the function defined by $f(n) = 2n$. Since a correspondence pairs every element of the two sets, both sets must be the same size. If $B$ were smaller than $A$, then $f$ couldn't be one-to-one, and if $B$ were larger than $A$, then $f$ couldn't be onto. This turns out to be how Cantor showed that $|\mathbb R| \gt |\mathbb N|$: by showing that there are real numbers that would be "missed" by any function that one claimed to be a correspondence from $\mathbb N$ to $\mathbb R$.

Divisibility

Let's begin with some number theory. One of the first topics in elementary number theory is divisibility of integers. Division on integers is an interesting operation, even though we've likely been dividing numbers since childhood without thinking about it too much. Where it comes back to trip us up again is when we start to learn how to program: we try to divide two numbers and then remember that dividing two integers doesn't always give you an integer. This happens to be why division on integers is interesting.

Definition 3.1. Let $a,b$ be integers. We say that $a$ divides $b$, written $a \mid b$, if there exists an integer $n$ such that $a \cdot n = b$. We also say that $b$ is a multiple of $a$.

First, note that divisibility defines a relation in the sense that we defined last class. Of course, when we defined relations, we treated them as sets. Oftentimes, we treat them as operators, writing them infix like $a \lt b$ or $a \mid b$ instead of treating them as sets and writing them $(a,b) \in \lt$ or $(a,b) \in \mid$.

Observe that by this definition, we have that for all integers $n$, $n \mid 0$ and in particular, $0 \mid 0$. At first, this seems a bit odd because it feels like we are tempting fate by talking about dividing by 0, but if we take a more careful look at the definition of divisibility, we see that there actually isn't any division going on. What we're really talking about is multiplication. This is important to keep in mind because Rosen, for whatever reason, chooses to add the condition that $a \neq 0$.

Theorem 3.2. For all integers $a, b, c$, if $a \mid b$ and $a \mid c$, then $a \mid (b + c)$.

We can use what we learned last week and translate it into logic-language: $$\forall a, b, c \in \mathbb Z, (a \mid b) \wedge (a \mid c) \rightarrow a \mid (b+c).$$ So in order to prove this statement is true, we need to consider every integer for $a,b,c$ and assume that our hypothesis, $a \mid b$ and $a \mid c$, is true. Remember that since this is an implication, we don't care about the case when we don't have $a,b,c$ such that $a \mid b$ and $a \mid c$.

Proof. Let $a,b,c$ be arbitrary integers and assume that $a \mid b$ and $a \mid c$. Since $a \mid b$, there exists an integer $m$ such that $b = m \cdot a$, and since $a \mid c$, there exists an integer $n$ such that $c = n \cdot a$. Now, consider the integer $b + c$. We have $$b + c = a \cdot m + a \cdot n = a \cdot (m + n).$$ Since $m + n$ is an integer, by definition of divisibility we have $a \mid (b+c)$. $$\tag*{$\Box$}$$

Let's take a more careful look at what it is that we've done in this proof.

"Let $a,b,c$ be arbitrary integers": Since our claim about $a,b,c$ must hold for all possible integers, we simply say that they're arbitrarily chosen. We can think of $a,b,c$ as placeholders for any integers that satisfy our condition.
"and assume that $a \mid b$ and $a \mid c$": Here is the condition under which we chose $a,b,c$ and the assumption that we need to make to prove the theorem.
"Since $a \mid b$, there exists an integer $m$ such that $b = m \cdot a$": This is the definition of divisibility. Just like above, we don't know exactly which integer works, we only know that one of them works, so we give it a name (in this case $m$).
"and since $a \mid c$, there exists an integer $n$ such that $c = n \cdot a$": This is the definition of divisibility again. Here, we make sure to choose a different variable name that hasn't been used ($n$).
"Now, consider the integer $b + c$": We can call $b + c$ an integer because adding two integers gives us another integer.
"We have $b + c = a \cdot m + a \cdot n = a \cdot (m + n)$": This is some simple algebraic manipulation.
"Since $m + n$ is an integer, by definition of divisibility we have $a \mid (b+c)$": This follows from the above algebra, and we apply the definition of divisibility.
"$\Box$": We end proofs with a box. In typography, this is called a tombstone and you often see these in publications like magazines. In mathematics, some call this a halmos, after Paul Halmos, who first used it in the mathematical context.

Theorem 3.3. For all integers $a, b, c$, if $a \mid b$, then $a \mid bc$.

Proof. Let $a,b,c$ be arbitrary integers and assume that $a \mid b$. Since $a \mid b$, there exists an integer $n$ such that $a \cdot n = b$. Now, consider the integer $b \cdot c$. We have $$b \cdot c = (a \cdot n) \cdot c = a \cdot (n \cdot c).$$ Since $n \cdot c$ is an integer, by definition of divisibility, we have $a \mid bc$. $$\tag*{$\Box$}$$

Definition 3.4. A binary relation $R$ is transitive if $\forall a, b, c$, $a \mathrel R b \wedge b \mathrel R c \rightarrow a \mathrel R c$.

In other words, if I know that $a$ is related to $b$ in some way and $b$ is related to $c$, then I can conclude that $a$ is related to $c$. The subset relation is a transitive relation that we've seen already: if we know that $A \subseteq B$ and $B \subseteq C$, it's not too hard to argue that $A \subseteq C$. Similarly, implication is also transitive: if $p \rightarrow q$ and $q \rightarrow r$, then it's not too hard to see that $p \rightarrow r$. We will show that divisibility is transitive.

Theorem 3.5. For all integers $a, b, c$, if $a \mid b$ and $b \mid c$, then $a \mid c$.

Proof. Let $a, b, c$ be arbitrary integers and assume that $a \mid b$ and $b \mid c$. Since $a \mid b$, there exists an integer $m$ such that $b = m \cdot a$. Since $b \mid c$, there exists an integer $n$ such that $c = n \cdot b$. Then we get $$c = n \cdot b = n \cdot (m \cdot a) = (n \cdot m) \cdot a.$$ But $n \cdot m$ is also an integer, so by the definition of divisibility, we have $a \mid c$. $$\tag*{$\Box$}$$

Theorem 3.6. For all integers $a,b,c,m,n$, if $a \mid b$ and $a \mid c$, then $a \mid (bm + cn)$.

This is not too difficult to prove directly. However, we've already done some work, so let's make use of it.

Proof. Let $a,b,c$ be arbitrary integers and assume that $a \mid b$ and $a \mid c$. Since $a \mid b$, by Theorem 3.3, we have $a \mid bm$ for any integer $m$. Similarly, we have $a \mid cn$ for any integer $n$. Since $a \mid bm$ and $a \mid cn$, by Theorem 3.2, we have $a \mid (bm + cn)$. $$\tag*{$\Box$}$$

This is nice and neat. However, by examining the proofs of the theorems that we cited, it is not too difficult to see how we might have proved this directly.

Division

Now, we'll take a trip back to grade school and think about division again. Recall that when we divide a number $a$ by $b$, if $a$ is a multiple of $b$ (if $b \mid a$), we get an integer $q$ which we call the quotient. But if $a$ isn't a multiple of $b$, we also get a remainder $r$. The following theorem, called the Division Theorem, formally states that whenever we divide a number $a$ by $b$, we will always be able to "get" the integers $q$ and $r$ (they exist) and that they're unique.

Theorem 3.7 (Division Theorem). For all integers $n$ and $d \gt 0$, there exist unique integers $q$ and $r$ such that $n = d \cdot q + r$ and $0 \leq r \lt d$.

To prove this theorem, we have to show two things:

  1. that the appropriate integers $q$ and $r$ actually exist, and
  2. that the appropriate integers $q$ and $r$ are unique.

Let's prove existence first as a lemma.

Lemma 3.8. For all integers $n$ and $d \gt 0$, there exist integers $q$ and $r$ such that $n = d \cdot q + r$ and $0 \leq r \lt d$.

Proof. Let $n$ and $d \gt 0$ be arbitrary integers. Consider the set $S = \{n - d \cdot q \mid q \in \mathbb Z\}$. Since $d \gt 0$ and $q$ ranges over all of $\mathbb Z$, there must be some non-negative element in $S$; for instance, taking $q = -|n|$ gives $n - d \cdot (-|n|) = n + d \cdot |n| \geq n + |n| \geq 0$. Let $r$ be the smallest non-negative member of $S$ and let $q$ be such that $r = n - d \cdot q$. We want to show that $r \lt d$.

Suppose that it isn't; that is, suppose $r \geq d$. Then we have $0 \leq r-d \lt r$, and since $r - d = n - d \cdot (q+1)$, $r-d$ is also a non-negative element of $S$. However, this means that there is a smaller non-negative element of $S$ than $r$, which contradicts our assumption that $r$ was the smallest non-negative element of $S$. Therefore, it can't be the case that $r \geq d$.

Since $q \in \mathbb Z$ is such that $r \in S$, we have that $r = n - d \cdot q$. This gives us $n = d \cdot q + r$ with $0 \leq r \lt d$ as desired. $$\tag*{$\Box$}$$

This proof of Lemma 3.8 is nice and neat but doesn't tell us very directly how to find $q$ and $r$. However, it gives us a hint at how we might wrangle a more concrete algorithm out of it. Just pick a non-negative member of $S$ and see if it's less than $d$. If it is, then we're good. Otherwise, we know that we have to find a smaller one. We do this by increasing our candidate $q$ by 1, since $n$ and $d$ are fixed. Eventually, we'll hit the $r$ that we want, and that will give us the $q$ that we want too. Of course, this isn't a complete algorithm, since I haven't defined what "picking" or "searching" entails, but it's not too hard to see how you might work out those details. Of course, proving formally that such an algorithm works is another challenge, but again, doable.
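Here's one way that search might look in Python; this is a naive repeated-subtraction sketch for $n \geq 0$ and $d \gt 0$, not an efficient division routine:

    def divide(n, d):
        """Find (q, r) with n = d*q + r and 0 <= r < d, assuming n >= 0 and d > 0.

        Start with the candidate q = 0 (so r = n) and increase q until r < d.
        """
        q, r = 0, n
        while r >= d:
            r -= d  # moving from n - d*q down to n - d*(q + 1)
            q += 1
        return q, r

    print(divide(17, 5))  # (3, 2): 17 = 5 * 3 + 2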

Proof of Theorem 3.7. Let $n$ and $d \gt 0$ be arbitrary integers. By Lemma 3.8, there exist integers $q$ and $r$ such that $n = d \cdot q + r$ and $0 \leq r \lt d$. We want to show that $q$ and $r$ are unique.

Suppose that there exist integers $q_1, r_1, q_2, r_2$ such that $n = d \cdot q_1 + r_1$ and $n = d \cdot q_2 + r_2$ with $0 \leq r_1, r_2 \lt d$. Without loss of generality, let $r_1 \geq r_2$ and consider the following system of equations \begin{align*} n &= q_1 \cdot d + r_1 \\ n &= q_2 \cdot d + r_2. \end{align*} Subtracting gives us $0 = (q_1 - q_2) \cdot d + (r_1 - r_2)$. Rearranging this equation gives us $(q_2 - q_1) \cdot d = r_1 - r_2$. Since $(q_2 - q_1)$ is an integer, we have $d \mid (r_1 - r_2)$ by definition of divisibility. Since $0 \leq r_1 - r_2 \lt d$, we have $r_1 - r_2 = 0$ and therefore $r_1 = r_2$. Then from above, this gives us $(q_2 - q_1) \cdot d = 0$. Since $d \gt 0$, we have $q_2 - q_1 = 0$ and therefore $q_1 = q_2$. Thus, we have shown that $q$ and $r$ are unique. $$\tag*{$\Box$}$$

In the Division Theorem, the integer $d$ is named $d$ because it is the divisor.

Definition 3.9. The set of divisors of $n$ is defined by $\operatorname{Div}(n) = \{a : a \mid n\}$.

This is a nice definition because it is obvious what the common divisors of $m$ and $n$ are supposed to be: the intersection of $\operatorname{Div}(m)$ and $\operatorname{Div}(n)$.

Definition 3.10. If $m$ and $n$ are integers with $m \neq 0$ or $n \neq 0$, then the greatest common divisor of $m$ and $n$, denoted $\gcd(m,n)$, is the largest element of $\operatorname{Div}(m) \cap \operatorname{Div}(n)$.

Definition 3.11. Let $n$ and $d \gt 0$ be integers. We define the functions $n \mathop{\mathbf{div}} d = q$ and $n \mathop{\mathbf{mod}} d = r$, where $q$ and $r$ are as defined by the Division Theorem (3.7).

These functions are defined as in the Rosen textbook and are standard definitions. We can think of $\mathbf{div}$ as the integer division operation in many programming languages that just returns the quotient and $\mathbf{mod}$ is the modulus operator. However, if we recall the painstaking work to define functions as a special kind of relation, these aren't really functions because they're undefined for $d = 0$. Unfortunately, this is the way it is.
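In Python, for $d \gt 0$, floor division // and the % operator agree with these definitions, producing $0 \leq r \lt d$ even for negative $n$ (beware that C and Java round toward zero instead, so their % can be negative):

    n, d = -7, 3
    q, r = n // d, n % d
    print(q, r)            # -3 2: indeed -7 = 3 * (-3) + 2 with 0 <= 2 < 3
    print(n == d * q + r)  # True
    print(divmod(n, d))    # (-3, 2): both at once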

From the way we've defined this, it's not entirely obvious that the gcd should exist. Without loss of generality, let $m \neq 0$. Then every element of $\operatorname{Div}(m)$ lies between $-|m|$ and $|m|$. This means that $\operatorname{Div}(m)$ is finite and that $\operatorname{Div}(m) \cap \operatorname{Div}(n)$ is also finite. Since 1 is a divisor of every integer, we know that $1 \in \operatorname{Div}(m) \cap \operatorname{Div}(n)$, which means that $\operatorname{Div}(m) \cap \operatorname{Div}(n)$ is non-empty. Since it's finite and non-empty, $\operatorname{Div}(m) \cap \operatorname{Div}(n)$ must contain a largest element, which is the gcd.

From the Division Theorem, we get the following result about the gcd by Étienne Bézout in the mid-1700s.

Theorem 3.12 (Bézout's Lemma). For all integers $m$ and $n$, there exist integers $a$ and $b$ such that $\gcd(m,n) = a \cdot m + b \cdot n$.

Proof. If $m = n = 0$, then any $a$ and $b$ works. So without loss of generality, we have non-zero $m$ or $n$. Consider the set $S = \{am + bn \mid a,b \in \mathbb Z\}$. Note that $S$ contains a positive integer, since taking $a = m$ and $b = n$ gives $m^2 + n^2 \gt 0$. Let $d$ be the least positive integer in $S$ and let $a$ and $b$ be such that $d = am + bn$. First, we will show that $d$ is a common divisor of $m$ and $n$.

Suppose otherwise that $d$ isn't a common divisor of $m$ and $n$. Without loss of generality, say $d \not \mid m$. Then by the Division Theorem, there exist $q$ and $r$ such that $m = q \cdot d + r$ and $0 \lt r \lt d$ (here $r \neq 0$ precisely because $d \not \mid m$). Then we have \begin{align*} r &= m - q \cdot d \\ &= m - q \cdot (am + bn) \\ &= (1-qa)m + (-bq)n. \end{align*} This gives us $r \in S$. But since $0 \lt r \lt d$, this contradicts the choice of $d$ as the least positive element of $S$. Therefore, $d$ must be a common divisor of $m$ and $n$.

Now, consider any common divisor $c$ of $m$ and $n$. Then there exist integers $s$ and $t$ such that $m = cs$ and $n = ct$. This gives us \begin{align*} d &= am + bn \\ &= acs + bct \\ &= c \cdot (as + bt). \end{align*} Therefore, $c \mid d$, and so $c \leq d$. Thus, $d = \gcd(m,n)$. $$\tag*{$\Box$}$$

Bézout's lemma is also called Bézout's identity. Confusingly, Bézout has another famous result named after him in algebraic geometry called Bézout's theorem.

Bézout's lemma is interesting in that it provides a characterization for the gcd of $m$ and $n$. Suppose we have $m$, $n$, and what we think is $\gcd(m,n)$. Is there a way to check that the gcd is really correct without computing it? If we had the coefficients of Bézout's lemma for $m$ and $n$, then we could check very easily. Of course, we're getting a bit ahead of ourselves, but keep this idea in mind while we tackle a more immediate question: how does one compute the gcd?

The obvious way to do this is to compute the set of divisors for $m$ and $n$, compute their intersection, and take the largest element in the intersection. The big problem with this approach is that factoring integers is a notoriously hard problem to solve efficiently. There are some heuristics that we can apply and some tricks that we can pull, but in general we do not have a fast algorithm for factoring numbers. This fact happens to be the backbone of many public-key cryptographic systems we use today.

However, if all we care about is the greatest common divisor and not any of the other divisors, then the following theorem will get us there efficiently.

Theorem 3.13. If $d = \gcd(a,b)$, $b \neq 0$, and $r = a \mathop{\mathbf{mod}} b$, then $d = \gcd(b,r)$.

Proof. Let $a$ and $b \gt 0$ be arbitrary. Let $q$ be such that $a = qb + r$. Since $d = \gcd(a,b)$, we have $d \mid a$ and $d \mid b$, and by Theorem 3.3, $d \mid qb$. Then $d \mid (a - qb)$ by Theorem 3.6, and therefore, $d \mid r$. In other words, $d$ is a common divisor of $b$ and $r$. Now consider an arbitrary common divisor $c$ of $b$ and $r$. Since $c \mid b$ and $c \mid r$, we have $c \mid (qb+r) = a$, so $c$ is also a common divisor of $a$ and $b$. This means we must have $c \leq d$. Therefore, $\gcd(b,r) = d = \gcd(a,b)$. $$\tag*{$\Box$}$$

This result gives us a way to compute the gcd: $$\gcd(a,b) = \begin{cases} a & \text{if $b = 0$,} \\ \gcd(b, a \mathop{\mathbf{mod}} b) & \text{otherwise.} \end{cases}$$
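The recurrence translates directly into code; a minimal Python version (for non-negative inputs):

    def gcd(a, b):
        """Euclid's algorithm, a direct transcription of the recurrence above."""
        if b == 0:
            return a
        return gcd(b, a % b)

    print(gcd(232, 72))  # 8, as in Example 3.14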

This algorithm is called Euclid's algorithm, named after the Greek mathematician from around 300 BC who describes it in the Elements, the work that lays out a lot of the elementary geometry, algebra, and number theory that we learn in school.

Example 3.14. Suppose we want to compute the gcd of 232 and 72. We apply Euclid's algorithm as follows: \begin{align*} \gcd(232,72) &= \gcd(72,16) \\ &= \gcd(16,8) \\ &= \gcd(8,0) \\ &= 8 \end{align*}

This leads us to the Extended Euclidean algorithm (EEA), which we won't be covering in detail, but which is closely tied to Bézout's lemma. The EEA is an algorithm that computes both the gcd of $m$ and $n$ and the coefficients $a$ and $b$ from Bézout's lemma. We can treat these coefficients as a certificate or sanity check that the gcd we computed was really correct, without having to recompute the entire thing later on. The EEA computes the gcd in such a way that, without doing a lot of extra work, $a$ and $b$ fall out almost for free.
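To make the idea concrete, here is one standard formulation of the EEA sketched in Python (the variable names are mine): alongside $d = \gcd(m,n)$, it returns Bézout coefficients $a$ and $b$ with $d = a \cdot m + b \cdot n$.

    def extended_gcd(m, n):
        """Return (d, a, b) with d = gcd(m, n) and d == a*m + b*n."""
        if n == 0:
            return m, 1, 0
        # gcd(m, n) = gcd(n, m mod n); recover the coefficients on the way back up.
        d, a, b = extended_gcd(n, m % n)
        return d, b, a - (m // n) * b

    d, a, b = extended_gcd(232, 72)
    print(d, a, b)           # 8 -4 13
    print(a * 232 + b * 72)  # 8: the Bezout certificate checks out

Checking the certificate takes just two multiplications and an addition, which is exactly the sanity test described above.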