CMSC 27100 — Lecture 1

Discrete mathematics and computer science

Discrete mathematics is the mathematics of discrete structures. What do we mean by discrete structures? By discrete, we mean things that can be thought of as distinct chunks or units, like whole numbers and counting. That sounds pretty elementary, and it's an interesting quirk that the educational system seems eager to shuffle us off into things like fractions, "decimals", lines, and so on. These things are not discrete: lines are typically smooth, and this smoothness is such a desirable feature that it plays a big role when we eventually get to calculus. Calculus is chiefly concerned with continuous functions, which are decidedly not discrete.

But why should we be concerned with discrete mathematics as computer scientists? There are a number of different levels from which we can examine this relationship. Here is a summary to start.

  1. We abstract and model problems that we want to solve using computer science as math problems, which are often discrete.
  2. We represent data in computer science as discrete structures.
  3. Proofs are solutions to math problems, and programs are just very formally specified proofs.
  4. We analyze our definitions and solutions by proving properties about them.

The most immediately obvious relationship is what happens when we ask students to complete a programming exercise in a class like CMSC 14100. For many people, programming has nothing to do with mathematics—after all, you're probably in this class after finishing CMSC 14100. Whether or not you took that class, we can think about what the problem solving process for a beginner programmer might look like. In the best case, we systematically think through modelling and abstracting our problem into a math problem and then try to solve that problem. What usually happens instead is that we throw ourselves against a wall, trying things out until something works.

When people say that programming doesn't need math, this is what they're describing. You don't strictly need to know math, because you can just brute force your way into getting your program working. That's fine, but eventually you get tired of smashing your head against walls and begin to wonder whether there is a better way.

This is the first level at which we can think about the connections between computer science and math. When we say that we want to solve problems using computer science, what we really want is a program that solves a mathematical problem that represents our "real" problem. Knowing more math gives us more options when mapping "real" problems to math problems.

Discrete structures and representation

What's the next step? At the top level, we model "real" problems as mathematical problems. However, we then need to go a bit deeper and think about the structures that we use in order to do this. That is, how is our data represented? In programming, we are introduced to the idea of types like integers, strings, and lists, and even more complex types like trees. But these are discrete structures!

Here is the next idea: many of the structures and representations in computer science are discrete. Hence the need to understand the mathematics of discrete structures.

Let's start by talking about one of the discrete structures that we'll be examining in more detail later: the natural numbers. You may be familiar with the natural numbers already as "whole numbers" or maybe slightly more formally as "nonnegative integers" (we consider 0 to be a natural number, though this is disputed by various kinds of mathematicians).

For instance, here is a definition you will likely have run across recently as you began to learn how to program.

Let $n$ be a natural number. The factorial $n!$ is defined by \[n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1,\] with the convention that $0!$, the empty product, is $1$. This makes for a simple programming exercise.

def factorial(n: int) -> int:
    # multiply together 1, 2, ..., n; when n == 0, the loop body never
    # runs and we return 1, matching the convention that 0! = 1
    result: int = 1
    for i in range(1, n+1):
        result = result * i
    return result

However, you may also recall that the factorial function can be written recursively.

def factorial(n: int) -> int:
    if n == 0:
        return 1                        # base case: 0! = 1
    else:
        return n * factorial(n-1)       # recursive case: n! = n * (n-1)!

This function comes from the more formal definition for the factorial.

Let $n$ be a natural number. The factorial $n!$ is defined as follows: \[n! = \begin{cases} 1 & \text{if } n = 0, \\ n \times (n-1)! & \text{otherwise.} \end{cases}\]

Notice that this definition itself is already an algorithm. We can compute the factorial simply by using substitution and the definition.

What's $3!$? According to the definition, $3! = 3 \times 2!$. Okay, so what's $2!$? According to the definition, $2! = 2 \times 1!$. But what's $1!$? According to the definition, $1! = 1 \times 0!$. But what's $0!$? According to the definition, $0! = 1$. Putting it all together, we have the following. \begin{align*} 3! &= 3 \times 2! \\ &= 3 \times (2 \times 1!) \\ &= 3 \times (2 \times (1 \times 0!)) \\ &= 3 \times (2 \times (1 \times 1)) \\ &= 6 \end{align*}

Why does this definition work? We recall that recursive functions are computations on recursive data. So the definition of the factorial gets its structure from the formal mathematical definition for the natural numbers, due to the Italian mathematician Giuseppe Peano in the late 19th Century.

The natural numbers are defined as follows:

  1. $z$ is a natural number.
  2. If $n$ is a natural number, then $\operatorname{succ}(n)$ is a natural number.

We denote the set of natural numbers by $\mathbb N$.

Here, we have an element that we've just called $z$—this happens to be the thing that we call zero or 0. Zero is a natural number. Here I've made a deliberate choice not to use 0 to make it clear there's nothing particularly special about this element. Then we have the function $\operatorname{succ}$, which is called the successor function.

This definition for the natural numbers (and the definition for factorial from above) is an example of a recursive or inductive definition. These are definitions that are given in terms of themselves. Let's dissect this.

The first part of the definition is called the base case. The base case defines the most basic element of the definition. Often, one can think of this as kind of the "smallest" or "least" case. The second part of the definition is called the recursive or inductive case. This is the part of the definition that refers to itself.

Surprisingly, even though this definition is recursive, we can hammer out a definition of the type (yes, even in Python! crazy!) in code just like before.

from __future__ import annotations   # allows the forward reference to Natural below
from dataclasses import dataclass

@dataclass
class Succ:
    p: Natural   # the natural number that this one is the successor of

# a Natural is either None (playing the role of z) or the successor of another Natural
Natural = None | Succ

Here, we define the type Natural to be either None, which plays the role of $z$, or Succ applied to another Natural, which plays the role of $\operatorname{succ}$.

In case you really want to try working with this yourself, here are some notes on implementation that are not that important to the mathematics: you need from dataclasses import dataclass; because Succ mentions Natural before Natural is defined, you also need from __future__ import annotations (or write the annotation as the string "Natural"); and the None | Succ syntax for type aliases requires Python 3.10 or newer.

The key idea here is that types in computer science and structures in math are often the same thing. The reason that we can have data structures in CS is because they have formal mathematical definitions.

It so happens that all of the numbers we already know and love can be described using this definition. "Zero" or "0" we already said was $z$. Then "one" or "1" is $\operatorname{succ}(z)$. "Two", or "2", is $\operatorname{succ}(\operatorname{succ}(z))$, and so on. But notice that we never actually defined these other natural numbers: the only natural number we explicitly defined was zero. Every other number is expressed as the successor of another number.

In a way, we might think of this as naming constants like

zero = None
one = Succ(zero)
two = Succ(one)
three = Succ(two)
    ...
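
If writing these constants out by hand gets tedious, we could sketch a pair of helper functions for converting back and forth between Python's built-in int and this representation. (The names to_natural and from_natural are just for illustration, and the sketch assumes the Natural and Succ definitions from above.)

def to_natural(k: int) -> Natural:
    n: Natural = None          # start from zero...
    for _ in range(k):
        n = Succ(n)            # ...and take k successors
    return n

def from_natural(n: Natural) -> int:
    match n:
        case None:
            return 0
        case Succ(p):
            return 1 + from_natural(p)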

Is $3$ a natural number? Here, we're using $3$ as a shorthand for \[\operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))).\] So how do we know if $\operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)))$ is a natural number? Well, we have to figure out if it's the successor of a natural number, or if the argument to $\operatorname{succ}$ is also a natural number. So we look at $\operatorname{succ}(\operatorname{succ}(z))$ and see if that's a natural number. Again, this depends on the argument to $\operatorname{succ}$, so we check to see if $\operatorname{succ}(z)$ is a natural number. We go through the same argument and check if $z$ is a natural number. It is! So we can conclude that $3$ was a natural number after all.
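
In fact, the check we just walked through is itself an algorithm, and we can sketch it in code. (The name is_natural is just for illustration; Python would not enforce this for us, but the function mirrors the argument above.)

def is_natural(x: object) -> bool:
    # x is a natural number if it is None (that is, z), or if it is
    # Succ applied to something that is itself a natural number
    match x:
        case None:
            return True
        case Succ(p):
            return is_natural(p)
        case _:
            return False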

Definitions and algorithms

So what can we do with this? One of the first things we learn when we learn about numbers is how to add them together, so let's do that. Now, we typically think of adding in terms of combining numbers of things together. However, we need to try to do this without that kind of intuition, because as we discussed before, computers do not know that putting three apples and five apples together makes eight apples.

For natural numbers $m$ and $n$, we define addition on the natural numbers, denoted $m + n$, as follows:

  1. If $m = z$, then $m + n = n$.
  2. If $m = \operatorname{succ}(p)$ for some natural number $p$, then $m + n = \operatorname{succ}(p + n)$.

Let's think about this definition a bit. We want to add two natural numbers, $m$ and $n$. What should be the outcome? Well, let's think about the first one, $m$. Recall our definition for natural numbers: it's either $z$ or the successor of some other natural number.

Okay, so if $m = z$, that is $m$ is zero, then adding it to $n$ should just produce $n$ (at this point, I need to point out that we didn't really define zero's properties, but let's set that aside for later).

But what if it's not $z$? Then $m = \operatorname{succ}(p)$, where $p$ is some other natural number. At this point, we have a few pieces to work with: we have the natural numbers $m$, $p$, and $n$ (and $m$ and $p$ are related), and we have two functions, $\operatorname{succ}$ and $+$.

The key here is to think about what $p+n$ is in relation to $m+n$. We know that $p$ is the number that comes "before" $m$, which means that $p+n$ should be the number that comes before $m+n$. And so, we arrive at $m+n = \operatorname{succ}(p+n)$.

Let's try to do some simple math: what's $2+3$? By following our definitions from above, we get: \begin{align*} 2+3 &= \operatorname{succ}(\operatorname{succ}(z)) + \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) \\ &= \operatorname{succ}(\operatorname{succ}(z) + \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)))) \\ &= \operatorname{succ}(\operatorname{succ}(z + \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))))) \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))))) \\ &= 5 \end{align*}

Another way of looking at this is to use the usual notation for numbers and see that our definition says that $m+n$ is defined in the following way: \[m + n = \begin{cases} n & \text{if } m = 0, \\ (p + n) + 1 & \text{if } m = p + 1. \end{cases}\]

Now recall that all of these inductive definitions also tell us how to compute these things so we get a nice algorithm.

def plus(m: Natural, n: Natural) -> Natural:
    match m:
        case None:                      # z + n = n
            return n
        case Succ(p):                   # succ(p) + n = succ(p + n)
            return Succ(plus(p, n))
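
As a quick sanity check of the worked example above, we can ask Python whether plus really takes 2 and 3 to 5. (This uses the constants named earlier; five is written out the same way, and dataclass equality compares the structures field by field.)

five = Succ(Succ(Succ(Succ(Succ(None)))))
assert plus(two, three) == five   # 2 + 3 = 5, as computed by hand above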

Now, we can continue this line of thinking and reproduce many basic results on natural numbers (for example, how do we define multiplication?). However, let's observe and reiterate a point that was made in early CS classes. We arrived at our algorithm from the mathematical definition that we gave. And we arrived at our definition based on the definition of the structure that we're working with.

That is, the problems that we're solving and the algorithms that we use to solve them are determined by the structures that we use to represent our data. This is very easy to see when working with simple recursive types like naturals, lists, and even trees. However, this is still true for more complex data, like objects.
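
Returning to the multiplication question from a moment ago, here is one way it might go, following exactly the same pattern: $z \times n = z$, and $\operatorname{succ}(p) \times n = n + (p \times n)$. (This is a sketch of one possible definition, not necessarily the one we will settle on later.)

def times(m: Natural, n: Natural) -> Natural:
    match m:
        case None:                       # z * n = z
            return None
        case Succ(p):                    # succ(p) * n = n + (p * n)
            return plus(n, times(p, n))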

Describing properties

Finally, how do we know that what we've defined is correct? Well, we don't—we can define whatever we want. But what we might care about is whether what we've defined is interesting (what a mathematician might say) or useful (what a computer scientist might say). To do so, we might want to know some things about our definition: what properties does it have?

We don't really know anything about addition yet, but we might have some guesses. For example, we might want to know that adding zero to any number $n$ gives us $n$. Formally, we call $z$ an additive identity. This is very easy to see.

For all natural numbers $n$, $z+n = n$.

How do we know this is true? Well, this is exactly the definition of addition! We can formalize this with a proof.

Let $n$ be an arbitrary natural number. Then by Definition 1.6(1), $z + n = n$.

Okay, very simple. But what if we wanted to know something like the following?

For all $n \in \mathbb N$, $n+z = n$.

Notice that we can't appeal to our definition anymore because the numbers aren't in the right order. Now, you might say that it's obvious that $a+b = b+a$, but we don't actually know this from our definition!

However, this seems to be true. How can we convince ourselves that this is actually true? We might try a few examples to test it.

If $n = z$, we have $z + z = z$ by Definition 1.6(1).

If $n = \operatorname{succ}(z)$, we have \begin{align*} \operatorname{succ}(z) + z &= \operatorname{succ}(z + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(z) &\text{Definition 1.6(1)} \end{align*}

If $n = \operatorname{succ}(\operatorname{succ}(z))$, we have \begin{align*} \operatorname{succ}(\operatorname{succ}(z)) + z &= \operatorname{succ}(\operatorname{succ}(z) + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(z + z)) &\text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(z)) &\text{Definition 1.6(1)} \end{align*}

If $n = \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)))$, then we have \begin{align*} \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) + z &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)) + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z) + z)) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z + z))) &\text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) &\text{Definition 1.6(1)} \end{align*}
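
We can also run these same checks mechanically for small cases using the plus function from before. (This only tests examples, so it is not a proof, but it is reassuring.)

# check that n + z = n for the first ten naturals; z is represented by None
n: Natural = None
for _ in range(10):
    assert plus(n, None) == n
    n = Succ(n)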

So, this looks a lot like our computation for the factorial, in that it is very repetitive. However, that is the key to our proof: the observation here is that our computation of $n+z$ depends on the computation of whatever the number that comes before $n$ is. By the first line of our computation, we have that \[\operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) + z = \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)) + z)\] but we've already proven what $\operatorname{succ}(\operatorname{succ}(z)) + z$ is! We could instead write something like \begin{align*} \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) + z &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)) + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) &\text{Proof of $\operatorname{succ}(\operatorname{succ}(z)) + z$} \end{align*}

What would be convenient is if there were a way to use this idea: that we know that if we can prove a property $P$ for a natural number $n$, we can use that proof to prove that $P$ also holds for $\operatorname{succ}(n)$. This is the idea behind the principle of mathematical induction.

We will prove this by induction on $n$.

Base case. If $n = z$, then \begin{align*} n + z &= z + z &\text{substitution} \\ &= z &\text{Definition 1.6(1)} \\ &= n &\text{substitution} \end{align*} Therefore, our claim holds for the base case.

Inductive case. If $n = \operatorname{succ}(k)$ for some natural number $k$, then assume that $k+z = k$. We will show that $\operatorname{succ}(k) + z = \operatorname{succ}(k)$. Consider the following: \begin{align*} n + z &= \operatorname{succ}(k) + z &\text{substitution} \\ &= \operatorname{succ}(k + z) &\text{Definition 1.6(2)} \\ &= \operatorname{succ}(k) &\text{inductive hypothesis} \end{align*} Therefore, our claim holds for the inductive case and therefore, $n + z = n$ for all natural numbers $n$.

We'll go through proof by induction more carefully soon. For now, just observe that like our discussions of recursive functions and recursive data types, the structure of the proof follows the structure of the recursive definition.

But why are we allowed to do this?

Mathematical induction happens to be an axiom of the Peano definitions for natural numbers. We can state this for a property $P$ in first-order logic as \[\left(P(z) \wedge \forall k \, (P(k) \rightarrow P(\operatorname{succ}(k)))\right) \rightarrow \forall n \, P(n).\] What this tells us is that if we can show that the hypothesis, $P(z) \wedge \forall k \, (P(k) \rightarrow P(\operatorname{succ}(k)))$, is true, then we can conclude that $\forall n \, P(n)$ holds.

But how do we know or do this? In order to dissect this further, we need to talk about mathematical logic and proof.