CMSC 27100 — Lecture 2

Logic

Mathematical logic is the study of reasoning from the point of view of mathematics. It's a rich and deep field of study in its own right, but in the context of computer science, it's particularly important. The development of mathematical logic in the early 20th century led directly to the first notions of computability and ultimately the birth of computer science.

For the purposes of this course, the basics of mathematical logic allow us to develop a language to express mathematical ideas and a framework in which to prove their validity. There are two parts to this.

We first describe how to express things. That is, how we can say something like our induction axiom, \[P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k))) \rightarrow \forall n P(n)\] as well as what it means.

We then follow up with how to construct proofs about those statements. That is, how do we prove the validity of the sentence above? What does its structure say about how we should approach this?

Notice that we used informal, natural language to say things:

For all natural numbers $n$, $n+z = n$.

We want to (and can) express this more formally using mathematical logic. In the strictest sense, the language of mathematical logic is really just a collection of symbols and some rules about how to arrange them. So, you're allowed to write something that looks like this, but not that, and so on and so forth. The magic comes when we define meaning for these symbols.

Generally, we assume that we are working in classical logic. Here, our statements take on truth values: true or false. However, this is not the only interpretation we can take—we will see an alternative point of view when we discuss proof which has interesting connections to computer science. But generally for this course and for most mathematics, we use classical logic.

The basic element of logic is the proposition. A proposition is a statement that is either true or false. All mathematical results are propositions—they are statements that are either true or false. Even something as simple as $2+2 = 4$ is a proposition: the statement asserts that the addition of 2 and 2 is equivalent to 4. This happens to be true. But $2+2 = 5$ is also a proposition—one which is false.

Since all these statements are propositions, the names that we give to them (proposition, lemma, theorem, corollary) really only describe their relative importance.

We can then combine several propositions together to create more interesting and complex propositions. This is done by using logical connectives, which modify the meaning of the propositions they combine in various ways.

Logical connectives

There are four basic logical connectives, also called Boolean connectives, after George Boole who defined them in 1854. We will define and consider each one.

The connective $\wedge$ is called conjunction, and the proposition $p \wedge q$ is pronounced "$p$ and $q$". The proposition $p \wedge q$ is true if and only if both $p$ and $q$ are true.

This connective expresses the idea that both $p$ and $q$ are true. The usual way you've likely seen this defined is via truth table.

\[\begin{array}{cc|c}
p & q & p \wedge q \\ \hline
T & T & T \\
T & F & F \\
F & T & F \\
F & F & F
\end{array}\]

Truth tables are a representation for classical logic, which makes it seem like logical formulas are just for computing truth values. However, things become a bit more interesting if we consider the question of what a proof of $p \wedge q$ should look like.

The connective $\vee$ is called disjunction. The proposition $p \vee q$ is pronounced "$p$ or $q$" and is true if and only if $p$ is true or $q$ is true or both.

\[\begin{array}{cc|c}
p & q & p \vee q \\ \hline
T & T & T \\
T & F & T \\
F & T & T \\
F & F & F
\end{array}\]

One tricky thing to note with English is that many times we use "or" to mean "exclusive or". For instance, when you are asked whether you prefer beef or chicken, the expectation is that you may only choose one and not both. This logical connective allows for both $p$ and $q$ to be true, which corresponds to something like "and/or" in English.
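To see the difference concretely, here is a quick sketch in Python: the built-in "or" is inclusive, while "!=" applied to two booleans behaves like exclusive or.

    p, q = True, True
    print(p or q)   # True:  inclusive or is satisfied when both hold
    print(p != q)   # False: exclusive or rules out "both"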

The connective $\neg$ is called negation and the proposition $\neg p$ is pronounced "not $p$". The proposition $\neg p$ is true if and only if $p$ is false.

If $q$ is the proposition "I can take two weeks off", then the proposition $\neg q$ is "I cannot take two weeks off". In other words, when translating natural language to formal language, negation modifies the proposition. Notice that negation is a unary connective, in that it only applies to one proposition, while all the other connectives are binary. This also means that propositions stated negatively are considered to be positive propositions modified by negation.

These three connectives are ones you're probably most familiar with in the context of programming: they are the connectives that appear as boolean operators which you use to construct boolean expressions.
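As a small sketch in Python (any language with boolean operators would do), we can reproduce the truth tables above by enumerating every assignment of truth values:

    from itertools import product

    # Enumerate every assignment of truth values to p and q,
    # reproducing the truth tables above.
    for p, q in product([True, False], repeat=2):
        print(p, q, p and q, p or q, not p)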

However, there is one more boolean connective which is used in mathematical logic and is arguably more important.

The connective $\rightarrow$ is called implication. The proposition $p \rightarrow q$ is pronounced "if $p$, then $q$" and $p \rightarrow q$ is true if and only if whenever $p$ is true, $q$ is true also.

In the proposition $p \rightarrow q$, we call $p$ the hypothesis and $q$ the conclusion.

This connective is often a bit trickier to think about because, unlike conjunction and disjunction, it is not commutative. However, it's important because mathematical statements are phrased in this way: we state conditions as our hypothesis and we are guaranteed the conclusions as long as the hypothesis is true. We can see how this is captured in the truth table.

\[\begin{array}{cc|c}
p & q & p \rightarrow q \\ \hline
T & T & T \\
T & F & F \\
F & T & T \\
F & F & T
\end{array}\]

It is worth reasoning through each scenario.

Consider the propositions $p$ = "I have obtained a computed grade of at least 90%" and $q$ = "I am assigned a grade of A".

In this case, $p \rightarrow q$ is "If I have obtained a computed grade of at least 90%, then I am assigned a grade of A." If I obtain at least 90% and receive an A, the promise is kept (the first row); if I obtain at least 90% but do not receive an A, it is broken (the second row). If my computed grade is below 90%, the promise makes no claim at all: whether or not I receive an A, $p \rightarrow q$ has not been violated, which is why the last two rows are true.
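One way to internalize the table: classically, $p \rightarrow q$ has the same truth table as $\neg p \vee q$ (false only when $p$ is true and $q$ is false). A quick Python sketch of this reading:

    def implies(p, q):
        # p -> q is false only when p is true and q is false
        return (not p) or q

    for p in (True, False):
        for q in (True, False):
            print(p, q, implies(p, q))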

Since $p \rightarrow q$ and $q \rightarrow p$ are not necessarily logically equivalent, we call $q \rightarrow p$ the converse of $p \rightarrow q$.

Note that the Boolean connectives are not the only logical connectives out there. There are a few more that are in common use in programming or computer engineering. It also isn't true that all four Boolean connectives are necessary: if you take one of them out, any proposition that used it can be rewritten using a combination of the other three. In fact, we only need two of the Boolean connectives, or even one bespoke connective such as NAND, to express everything we could using the four Boolean connectives.
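As an illustration (a sketch, with NAND chosen as the bespoke connective), here is how the connectives above can be rebuilt in Python from NAND alone:

    def nand(p, q):
        return not (p and q)

    # Each Boolean connective rewritten in terms of nand alone:
    def neg(p):        return nand(p, p)            # not p
    def conj(p, q):    return neg(nand(p, q))       # p and q
    def disj(p, q):    return nand(neg(p), neg(q))  # p or q
    def implies(p, q): return nand(p, neg(q))       # if p then q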

Predicates and first-order logic

One problem that we run into very quickly is that propositional logic is not quite powerful enough for our needs. For instance, if I wanted to say something like

$x$ is odd and $x$ is less than $y$

then I can define propositions for them. For instance, $p$ can be "$x$ is odd" and $q$ can be "$x$ is less than $y$". This would give us the statement $p \wedge q$.

But this is not very satisfying or helpful. We see that there's no way to express the fact that the two different propositions are really talking about the same $x$. Why is that? The problem lies in the fact that propositional logic is about propositions but what we really care about are not the propositions themselves but what we can say about things: numbers, vectors, functions, cars, and so on.

So if we want a language to talk about things, we need to expand our language a bit further. We will be discussing predicate or first-order logic.

First, we need to determine a domain of discourse: what are the things that we want to talk about? This can be something like the natural numbers $\mathbb N$ or the integers $\mathbb Z$ or matrices or functions or graphs and so on. There's also nothing stopping us from making our domain the set of all dogs, if what we really wanted to do was express statements about dogs, or other "real-world" types of objects. (This occurs less in math but more in places like database theory.)

Then, we want to express properties and relationships about the objects in our domain. For this, we need to define predicates. Predicates are written $P(x_1, x_2, \dots, x_k)$ and can be thought of as statements about things $x_1$ through $x_k$.

Let's try to express $x$ is odd and $x$ is less than $y$ as a sentence in predicate logic. We will define the following predicates: let $P(x)$ be the statement "$x$ is odd" and let $Q(x,y)$ be the statement "$x$ is less than $y$".

This is enough to get us something like $P(x) \wedge Q(x,y)$ for the example statement from above. Notice that we are able to say that $x$ is the same among the two statements and that we're able to relate $x$ and $y$. Observe also that many of the relations that we're used to expressing are really predicates. For instance $=$ is another example of a predicate.
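If it helps, predicates behave exactly like boolean-valued functions in a programming language. A small Python sketch, with function names matching the predicates above:

    def P(x):    return x % 2 == 1  # "x is odd"
    def Q(x, y): return x < y       # "x is less than y"

    x, y = 3, 5
    print(P(x) and Q(x, y))  # True; the same x feeds both predicates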

But what are $x$ and $y$? At this point, they are both not anything at all and at the same time can be anything. This is because they are variables to which we have not given any meaning. They are still quite useful at this point because they give us a way to name things. But one way we can use them is to pin them down (using other predicates), say by asserting that $x$ is 3 and $y$ is 5.

However, we often want to say more general things than that. Since we have a domain of discourse, we often want to make claims about various objects in the domain. We want to say that all of the things have some particular property or that there are objects that have various properties. Such statements quantify over our domain, so we define quantifiers to talk about them.

The symbol $\forall$ is called the universal quantifier and is read "for all". The statement $\forall x P(x)$ is true if and only if $P(x)$ is true for every element $x$ in the domain.

The symbol $\exists$ is called the existential quantifier and is read "there exists". The statement $\exists x P(x)$ is true if and only if $P(x)$ is true for at least one element $x$ in the domain.

Basically, we use $\forall$ when we want to make a statement about all objects in our domain ("every dog is a good boy") and we use $\exists$ when we want to say that there is some object out there in our domain that has a particular property ("there is a dog that can play Jenga").

Suppose our domain is the integers and $E(x)$ is the statement "$x$ is an even number". We can use predicates and quantifiers together with the logical connectives we introduced with propositional logic to say things like $\forall x (E(x) \rightarrow \neg E(x+1))$, which says that "for every integer $x$, if $x$ is even, then $x+1$ is not even".

We can define statements that relate more than one object. Suppose our domain is the natural numbers and $L(x,y)$ is the statement "$x$ is less than $y$". We can consider the following proposition: $$\forall x \exists y L(x,y).$$ This reads "for every natural number $x$, there exists a natural number $y$ such that $x \lt y$". This statement turns out to be true: every natural number does have some other natural number that it's less than. Now, let's consider the following similar statement, where the order of the quantifiers is reversed: $$\exists y \forall x L(x,y).$$ Here, we're saying that there is a natural number $y$ for which every natural number $x$ is less than it. In other words, this says that there is a "biggest" natural number, which is obviously not true. As we can see, the order of the quantifiers matters.
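Over a finite domain, the quantifiers correspond to Python's all and any, which makes the order-sensitivity easy to experiment with. (The $\lt$ example above needs all of $\mathbb N$ to come out differently, so this sketch swaps in the predicate "$x \neq y$", where the contrast already shows up on three elements.)

    domain = [0, 1, 2]
    def L(x, y): return x != y

    # forall x exists y, L(x, y): every element differs from something.
    print(all(any(L(x, y) for y in domain) for x in domain))  # True

    # exists y forall x, L(x, y): some y differs from everything,
    # including itself -- impossible.
    print(any(all(L(x, y) for x in domain) for y in domain))  # False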

A point about notation here. Notice that we write $\forall x P(x)$, which leaves the domain implicit. This is technically the correct way to write this—the domain of discourse is fixed beforehand. However, you may sometimes see such statements written as $\forall x \in \mathbb R, P(x)$, if the writer wanted to use $\mathbb R$ as the domain. This is redundant if the domain really is $\mathbb R$; and if it isn't, the notation is really shorthand for something like $\forall x (R(x) \rightarrow P(x))$, where $R(x)$ is a predicate for "$x$ is a real number".

Divisibility

We'll be exploring proof via elementary number theory, which is the study of the integers. In particular, we will be discussing divisibility. Division on integers is an interesting operation even though we've likely been dividing numbers since childhood without thinking about it too much. However, where it does come back to trip us up again is when we start to learn how to program. This usually comes in the form of trying to divide two numbers and then remembering that dividing two integers doesn't always give you an integer.

Let $a$ and $b$ be integers. We say that $a$ divides $b$, written $a \mid b$, if and only if there exists an integer $n$ such that $a \cdot n = b$. We also say that $b$ is a multiple of $a$.

We can try to express this formally. Our domain is the integers and we assume that we know basic properties or axioms about integers, like how we know they're equal and how we can multiply them.

We have two integers, $a$ and $b$. Notice that $a \mid b$ is really a predicate: it's a statement about $a$ and $b$. We would like to relate this predicate to more basic operations—in particular, that there's an integer $n$ for which we have $a \cdot n = b$. Equality is also a predicate, which is comparing two things, $a \cdot n$ and $b$. Since this is an existential statement, we can write this as \[\exists n (a \cdot n = b).\]

Now, we want to define the predicate $a \mid b$ so that it is true if and only if there is an integer $n$ for which $a \cdot n = b$.

The connective "if and only if" is not one of the connectives we introduced, but we can construct it using what we've already seen. We say that $p$ if and only if $q$ whenever $p \rightarrow q \wedge q \rightarrow p$. Sometimes this is also written $p \leftrightarrow q$. This means our definition can be rendered formally in the following way: \[a \mid b \leftrightarrow \exists n (a\cdot n = b).\]

Proof

A proof is a proof. What kind of a proof? It's a proof. A proof is a proof, and when you have a good proof, it's because it's proven. (Jean Chrétien)

So far, we have covered how to say things, but ultimately, what we want to do is verify whether a statement is true or not. This is done by means of proof. Informally, a proof is really just an argument that a statement is valid based on some premises.

In mathematics, the premises are a set of facts, like "For all natural numbers $m$ and $n$, $m+n = n+m$" (that is, addition on the natural numbers is commutative). In very formal settings, we are extra careful to specify exactly which facts we "know" already—these are called axioms. Then, if we apply those facts in certain ways which are described by some rules, we can arrive at some statement, which we know must be true because we followed the rules.

For example, consider the following proposition.

For all integers $a$, $b$, and $c$, if $a \mid b$, then $a \mid bc$.

Within this statement, there are some facts that we need to know already in order to work with if we want to "prove" this statement. We must know facts about integers and the predicate for divisibility, $\mid$.

What then is a formal proof in this context? This is something that the mathematicians of the late 19th century tried to nail down using logic. We might render the above statement as something like \[\forall a,b,c(a \mid b \rightarrow a \mid bc).\] What constitutes a proof of this? Well, we need a way to "prove" a statement that's universally quantified and how to deal with an implication. In other words, the form of the statement determines the rules that we apply to come up with our argument, or proof.

Notice that if we have everything set up, we don't actually need to "know" what our predicates "mean", since the validity of the proof is determined totally by the form of the sentence. What this means is that in the most formal versions of this process, we get to the level of manipulation of symbols, akin to string rewriting. It's not hard to see that when we get to this level, a proof really becomes something like a program, with strict rules that can be checked and validated.

We want to approach proof less formally, so what we are trying to do is begin with a set of statements that represent facts that we already know and try to apply rules to these facts so that we can construct a "true" statement by the end of the process. Any statement that we can arrive at by following these rules is considered "valid" and the steps that we take constitute the proof.

Let's consider the proposition $p \wedge q$. Earlier, we were given the truth table for this statement, but what would a proof of such a proposition look like? Well we know that in order for $p \wedge q$ to be true, it has to be the case that $p$ is true and that $q$ is true. But how do we know that $p$ and $q$ are both true? Well, if we had a proof for $p$, we would know that it's true and if we had a proof for $q$, we would know it's true. That is, if we have a proof of $p$ and we have a proof of $q$, then we are able to construct a proof of $p \wedge q$. We can express this idea as a rule, written \[\frac{p \quad q}{p \wedge q} \wedge\!i.\]

This is an example of an inference rule. The rule we discussed above is called $\wedge$-introduction, since it creates a conjunction.
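The slogan "a proof of $p \wedge q$ is a proof of $p$ together with a proof of $q$" is literal in a proof assistant. A sketch in Lean 4, where And.intro is exactly the $\wedge$-introduction rule:

    -- Given proofs hp : p and hq : q, And.intro packages them
    -- into a proof of p ∧ q.
    example (p q : Prop) (hp : p) (hq : q) : p ∧ q :=
      And.intro hp hq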

Inference rules are rules that govern how we can construct proofs for our statements based on what we have already proved. We can define a whole bunch of these inference rules and by using these rules, we can manipulate formulas to acquire other formulas. Inference rules are often specified in the following way: \[\frac{\alpha_1 \quad \alpha_2 \quad \cdots \quad \alpha_n}{\beta}.\]

This rule says that if I know that statements $\alpha_1, \alpha_2, \dots, \alpha_n$ are valid, then I can conclude that $\beta$ is also valid. A statement $\alpha_i$ is valid if we already know it beforehand (i.e. it is a known, given fact) or if it shows up at some earlier point in the proof. The reason for this is that since we were able to get to $\alpha_i$ at some point earlier in the proof, we must have obtained it by following our proof rules. Therefore, if we stopped at the point that we got $\alpha_i$, we would have a proof of $\alpha_i$.

Inference rules define a proof system. The proof system that we'll be referring to is natural deduction, which was defined by Gerhard Gentzen in the 1930s, and contains rules that correspond to common patterns of proof used by human mathematicians. There are other proof systems which are geared for other purposes, such as use with computers or the study of proof systems.

We won't actually be learning or using natural deduction to prove things. Instead, we'll be walking through basic proof techniques and noting how and where they correspond to rules that are defined in natural deduction.

Here's a particularly important inference rule (quite possibly the first!): $$\frac{p \rightarrow q \quad p}{q} \rightarrow\!e.$$

This inference rule is called $\rightarrow$-elimination; it is perhaps better known by its traditional name, modus ponens. What does it mean? It's an argument of the form

  1. If the bus is late, then I will miss my transfer.
  2. The bus is late.
  3. Therefore, I will miss my transfer.

This is the classical reading. But let's consider it from the perspective of proof. What this says is that if we have a proof of the proposition $p \rightarrow q$ and we have a proof of $p$, we are able to construct a proof for $q$.

For instance, the following argument follows the same structure:

  1. If $m$ and $n$ are even numbers, then $m+n$ is an even number.
  2. $m$ and $n$ are even numbers.
  3. Therefore, $m+n$ is an even number.

We can rewrite this so that it's even more abstract.

  1. If $p$, then $q$.
  2. $p$.
  3. Therefore, $q$, by $\rightarrow\!e$ (1,2).

Notice that this is just a linear form of our inference rule from above. There's nothing that says that $p \rightarrow q$ and $p$ have to appear in this order. Also note that in very formal proofs, we cite the rule we're applying along with which lines we're applying the rule to.
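In a proof assistant, this rule is literally function application: a proof of $p \rightarrow q$ is a function that turns proofs of $p$ into proofs of $q$. A Lean 4 sketch:

    -- h is a proof of p → q and hp is a proof of p; applying h to hp
    -- is exactly →-elimination.
    example (p q : Prop) (h : p → q) (hp : p) : q :=
      h hp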

One thing to notice is that the content of the argument has no bearing on whether it's valid or not. The only thing that matters is whether we have followed the rules to get from one statement to the next. This is not unlike how many people try to argue on the internet.

Proof by induction

Let's start trying to parse the sentence that describes our induction axiom and come up with a proof for it. Recall that we have the following sentence given to us \[P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k))) \rightarrow \forall n P(n).\]

Our goal is to prove $\forall n P(n)$. Notice that this is a sentence of the form $p \rightarrow q$ and we want to prove $q$. This means that, using the second of the rules we saw ($\rightarrow$-elimination), what we need to do is prove $p$, which is \[P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k))).\]

Now, observe that this is a conjunction: it is a sentence of the form $p \wedge q$. So this means we need a proof for $p$, which in our case is $P(z)$ and a proof for $q$, which is $\forall k (P(k) \rightarrow P(\operatorname{succ}(k)))$.

This is exactly the template for induction:

  1. Base case. Prove the statement $P(z)$.
  2. Inductive case. Prove the statement $\forall k (P(k) \rightarrow P(\operatorname{succ}(k)))$.

We end up with a proof tree that looks like \begin{prooftree} \AxiomC{$P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k))) \rightarrow \forall n P(n)$} \AxiomC{$P(z)$} \AxiomC{$\forall k (P(k) \rightarrow P(\operatorname{succ}(k)))$} \BIC{$P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k)))$} \BIC{$\forall n P(n)$} \end{prooftree}
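This template is built into proof assistants. A Lean 4 sketch (writing 0 for $z$ and $k+1$ for $\operatorname{succ}(k)$): given the base case and the inductive case, the induction tactic assembles the proof of $\forall n\, P(n)$.

    example (P : Nat → Prop)
        (base : P 0)                      -- base case: P(z)
        (step : ∀ k, P k → P (k + 1)) :   -- inductive case
        ∀ n, P n := by
      intro n
      induction n with
      | zero      => exact base
      | succ k ih => exact step k ih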

The base case, $P(z)$, is fairly straightforward to prove—it's a single predicate, so we can prove this directly. However, in order to prove the inductive case, $\forall k (P(k) \rightarrow P(\operatorname{succ}(k)))$, we need to continue to peel back pieces, starting with how to prove universally quantified statements.

Universal quantification

A universally quantified statement is one of the form $\forall x P(x)$. At first glance, it seems like we have a lot of work to do, because we have to show that this statement holds for every $x$ in our domain. One way we can attempt this is to go through every element in our domain and see if we have a proof for each one. This is bad because our domain may be infinitely large (as is the case with the natural numbers).

Instead, what we do is "pick" a generic representative and continue our argument with that representative. Since we just choose an arbitrary element with nothing special about it, we can conclude that our statement must hold for any element from the domain we've chosen, because we could've picked any of them.

Recall the statement that begins the inductive case for the proof of Proposition 1.9 from last class:

If $n = \operatorname{succ}(k)$ for some natural number $k$ $\dots$

The phrasing is a bit different, but the idea is that we are picking a generic natural number $k$, from which we discuss the rest of the inductive case. Since we don't know anything about $k$, we can conclude that whatever we say will hold for any natural number.

What does the inference rule for this look like? We call this rule $\forall$-introduction: \[\frac{\boxed{\begin{matrix} \text{$u$ arbitrary} \qquad \\ \vdots \\ P(u) \end{matrix}}}{\forall x P(x)} \forall i\] Why is there a box? This notational style is due to Jaśkowski in 1934 and is a way to show that $u$ exists only inside the box. You can think of the box as the "scope" of $u$ and that $u$ can't be used outside of the box. (Notice how "scope" is a concept that has its origins in mathematical logic.)

This rule looks complicated (and the notation admittedly is), but it actually tells us exactly what a proof for $\forall x \in S, P(x)$ looks like.

  1. First, we pick a representative domain element and give it a name, say $u$. This is done by saying something like "Let $u \in S$ be arbitrary", where $S$ is our domain.
  2. Then we complete a proof using $u$.
  3. We then conclude that $P(u)$ holds.
  4. Since $u$ was arbitrary, we can conclude that $\forall x P(x)$ holds.
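These four steps are exactly what happens in a proof assistant. A Lean 4 sketch of the statement from the start of the lecture, "for all natural numbers $n$, $n + z = n$" (writing 0 for $z$), where intro u plays the role of "let $u$ be arbitrary":

    example : ∀ n : Nat, n + 0 = n := by
      intro u  -- pick an arbitrary natural number u; u is in scope below
      rfl      -- u + 0 reduces to u by the definition of addition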

Once we apply this rule, we are cleared to consider the sentence that has been quantified. In our case, this is $P(k) \rightarrow P(\operatorname{succ}(k))$, which is an implication.