MPCS 50103 — Lecture 10

Recurrence Relations

One of the things that makes the analysis of algorithms and similar kinds of counting problems tricky is when we are confronted with a recursive procedure. In this case, counting the number of steps of an algorithm or other similar objects requires describing a term of a sequence in terms of the sequence's previous terms. These are called recurrence relations and we will touch upon a few ways to deal with them.

Example 22.1. There are a few classic examples where recurrences arise. The first of these is the Tower of Hanoi, where there is a stack of discs of increasing size that you have to move between three pegs, you being one of several Tibetan monks/Hindu priests, depending on which telling of the story we're dealing with. You want to move the stack of discs to another peg with the condition that the order of the stack is preserved. Furthermore, you can only move one disc to another peg at a time and you can't place a larger disc on top of a smaller disc. Upon the placement of the final disc to complete the task, the world will end.

Suppose we have $n$ discs (we have generalized the problem but the monks/priests only have to deal with 64 discs). How many times do we have to move a disc in order to move the entire stack to another peg?

To count this, let's think recursively. Let $H_n$ denote the number of moves required to move the top $n$ discs from one peg onto another. In order to move the final, largest disc, we need to have moved the stack of $n-1$ discs onto another peg, so this is $H_{n-1}$ moves. Then, we move the largest disc onto the remaining open peg. Once we have done this, we move the $n-1$ discs so that they go on top of the largest disc, which is another $H_{n-1}$ moves. In total, we have made $H_n = 2H_{n-1} + 1$ moves to move $n$ discs.

Now, we've developed a recurrence relation, but we want a closed form for it. This leads us to the first technique for solving recurrences: try some stuff and hope it works, and if it does, use induction to cook up a formal proof. This is not too different from when we had to guess the closed form of a sum back when we were discussing induction. This involves looking at the first few terms and trying to divine something out of the sequence we get. First, we set the initial conditions by observing that $H_0 = 0$ and $H_1 = 1$. Then we have \begin{align*} H_0 &= 0 \\ H_1 &= 1 \\ H_2 &= 3 \\ H_3 &= 7 \\ H_4 &= 15 \\ H_5 &= 31 \\ H_6 &= 63 \end{align*}

Here, we notice this looks suspiciously like $2^n - 1$, so we can test this hypothesis by substituting this into the formula: $$H_n = 2H_{n-1} + 1 = 2 \cdot (2^{n-1} - 1) + 1 = 2^n - 2 + 1 = 2^n - 1.$$ This appears to work, so if we want to go ahead and prove this formally, we have all the pieces to construct a proof by induction. This is called the substitution method.
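Before committing to the inductive proof, we can let a computer check the guess for us. The sketch below (not part of the original derivation; `hanoi_moves` is a name made up for this example) iterates the recurrence $H_n = 2H_{n-1} + 1$ directly and compares it against $2^n - 1$:

```python
def hanoi_moves(n):
    """Number of moves to transfer n discs, computed by iterating H_n = 2*H_{n-1} + 1."""
    h = 0  # H_0 = 0: zero discs require zero moves
    for _ in range(n):
        h = 2 * h + 1
    return h

# The guess H_n = 2^n - 1 matches the recurrence for every small n we try.
for n in range(20):
    assert hanoi_moves(n) == 2**n - 1
```

Of course, agreement on the first twenty values is evidence, not proof; the induction is still what seals the deal.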

But what if we're faced with a sequence that we may not readily recognize? We can try an alternative approach that is pretty much the same as guessing, but from the other direction: expand the first few iterations of the recurrence. If we do this to our Tower of Hanoi recurrence, we get the following: \begin{align*} H_n &= 2H_{n-1} + 1 \\ &= 2(2H_{n-2} + 1) + 1 \\ &= 2^2 H_{n-2} + 2 + 1 \\ &= 2^3 H_{n-3} + 2^2 + 2 + 1 \\ &= 2^4 H_{n-4} + 2^3 + 2^2 + 2 + 1 \\ &= 2^5 H_{n-5} + 2^4 + 2^3 + 2^2 + 2 + 1 \end{align*} and so on. Again, this involves hoping that you will be able to see some sort of pattern emerge, and if you have a keen eye, you can deduce that the last few terms of this expansion will look like \begin{align*} H_n &= 2H_{n-1} + 1 \\ &= 2(2H_{n-2} + 1) + 1 \\ & \vdots \\ &= 2^{n-1} H_1 + 2^{n-2} + \cdots + 2^2 + 2 + 1 \\ &= 2^{n-1} + 2^{n-2} + \cdots + 2^2 + 2 + 1 \\ &= 2^n - 1 \end{align*} at which point, you can test your hypothesis further or cook up an inductive proof to prove this formally. This is called the iteration method.

And how long do the monks/priests have to go at this before the end of the world? The story assumes one move of a disc to take one second, giving us $$H_{64} = 2^{64} - 1 = 18446744073709551615$$ seconds to complete the movement of 64 discs. That's about 585 billion years, which is about 42 times the age of the universe, 130 times the age of the sun, and about as much Unix time as we can keep track of with an unsigned 64-bit integer.

Example 22.2. The other classical example of recurrences is a sequence that we've seen before already: the Fibonacci numbers. The origin of the Fibonacci numbers concerns modelling pairs of immortal rabbits mating. Do not make the mistake I made when I first learned about this: we are counting the number of pairs of rabbits, not the number of individual rabbits, which does not make sense and is kind of weird if you think about it. The idea is that a pair of rabbits mate and give birth to a pair of rabbits who take one timestep to become mature, after which they can start mating. This gives us the sequence $$1,1,2,3,5,8,13,21,34,55,89,\dots$$ which we hopefully recognize as the Fibonacci numbers. At each time $n$, the number of pairs of rabbits is the number of pairs of rabbits that were alive at time $n-1$ plus the number of newborn pairs. But we know that there are only as many newborn pairs as there are mature pairs, which is the number of pairs that were alive at time $n-2$. So we get $$f_n = f_{n-1} + f_{n-2}.$$

How do we solve this? Let's put that on hold while we take a look at another example.

Example 22.3. Let's think about a problem similar to the kind that I've been throwing at you: How many binary strings of length $n$ do not contain $00$ as a substring?

We could attempt to use one of our previous counting techniques to solve this one, but recall that strings are also recursively defined structures. It is helpful to count such strings in terms of how they're extended. In other words, we can consider the number of strings of length $n$ that don't contain $00$ in terms of the number of such strings of length $n-1$. This is because every string of length $n$ is a string of length $n-1$, but with an additional $0$ or $1$.

Let $A_n$ be the set of strings of length $n$ that avoid $00$ and let $a_n = |A_n|$. If $w = x1 \in A_n$, then $x \in A_{n-1}$. However, if $w = x0 \in A_n$, then $x$ itself cannot end in $0$. Therefore $x = y1$ where $y \in A_{n-2}$, and so $w = y10$. Putting these together, we find that $a_n = a_{n-1} + a_{n-2}$ for $n \geq 3$. Furthermore, we've seen this recurrence already. This leaves $n = 1,2$, which are easy to solve by enumeration.
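If we're nervous about the case analysis, a brute-force count is cheap for small $n$ and makes a good sanity check. This is a sketch (the helper name is made up), with the base cases $a_1 = 2$ and $a_2 = 3$ found by enumeration:

```python
from itertools import product

def count_avoiding_00(n):
    """Brute force: count binary strings of length n with no '00' substring."""
    return sum('00' not in ''.join(w) for w in product('01', repeat=n))

# Base cases by enumeration, then the recurrence a_n = a_{n-1} + a_{n-2}.
a = {1: 2, 2: 3}
for n in range(3, 12):
    a[n] = a[n - 1] + a[n - 2]
    assert a[n] == count_avoiding_00(n)
```

The exhaustive count and the recurrence agree on every length we test, which matches the Fibonacci-like pattern derived above.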

As promised, we'll tackle how to solve these in just a bit. What we should note is that we are lucky and the recurrence is in one of the forms that allows us to apply a theorem to mechanically solve it, rather than have to guess and hope for the best.

There are various forms of recurrences that you'll encounter that have associated theorems which tell you that if your recurrence is in one of these forms, then its solution has a corresponding form. For instance, one of the most well-known of these in algorithms classes is the Master theorem, popularized by Cormen, Leiserson, Rivest, and Stein in their very famous algorithms textbook Introduction to Algorithms, popularly referred to as CLRS.

Theorem 22.4 (Master Theorem). Let $T(n)$ be the recurrence $T(n) = a T(\frac n b) + f(n)$, where $a \geq 1$, $b \gt 1$, and $f:\mathbb N \to \mathbb N$. Then,

  1. If $f(n) = O(n^{\log_b a - \varepsilon})$ for some $\varepsilon \gt 0$, then $T(n) = \Theta(n^{\log_b a})$.
  2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$.
  3. If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some $\varepsilon \gt 0$ and there exists $c \lt 1$ such that for all sufficiently large $n$, $af(\frac n b) \leq cf(n)$, then $T(n) = \Theta(f(n))$.

This is called the Master Theorem because the idea was that your recurrence was very likely to be in one of these forms, so you could then immediately apply the theorem to solve your recurrence.

Why does this recurrence show up a lot in algorithms? This has to do with how we solve problems by splitting them up into smaller problems and combining the solutions to those smaller problems. If we have a problem of size $n$, where $T(n)$ is the time it takes to solve a problem of size $n$, we split up our problem into $a$ subproblems, each of size approximately $\frac n b$. This gives us the recursive part of the relation, $a T(\frac n b)$. Then there is some non-recursive work that we do to split the problem up and combine the subproblem solutions, which takes $f(n)$ time.
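As an illustration (not from the lecture), the classic mergesort-style recurrence $T(n) = 2T(n/2) + n$ falls into case 2, since $f(n) = n = \Theta(n^{\log_2 2})$, so the Master theorem predicts $T(n) = \Theta(n \log n)$. For powers of two the recurrence even solves exactly, which we can check with a quick sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Divide-and-conquer cost: a=2 subproblems of size n/2, plus f(n)=n of merge work."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# For n = 2^k with T(1) = 1, the recurrence unrolls to T(n) = (k+1) * 2^k,
# i.e. Theta(n log n), matching case 2 of the Master theorem.
for k in range(15):
    assert T(2**k) == (k + 1) * 2**k
```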

Linear Recurrences

We will be looking at a different kind of recurrence for which we know the form of the solution.

Definition 22.5. A linear homogeneous recurrence of degree $k$ with constant coefficients is of the form $$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k},$$ where $c_i \in \mathbb R$ for $1 \leq i \leq k$ and $c_k \neq 0$.

So the Fibonacci recurrence ($f_n = f_{n-1} + f_{n-2}$) is linear and homogeneous and has constant coefficients, but the Tower of Hanoi recurrence $(H_n = 2H_{n-1} + 1)$ is not homogeneous, because of the constant term. Similarly, a recurrence like $a_n = a_{n-1}^2 + a_{n-2}$ would not be linear and a recurrence like $a_n = a_{n-1} + n a_{n-2}$ does not have constant coefficients.

Unsurprisingly, these simple-looking recurrences appear frequently in mathematics. Conveniently for us, the solutions to these recurrences have a very specific form. First, we'll need the following idea.

Definition 22.6. The characteristic equation for the linear homogeneous recurrence of degree $k$ with constant coefficients $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$ is $$x^k - c_1 x^{k-1} - c_2 x^{k-2} - \cdots - c_{k-1} x - c_k = 0.$$

What we'll see is that the solutions of our linear homogeneous recurrences are based on the roots of their characteristic equations.

Theorem 22.7. Suppose that the characteristic equation for a linear homogeneous recurrence of degree $k$ has distinct real roots $r_1, r_2, \dots, r_k$. Then the solutions to the recurrence take the form $a_n = b_1 r_1^n + b_2 r_2^n + \cdots + b_k r_k^n$ for real constants $b_1, \dots, b_k$.
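Theorem 22.7 makes the solution entirely mechanical once we can find the roots. Here is a sketch of that procedure for degree 2 (the helper name and interface are my own, and it assumes the characteristic equation really does have two distinct real roots), checked against the Fibonacci numbers with $f_0 = 0$, $f_1 = 1$:

```python
import math

def solve_degree2(c1, c2, a0, a1):
    """Closed form for a_n = c1*a_{n-1} + c2*a_{n-2}, assuming the characteristic
    equation x^2 - c1*x - c2 = 0 has two distinct real roots."""
    disc = c1 * c1 + 4 * c2
    r1 = (c1 + math.sqrt(disc)) / 2
    r2 = (c1 - math.sqrt(disc)) / 2
    # The initial conditions give b1 + b2 = a0 and b1*r1 + b2*r2 = a1.
    b1 = (a1 - a0 * r2) / (r1 - r2)
    b2 = a0 - b1
    return lambda n: b1 * r1**n + b2 * r2**n

# Fibonacci: f_n = f_{n-1} + f_{n-2} with f_0 = 0, f_1 = 1.
fib_closed = solve_degree2(1, 1, 0, 1)
f = [0, 1]
for n in range(2, 30):
    f.append(f[-1] + f[-2])
assert all(round(fib_closed(n)) == f[n] for n in range(30))
```

The rounding step is there because the closed form mixes irrational roots in floating point; the exact derivation appears in Example 22.9 below.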

There is a proof of this in Rosen for $k = 2$ which is too long to cover in class. However, we can walk through the solution to gain some insight into the connection between linear recurrences and the roots of their characteristic equation.

Example 22.8. Consider the recurrence $a_n = 7a_{n-1} - 12a_{n-2}$, with $a_0 = 3, a_1 = 10$.

Without the knowledge of the theorem that tells us the exact form of the solution, we would be forced to divine some sort of pattern from the first few terms of the recurrence. What we would observe is something roughly exponential, so we would guess something of the form $r^n$. Plugging this into our recurrence, we get \begin{align*} r^n &= 7r^{n-1} - 12r^{n-2} \\ r^n - 7r^{n-1} + 12r^{n-2} &= 0 \\ r^{n-2}(r^2 - 7r + 12) &= 0 \\ r^{n-2}(r-3)(r-4) &= 0 \end{align*} Here, we begin to see where our solution might have come from. It's not hard to see how linear recurrences like this will have exponential solutions of some sort. The rest is just working out the details: what is the exact form of the solution that will give us the right pattern with the correct initial conditions? Guessing got us on the right track, since we know our solution could involve $3^n$ and $4^n$.

Now, let's apply the theorem. The characteristic equation for $a_n = 7a_{n-1} - 12a_{n-2}$ is indeed $x^2 - 7x + 12 = 0$, just as we had guessed, and it has roots $x = 3,4$. This means that our solution is a linear combination of the two exponential terms $a3^n + b4^n$. To solve for $a$ and $b$, we use our initial conditions $a_0 = 3, a_1 = 10$ to set up the equations \begin{align*} a3^0 + b4^0 &= 3 \\ a3^1 + b4^1 &= 10 \\ \end{align*} Some easy algebra gives us $a = 2, b = 1$. Therefore, the solution to our linear recurrence is $$a_n = 2 \cdot 3^n + 4^n.$$

Example 22.9. As promised, we can finally derive the closed form for the Fibonacci numbers. Recall that the Fibonacci numbers are described by the recurrence $f_n = f_{n-1} + f_{n-2}$, which is a linear homogeneous recurrence of degree 2. The characteristic equation of this recurrence is $x^2 - x - 1 = 0$. This equation has roots $$\varphi = \frac{1 + \sqrt 5}{2} \text{ and } \widetilde \varphi = \frac{1 - \sqrt 5}{2}.$$ Then we know that the recurrence has solutions of the form $$a \left( \frac{1+\sqrt 5}{2} \right)^n + b \left( \frac{1-\sqrt 5}{2} \right)^n$$ and we need to consider the initial conditions to solve this. So we have \begin{align*} a \left( \frac{1+\sqrt 5}{2} \right)^0 + b \left( \frac{1-\sqrt 5}{2} \right)^0 &= 0 \\ a \left( \frac{1+\sqrt 5}{2} \right)^1 + b \left( \frac{1-\sqrt 5}{2} \right)^1 &= 1 \end{align*} which gives us $a = \frac{1}{\sqrt 5}$ and $b = - \frac{1}{\sqrt 5}$. Thus, $$f_n = \frac 1 {\sqrt 5} \left( \frac{1+\sqrt 5}{2} \right)^n - \frac 1 {\sqrt 5} \left( \frac{1-\sqrt 5}{2} \right)^n = \frac{\varphi^n - \widetilde \varphi^n}{\sqrt 5}.$$ This is Binet's formula, which we saw several weeks ago.

There is one complication that we haven't covered yet. What happens if the characteristic polynomial has a repeated root? It doesn't really make sense to have a solution with two identical terms $r^n + r^n$. The big question is how to separate these terms. In short, the answer is to separate them by factors of $n^i$. In other words, for a root $r$ of multiplicity $m$, we have terms of the form $n^0 r^n = r^n, nr^n, n^2 r^n, \dots, n^{m-1} r^n$. This is more formally stated in the following theorem.

Theorem 22.11. Suppose that the characteristic equation for a linear homogeneous recurrence of degree $k$ has distinct real roots $r_1, \dots, r_t$ with multiplicities $m_1, \dots, m_t$ respectively such that $m_i \geq 1$ and $m_1 + \cdots + m_t = k$. Then the solutions to the recurrence take the form $$a_n = \sum_{i = 1}^t \sum_{j = 0}^{m_i - 1} n^j b_{i,j} r_i^n$$ for real constants $b_{i,j}$ with $1 \leq i \leq t$ and $0 \leq j \leq m_i - 1$.

Example 22.12. Consider the recurrence $a_n = 6a_{n-1} - 9a_{n-2}$ with $a_0 = 2, a_1 = 3$. This recurrence has a characteristic equation $x^2 - 6x + 9 = 0$, and therefore has a root 3 of multiplicity 2. This means that our solution will be of the form $a3^n + bn3^n$. As usual, we solve for $a$ and $b$ using the initial conditions, \begin{align*} a3^0 + b \cdot 0 \cdot 3^0 &= a = 2 \\ a3^1 + b \cdot 1 \cdot 3^1 &= 3a + 3b = 3 \end{align*} This gives us $a = 2$ and $b = -1$ and so the solution to the recurrence is $$a_n = 2 \cdot 3^n - n 3^n = (2-n)3^n.$$
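The repeated-root formula is easy to mistrust the first time you see it, so here is a quick numerical check (a sketch, with a made-up function name) that iterates the recurrence and compares against $(2-n)3^n$:

```python
def repeated_root_rec(n):
    """Iterate a_n = 6a_{n-1} - 9a_{n-2} with a_0 = 2, a_1 = 3 (Example 22.12)."""
    vals = [2, 3]
    for i in range(2, n + 1):
        vals.append(6 * vals[-1] - 9 * vals[-2])
    return vals[n]

# The closed form (2 - n) * 3^n agrees with the recurrence term by term.
for n in range(12):
    assert repeated_root_rec(n) == (2 - n) * 3**n
```

Note that without the $n3^n$ term, no combination of plain $3^n$ terms could ever produce $a_2 = 0$ followed by $a_3 = -27$; the factor of $n$ is doing real work here.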

There is one more curveball that we can throw that is not too hard to deal with. This is when we are confronted with a nonhomogeneous recurrence. We've already seen an example of such a recurrence, the Tower of Hanoi recurrence $H_n = 2H_{n-1} + 1$.

Definition 22.13. A linear nonhomogeneous recurrence relation with constant coefficients is a recurrence relation of the form $$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k} + F(n),$$ where $c_1, \dots, c_k$ are real numbers and $F(n)$ is a nonzero function that depends only on $n$. The recurrence $$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$$ is called the associated homogeneous recurrence relation.

The very convenient thing about recurrences of this form is that part of the solution is formed by the associated homogeneous recurrence, and so we can make use of our work from before. The only complication is how to deal with the nonhomogeneous term.

Theorem 22.14. Let $a_n = c_1 a_{n-1} + \cdots + c_k a_{n-k} + F(n)$ be a recurrence relation, where $c_1, \dots, c_k$ are real numbers and $F(n)$ is a nonzero function. If $\{g_n\}$ is a particular solution to the recurrence $a_n$ and $\{h_n\}$ is a solution to the associated homogeneous linear recurrence, then all solutions of $a_n$ are of the form $\{g_n + h_n\}$.

Proof. Consider two solutions to $a_n = c_1 a_{n-1} + \cdots + c_k a_{n-k} + F(n)$, say $f$ and $g$. That is, we have \begin{align*} f_n &= c_1 f_{n-1} + c_2 f_{n-2} + \cdots + c_k f_{n-k} + F(n), \\ g_n &= c_1 g_{n-1} + c_2 g_{n-2} + \cdots + c_k g_{n-k} + F(n). \end{align*} Then it is clear that $f-g = h$ is a solution to the associated homogeneous linear recurrence of $a_n$ and we have $f = g + h$. $\tag*{$\Box$}$

Example 22.15. Consider the recurrence $a_n = 3a_{n-1} + 2n$ with $a_0 = 2$. We know how to solve the associated homogeneous recurrence, $a_n = 3a_{n-1}$, easily; we have $h_n = a3^n$ for some constant $a \in \mathbb R$.

Now, we need to find a particular solution. For now, we will guess one, observing that $F(n) = 2n$, and guessing that a linear function $g_n = bn+c$ may work. Substituting this into the recurrence gives us $$bn+c = 3(b(n-1) + c) + 2n.$$ We can then rearrange this to $(2b+2)n + 2c-3b = 0$ and solve for $b$ and $c$ by \begin{align*} 2b+2 &= 0 \\ 2c-3b &= 0 \end{align*} This gives us $b = -1$ and $c = -\frac 3 2$, and we have $g_n = -n - \frac 3 2$. Then by Theorem 22.14, all solutions of our recurrence are of the form $$a_n = g_n + h_n = -n -\frac 3 2 + a 3^n.$$ To complete this solution, we just need to consider our initial condition $a_0 = 2$. This gives us \begin{align*} a_0 &= a3^0 - 0 -\frac 3 2 \\ 2 &= a - \frac 3 2 \\ a &= \frac 7 2 \end{align*} Thus, our solution is $a_n = \frac 7 2 \cdot 3^n - n - \frac 3 2$.
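Since the solution involves the fraction $\frac 7 2$, a floating-point check could mislead us, so this sketch (function name made up) compares the iterated recurrence against the closed form using exact rational arithmetic:

```python
from fractions import Fraction

def nonhomog_rec(n):
    """Iterate a_n = 3a_{n-1} + 2n with a_0 = 2 (Example 22.15)."""
    a = 2
    for i in range(1, n + 1):
        a = 3 * a + 2 * i
    return a

# Compare against the closed form (7/2) * 3^n - n - 3/2, using Fractions
# to avoid floating-point noise.
for n in range(12):
    assert nonhomog_rec(n) == Fraction(7, 2) * 3**n - n - Fraction(3, 2)
```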

Of course, we would like some certainty in our lives and guessing is a bit too shaky for most of us to feel comfortable with. Surely, there's a better way? It turns out that if $F(n)$ is of a particular form, then there is.

Theorem 22.16. Let $a_n = c_1 a_{n-1} + \cdots + c_k a_{n-k} + F(n)$ be a recurrence relation, where $c_1, \dots, c_k$ are real numbers and $F(n)$ is a nonzero function. Suppose $F(n)$ is of the form $$F(n) = (b_t n^t + \cdots + b_1 n + b_0)s^n,$$ where $b_0, \dots, b_t$ are real numbers and $s$ is a constant. If $s$ is not a root of the characteristic equation of the associated homogeneous recurrence, then there is a particular solution of the form $$(p_t n^t + \cdots + p_1 n + p_0)s^n.$$ If $s$ is a root of the characteristic equation with multiplicity $m$, then there is a particular solution of the form $$n^m(p_t n^t + \cdots + p_1 n + p_0)s^n.$$

The distinction between the two cases at the end has to do with whether we can just tack on a term with $s^n$ to our solution. If the characteristic equation doesn't have $s$ as a root, then we can just add it. If $s$ is a root, then we have to multiply by a factor $n^m$ to keep it separate, in the same way that we did with repeated roots.
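To tie this back to where we started, we can redo the Tower of Hanoi recurrence with this machinery instead of guessing (a worked check, not in the original notes). We have $H_n = 2H_{n-1} + 1$, so $F(n) = 1 = 1 \cdot 1^n$ and $s = 1$. The associated homogeneous recurrence $H_n = 2H_{n-1}$ has characteristic equation $x - 2 = 0$, whose only root is $2$, so $s = 1$ is not a root and we can look for a constant particular solution $g_n = c$: $$c = 2c + 1 \implies c = -1.$$ Combined with the homogeneous solution $h_n = a2^n$, all solutions have the form $H_n = a2^n - 1$, and the initial condition $H_0 = 0$ forces $a = 1$, recovering $H_n = 2^n - 1$ with no guessing required.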

But what if $F(n)$ isn't of this form? Well, then we're out of luck, which is just as well, because we've reached the end of this course.

Q&A

Thanks to Jingjing Xiao for scribing this portion of the lecture.

What will be on the final?
The final will be similar to the midterm. If it appeared on the problem sets, then there is a good chance similar questions will appear on the final. The final is about 9-10 questions, each worth the same number of marks. The final will be three hours long. Questions can be a mix of topics, such as combinatorial questions about graphs, or similar to the probability and number theory problem you saw on a problem set. A good way to review is to go through problem sets and the optional exercises. The final will be similar in difficulty to the midterm. Most of the problems will be about proofs. For calculations/computations, make sure to detail how you arrived at what you computed and why, like a proof. We are interested in your reasoning as well as the calculation.
How should we show our train of thought for calculations?
See the solution for Problem Set 7, Question 4.
Are there combinatorial proofs in general on the final?
I don't know how to answer that.
When you read a problem, how do you decide if you should do a direct proof or a proof by contradiction?
Usually, I think about what seems easiest, what I know about those types of problems, and try to go from there. Think about whether it might be easier to find a counterexample or to show something directly.
Will questions about equivalence classes come up?
This is a great example of something that mixes different topics; we last saw equivalence relations in number theory and suddenly, they showed up again in graphs. You should understand definitions, because they may show up in ways that you don't expect.
Related to this week's problem set, what is an Eulerian tour and what is a Hamiltonian cycle?
A Hamiltonian cycle is a cycle that visits every vertex in the graph exactly once. An Eulerian tour is a closed walk that visits every edge in the graph exactly once. Note that an Eulerian tour may visit vertices more than once (which is why we can't call it a cycle).
Are we expected to know all the definitions in graph theory?
Yes.
If we're asked about a graph, are we going to be given the diagram?
Not necessarily; one thing I could do is to give you a definition of the graph and ask for a drawing of it.
Do we have to worry about multigraphs and things like that?
No, for this class we're only focusing on what Rosen calls "simple graphs"; i.e. undirected graphs without multiple edges or loops. On that note, you should review terminology related to graphs from the course notes, as we use different terminology from Rosen.
Should we bring colouring materials for colouring problems?
You don't have to. Typically, we're fine using numbers to represent colours.
Will we have to give an entire isomorphism for graphs?
Probably not. I'm not very interested in knowing if you can check every pair of vertices on the test. On that note, I'm not that interested in having you follow an algorithm or perform very complicated calculations either.
Will we be seeing any proofs about variance?
You should understand how to do them.
Can we get an example of combinatorial or probability questions about graphs?
Some things of that flavour (which are probably too hard to actually be on the final) include: given a graph, how many subgraphs of size $k$ are there? How many paths are there in this graph, and what is the probability that we choose a particular one? Or, given a set of $n$ vertices, how many trees are there on those vertices?
Will we have to use the adjacency matrix?
Probably not; you definitely won't be asked to compute anything using the adjacency matrix, since no one knows how to do matrix multiplication.
Is there anything else we don't need to know?
The master theorem, stuff about cryptography, the last portion of graph theory talking about matchings in general graphs and Berge's lemma.
How should we approach graph proofs?
For induction in general, like for a sum, we typically want to take a sum $\sum_{i=0}^{k+1} S(i)$ and pull out the last term so that it becomes $\sum_{i=0}^{k} S(i) + S(k+1)$, apply the inductive hypothesis to $\sum_{i=0}^{k} S(i)$ and then fold $S(k+1)$ back into the sum. The approach for graphs is similar: We have a graph on $k+1$ vertices, we remove a vertex to get a subgraph with $k$ vertices, apply the inductive hypothesis, go back to our original graph and consider how to deal with the final piece. For other graph proofs, there are properties and characterizations of various graphs that you should be familiar with, like the relationship between connectedness and paths.
Will there be questions about bridges?
You should be familiar with proofs about graphs.
What does the grading scale look like?
The standard.
Can you go through Problem Set 8, Question 1b?
See solutions.
Do we have to do a proof about variance?
Maybe.
Will algorithms have the same class structure as this class?
It will be pretty similar unless classes get cancelled and some things might have to change.
Are you currently writing a paper?
Right now I'm revising one for a conference in May, which might not end up happening anyway, but I haven't done anything new recently.
Are you going to teach classes outside of MPCS?
I'm actually teaching formal languages in the College this quarter, and next quarter I'll also be teaching algorithms in the College.
Which of these topics are relevant to problems in tech interviews?
Generally, tech interviews like asking algorithmic questions. Depending on the job, you may get combinatorial, basic probability, or graph problems as well.
If we write something dumb on the final, should we erase it?
That depends on how much time you have and whether you've come up with anything better. I had a student in another class once come up with an answer that they realized was wrong, scratched it out, and in a panic came up with something worse, and I ignored the scratched out part. Be clear about what you want us to read.
Will the grades be curved?
Not in the sense that I am looking for a particular distribution. Any "curving" that happens will be to your advantage.
Will there be a practice exam?
No, that's why there are exercises.
Will you still have office hours this Friday?
I'll be there, but you don't need to come only during office hours. Feel free to come by my office; I'm usually there until around 5 pm.