MPCS 55001 — Lecture 5

Dynamic Programming

The algorithmic design paradigms we've seen already seem like natural approaches to problems. Greedy algorithms operate on the idea of applying a single rule to make choices. Divide and conquer algorithms operate on the idea of dividing a problem into subproblems recursively and combining the results. But both of these approaches have weaknesses. Greedy strategies are often too simple to produce optimal solutions for many problems, while divide and conquer algorithms may not give meaningful improvements for problems with large solution spaces.

Our next algorithm design approach will seek to address these weaknesses. Dynamic programming is an approach first defined by Bellman in the 1950s. The apocryphal story behind the name suggests that Bellman wanted to make it sound more exciting to trick his superiors at RAND into thinking that this wasn't math. The programming part of the name is more instructive: it refers to planning (as it would have been used in the '50s) rather than our modern interpretation of coding.

Interval scheduling, revisited

Recall the interval scheduling problem:

The interval scheduling problem is: Given a list of $n$ jobs $j_1, j_2, \dots, j_n$, where each job $j_i$ has a start time $s(i)$ and a finish time $f(i)$, find the largest subset of mutually compatible jobs.

We will consider the weighted variant of this problem, where each job $j_i$ is also assigned a weight $w_i \gt 0$.

The weighted interval scheduling problem is: Given a list of $n$ jobs $j_1, j_2, \dots, j_n$, where each job $j_i$ has a start time $s(i)$, a finish time $f(i)$ and a weight $w(i)$, find the maximum weight subset of mutually compatible jobs.

Of course, here weight can be any metric you like: cash monies, importance, etc. In other words, we don't consider every job to be equally valuable and there are jobs that we would really like to get done more than others.

Consider the following instance.

The earliest finish time algorithm will choose the jobs on the bottom row, each of weight 1, which causes the job above, with weight 1000, to be blocked, since it is incompatible.

In fact, no greedy approach will work for this problem, so we need to think about how to come up with a different approach. First, it helps to fix the order of our jobs, so that they are in order of finish time, $f(1) \leq \cdots \leq f(n)$.

Something that helps in these kinds of problems is to think backwards. For instance, suppose we have an optimal solution. Which jobs are in the solution? Is the last job, with finish time $f(n)$, in our solution?

We will need some definitions.

Let $p(j)$ be the largest index $i \lt j$ such that job $i$ is compatible with job $j$. Let $\mathcal O_j$ be a maximum weight subset of mutually compatible jobs chosen from jobs $1$ through $j$, and let $OPT(j)$ denote its weight.

What we want to compute is $OPT(n)$, and we will do so by considering maximum weight subsets over fewer jobs. Let's return to our question about the last job. If $n$ is in $\mathcal O_n$, then $\mathcal O_n$ can't contain any jobs that are incompatible with $n$. However, if $n$ isn't in $\mathcal O_n$, then we would have to ask the same question of job $n-1$.

Let's generalize this. Consider the following two possibilities for $j$ and $\mathcal O_j$.

  1. $j \not \in \mathcal O_j$: This means that $\mathcal O_j$ consists only of jobs chosen from $1$ through $j-1$. This means that $\mathcal O_j$ is a possible solution for choosing a maximum weight set of jobs out of jobs $1$ through $j-1$. Therefore, $OPT(j) \leq OPT(j-1)$.
  2. $j \in \mathcal O_j$: Then we know that $w(j)$ contributes to $OPT(j)$. We also know that $\mathcal O_j$ will not contain jobs that are incompatible with $j$, which means that none of the jobs $p(j) +1, \dots, j-1$ will be in $\mathcal O_j$. But then our selection of jobs from $1$ through $p(j)$ constitutes a possible solution for choosing a maximum weight set of jobs out of jobs $1$ through $p(j)$. Together with the weight of job $j$, we have $OPT(j) - w(j) \leq OPT(p(j))$, which gives us $OPT(j) \leq w(j) + OPT(p(j))$.

These two observations give us the following recurrence: \[OPT(j) \leq \begin{cases} 0 & \text{if $j=0$,} \\ \max\{OPT(j-1), w(j) + OPT(p(j))\} & \text{if $j \gt 0$.} \end{cases} \]

This is not quite enough to get to what we want. So far, we've made observations about how our optimal solution $\mathcal O_j$ is constructed, which gives us the bounds that we've obtained so far. However, we still need to explicitly argue what many of you might have noticed already: that the solutions we're piecing together are made from the optimal solutions for the corresponding subproblems. Again, there are two cases.

  1. Consider the optimal solution for jobs $1$ through $j-1$, the set $\mathcal O_{j-1}$, which has weight $OPT(j-1)$. Since $\mathcal O_{j-1}$ is a subset of jobs from $1$ through $j-1$, it is also a subset of mutually compatible jobs from $1$ through $j$. But this means that $\mathcal O_{j-1}$ is a candidate for the maximum weight subset of jobs $1$ through $j$. Therefore, $OPT(j-1) \leq OPT(j)$.
  2. Consider the optimal solution for jobs $1$ through $p(j)$, the set $\mathcal O_{p(j)}$, which has weight $OPT(p(j))$. Since every job in this set has index at most $p(j)$, every job in it is compatible with job $j$, so I can safely add job $j$ to this set. So I can consider the set $\mathcal O_{p(j)} \cup \{j\}$, which now has weight $OPT(p(j)) + w(j)$. This set is a subset of mutually compatible jobs from $1$ through $j$, so it is a candidate for the maximum weight subset. Therefore, $OPT(p(j)) + w(j) \leq OPT(j)$.

These observations now give us the following recurrence: \[OPT(j) \geq \begin{cases} 0 & \text{if $j=0$,} \\ \max\{OPT(j-1), w(j) + OPT(p(j))\} & \text{if $j \gt 0$.} \end{cases} \]

Together with the previous recurrence, we can finally conclude \[OPT(j) = \begin{cases} 0 & \text{if $j=0$,} \\ \max\{OPT(j-1), w(j) + OPT(p(j))\} & \text{if $j \gt 0$.} \end{cases} \]

Of course to prove this formally, we would need to add all the dressings that come with an inductive proof. Luckily, our argument above really constitutes most of the inductive case of such a proof—we just need to add our inductive hypothesis, that for all $k \lt j$, the set $\mathcal O_k$ is the maximum weight subset of jobs $1, \dots, k$ with weight $OPT(k)$.

As with all recurrences, our result is most straightforwardly implemented as a recursive algorithm. Here, we assume that there is some bigger program that has already sorted the jobs and computed $p(j)$ and has it readily available.

    \begin{algorithmic}
    \PROCEDURE{max-sched}{$j$}
        \IF{$j = 0$}
            \RETURN{$0$}
        \ELSE
            \RETURN{$\max(w(j) +$ \CALL{max-sched}{$p(j)$}, \CALL{max-sched}{$j-1$})}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
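In Python, the same recursion might look like the following minimal sketch. It assumes the jobs are already sorted by finish time and that $w$ and $p$ are available as 1-indexed lists; these names and conventions are chosen here for illustration and are not part of the pseudocode above.

    # Naive recursion for weighted interval scheduling.
    # Assumes jobs are sorted by finish time, and that w[j] (weight of
    # job j) and p[j] (as defined above) are 1-indexed lists.
    def max_sched(j, w, p):
        if j == 0:
            return 0
        return max(w[j] + max_sched(p[j], w, p),
                   max_sched(j - 1, w, p))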

This is a pretty simple algorithm whose correctness you can prove using induction. Great! Except this algorithm has a worst case running time of $O(2^n)$. Wow, terrible!

So let's take a look at what's going on here. Consider the following instance.

Now, let's consider the following tree of calls that the algorithm makes in computing $OPT(n)$.

Note that when calculating $OPT(6)$ in this instance, we need to make a call to $OPT(5)$ and $OPT(4)$, which in turn make their own recursive calls. The result is a very large tree, which I've cut off for space purposes. However, the result is that we get something close to a full tree. Each node is a recursive call, and each node only performs a constant amount of work on its own. However, there are exponentially many such nodes, which means that we get something approaching $O(2^n)$ work in total.

However, it seems like there's a lot of repeated work going on here. Maybe we can take advantage of this. There are two main ways to do this.

Memoization

The first is called memoization, which if you say out loud sounds like how you'd pronounce "uwu memorization". The idea behind memoization is that instead of recomputing a subproblem every time the recursion reaches it, you store the result the first time you compute it, and future calls can look up the result rather than recompute it.

By giving access to a global array $M$ that stores the result of $OPT(j)$ in $M[j]$, we make an easy modification to our original algorithm to memoize it.

    \begin{algorithmic}
    \STATE Initialize an array $M$ with $M[0] \gets 0$ and $M[j] \gets \varnothing$ for $1 \leq j \leq n$
    \PROCEDURE{max-sched}{$j$}
        \IF{$M[j] = \varnothing$}
            \STATE $M[j] \gets \max(w(j) +$ \CALL{max-sched}{$p(j)$}, \CALL{max-sched}{$j-1$})
        \ENDIF
        \RETURN{$M[j]$}
    \ENDPROCEDURE
    \end{algorithmic}
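In Python, the same memoization idea might look like the sketch below, under the same assumptions about $w$ and $p$ as before, with a dictionary standing in for the global array $M$.

    # Memoized recursion: each OPT(j) is computed at most once and
    # cached in M, so later calls just look it up.
    def max_sched_memo(j, w, p, M=None):
        if M is None:
            M = {0: 0}                       # base case OPT(0) = 0
        if j not in M:
            M[j] = max(w[j] + max_sched_memo(p[j], w, p, M),
                       max_sched_memo(j - 1, w, p, M))
        return M[j]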

How much better is this algorithm?

Algorithm 6.9 has worst-case running time $O(n)$.

There are two possible behaviours for a call to max-sched$(j)$. Either it immediately returns $M[j]$ because it was already initialized, or it initializes $M[j]$ and makes two recursive calls. Therefore, we need to determine how many recursive calls this algorithm can make.

We observe that recursive calls are made only when $M[j]$ is not yet initialized, and that $M[j]$ becomes initialized immediately afterwards. Each entry of $M$ can therefore trigger recursive calls at most once, and each time it does, exactly two calls are made. This means there can be at most $2n$ recursive calls, which gives a running time of $O(n)$.

This is much better, but so far we've only computed the weight of the best solution, not the solution itself! Luckily, we can make use of the values $M$ that we've computed in order to find the corresponding solution. We make use of the fact that $j \in \mathcal O_j$ if and only if $w(j) + OPT(p(j)) \geq OPT(j-1)$.

    \begin{algorithmic}
    \STATE $M$ is the array computed by \Call{max-sched}{}
    \PROCEDURE{find-sched}{$j$}
        \IF{$j = 0$}
            \RETURN{$\varnothing$}
        \ELSE
            \IF{$w(j) + M[p(j)] \geq M[j-1]$}
                \RETURN{$\{j\} \cup$ \CALL{find-sched}{$p(j)$}}
            \ELSE
                \RETURN{\CALL{find-sched}{$j-1$}}
            \ENDIF
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
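Here is a corresponding Python sketch of the traceback, assuming $M$ is a table with $M[j] = OPT(j)$ as computed by the memoized (or, later, the iterative) version, and with $w$ and $p$ as in the earlier sketches.

    # Recover the actual set of chosen jobs from the table of values M.
    def find_sched(j, w, p, M):
        if j == 0:
            return set()
        if w[j] + M[p[j]] >= M[j - 1]:       # job j is in an optimal solution
            return {j} | find_sched(p[j], w, p, M)
        return find_sched(j - 1, w, p, M)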

We observe that most of the operations in this algorithm are basic operations, with the exception of the recursive calls. However, the number of recursive calls is bounded by $n$, since each call is made with an argument $i \lt j$. So the algorithm above has a worst-case running time of $O(n)$.

In total, the entire algorithm, including computing $OPT(n)$ and $\mathcal O_n$, runs in $O(n \log n)$ time: the dominant cost is the requirement that our jobs be sorted in order of finish time (and that each $p(j)$ be computed, which can also be done in $O(n \log n)$ time); everything after that takes $O(n)$ time.
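For completeness, here is one way the preprocessing step might look in Python: sort by finish time, then compute each $p(j)$ by binary search over the finish times. The representation of jobs as `(start, finish, weight)` tuples is an assumption of this sketch.

    from bisect import bisect_right

    # jobs: list of (start, finish, weight) tuples.
    # Returns the 1-indexed lists w and p used by the sketches above.
    def preprocess(jobs):
        jobs = sorted(jobs, key=lambda job: job[1])    # sort by finish time
        finishes = [f for (_, f, _) in jobs]
        w = [0] + [wt for (_, _, wt) in jobs]          # w[j] for j = 1..n
        p = [0]
        for (s, _, _) in jobs:
            # number of jobs whose finish time is <= s(j), i.e. the largest
            # index i with f(i) <= s(j), or 0 if there is none
            p.append(bisect_right(finishes, s))
        return w, p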

Iterating over subproblems

Memoization was nice for augmenting the obvious but bad recursive algorithm that we came up with, but it is not really what we think of when we think of dynamic programming.

The key observation is that our initial view of the problem was top-down: we wanted to compute $OPT(n)$ and $OPT(n)$ was computed based on computing the subproblems $OPT(n-1)$ and $OPT(p(n))$. However, we can compute these subproblems from the other direction, bottom-up, starting from 0.

    \begin{algorithmic}
    \PROCEDURE{max-sched}{$n$}
        \STATE Initialize an array $M$
        \STATE $M[0] \gets 0$
        \FOR{$j$ from $1$ to $n$}
            \STATE $M[j] \gets \max(w(j) + M[p(j)], M[j-1])$
        \ENDFOR
        \RETURN{$M[n]$}
    \ENDPROCEDURE
    \end{algorithmic}
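The same bottom-up computation in Python, again assuming $w$ and $p$ are 1-indexed lists as in the earlier sketches:

    # Bottom-up (iterative) dynamic programming for weighted interval
    # scheduling. Returns the full table M, so M[n] is the answer.
    def max_sched_iter(n, w, p):
        M = [0] * (n + 1)                    # M[0] = OPT(0) = 0
        for j in range(1, n + 1):
            M[j] = max(w[j] + M[p[j]], M[j - 1])
        return M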

We can prove this algorithm is correct by induction, similar to how we would for a recursive algorithm.

Algorithm 6.11 correctly computes $OPT(n)$.

We will prove that $M[n] = OPT(n)$ by induction on $n$. First, consider $n = 0$. Then $M[0] = 0 = OPT(0)$ by definition.

Now assume that $M[j] = OPT(j)$ for all $j \lt n$. By definition, we have \[OPT(n) = \max\{w(n) + OPT(p(n)), OPT(n-1)\}.\] By our inductive hypothesis, we have \[OPT(n) = \max\{w(n) + M[p(n)], M[n-1]\},\] and therefore $OPT(n) = M[n]$.

One thing to keep in mind when doing these proofs is to take care not to mix up $M[j]$ and $OPT(j)$. That is, the crux of the proof is to show that $M[j]$ is $OPT(j)$—we only substitute them once we make this assumption (via the inductive hypothesis). The proofs that we'll be doing later on will clearly denote the value of the solution we want by $OPT$, and involve showing that our program computes and stores this value as intended.

Algorithm 6.11 clearly runs in $O(n)$ time: it iterates through a loop $n$ times, with the work inside each iteration taking $O(1)$ time. The only other observation that is necessary is that each iteration only refers to computations that have already been performed in previous iterations.

The approach that we've used for this problem will be the same one that we'll deploy for other problems that we solve using dynamic programming. The most crucial step is to treat your problem as a recursive problem and to describe the solution to this problem recursively—i.e. as a recurrence.

In describing your problem as a recurrence, you identify the subproblems you need to solve. Once you have done this, it is a simple matter of storing the solutions to these subproblems, which are the earlier values of your recurrence. Then proving that your algorithm is correct is a matter of proving that the algorithm correctly stores the solutions to your recurrence.

The process is summarized as follows:

  1. Solve your problem recursively and identify the subproblems that are required to solve your problem.
  2. Define and prove a recurrence for your problem based on the above.
  3. Define an algorithm that iteratively solves your recurrence.
  4. Prove that your algorithm works by a straightforward induction showing that the values that are computed correspond to the solutions of the recurrence.

Knapsack

The knapsack problem was first formulated by Dantzig in 1957, and he gave a basic dynamic programming algorithm for it. The idea is that you have a knapsack that can hold a certain amount of weight. You would like to fill it with items such that the value of the items you choose is maximized. Of course, we don't really care about filling literal knapsacks full of stuff—this just happens to be a useful metaphor.

Dantzig is well known for the Simplex algorithm for solving linear programming problems. He is also the source of the urban legend of the student who was late to class, copied down some equations, did them for homework, and inadvertently solved two unsolved problems: he was the (grad) student.

The 0-1 knapsack problem is: Given a list of $n$ items, where item $i$ has value $v_i \gt 0$ and weight $w_i \gt 0$, and a knapsack weight limit $W$, choose the subset of items with the maximum total value whose total weight is at most $W$.

For this problem, we will assume that values and weights are positive integers.

Consider the following items.

$i$     1   2   3   4   5
$w_i$   1   3   4   7  11
$v_i$   1   4   9  12  19

Suppose we can carry weight $12$. Then $\{1,5\}$ has weight 12 and value 20, while $\{1,3,4\}$ has weight 12 and value 22.
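If you want to convince yourself that 22 really is the best we can do here, a brute-force check over all $2^5$ subsets (only feasible for tiny instances like this one) agrees:

    from itertools import combinations

    # Brute-force check of the small example: try every subset of items.
    weights = [1, 3, 4, 7, 11]
    values = [1, 4, 9, 12, 19]
    W = 12

    best = max(
        (sum(values[i] for i in subset)
         for r in range(len(weights) + 1)
         for subset in combinations(range(len(weights)), r)
         if sum(weights[i] for i in subset) <= W),
        default=0,
    )
    print(best)  # prints 22, achieved by items {1, 3, 4}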

The question, as with our greedy algorithms, is how we should choose items. We can look to greedy algorithms for a bit of inspiration: do we choose based on weight? On value? On weight and value?

We can think through similar choices. We want to maximize the value of a solution to the problem, but the question is which choices we should be making. Suppose that we only consider the optimal value of a solution over the items $1, \dots, i$. This seems like it should work, but there is a crucial piece missing: how do we ensure that we don't exceed our weight limit? Similarly, we can try to optimize the value of a solution with respect to the available weight alone, but then how do we account for how the available weight changes with each item that's chosen?

It turns out the answer is to do both.

Let $OPT(i,w)$ denote the value of an optimal solution $\mathcal O_{i,w}$ for the knapsack problem on items $1, \dots, i$ subject to weight limit $w$.

In essence, what we do is to consider two different subproblems at the same time: the subproblem of which items are chosen out of $1, \dots, i$ and the subproblem of which items to choose with the given weight limit $w$. The subproblem of selecting a subset of items $1, \dots, i$ is clear, since it's similar to the scheduling problem we saw earlier.

But how do we consider subproblems on weight? Luckily, we made the helpful assumption that all weights are integers, so we have subproblems ranging on $0 \leq w \leq W$. This gives us the equation with two variables, \[OPT(i,w) = \max_{S \subseteq \{1,\dots,i\}} \sum_{j \in S} v_j \quad \text{subject to } \sum_{j \in S} w_j \leq w.\]

Under this formulation, we want to compute $OPT(n,W)$. So, now we need to figure out how to express $OPT(i,w)$ in terms of smaller subproblems. Let us consider what our choices are regarding $OPT(i,w)$ when considering whether $i \in \mathcal O_{i,w}$.

  1. If $i \not \in \mathcal O_{i,w}$, then $\mathcal O_{i,w}$ must contain only items chosen from $1, \dots, i-1$, and we have $OPT(i,w) = OPT(i-1,w)$.
  2. If $i \in \mathcal O_{i,w}$, then this means that $v_i$ is added to our solution and we must add the weight $w_i$, which decreases the allowed weight in any subproblem. So we have $OPT(i,w) = v_i + OPT(i-1,w-w_i)$.

We then take $OPT(i,w) = \max\{OPT(i-1,w), v_i + OPT(i-1,w-w_i)\}$, noting that the second option is only available when $w_i \leq w$; if $w_i \gt w$, then item $i$ cannot fit and only the first case applies.

This gives us the following recurrence, which we'll prove formally.

$OPT(i,w)$ satisfies the following recurrence. \[OPT(i,w) = \begin{cases} 0 & \text{if $i = 0$ or $w = 0$}, \\ OPT(i-1,w) & \text{if $i, w \gt 0$ and $w_i \gt w$}, \\ \max\{OPT(i-1,w), v_i + OPT(i-1,w-w_i)\} & \text{if $i, w \gt 0$ and $w_i \leq w$}. \end{cases} \]

We will prove this by induction on $i$ and $w$.

Our base case will be when $i = 0$ or $w = 0$. If $i = 0$, then there are no items to add, so $OPT(i,w) = 0$ for all $w$. If $w = 0$, then there is no space to add any items, since the weights of all items are strictly positive, so $OPT(i,w) = 0$ for all $i$.

Now, suppose $i, w \gt 0$. Assume for all $i' \lt i$ and all $w' \leq w$ that $\mathcal O_{i',w'}$ is an optimal solution (i.e. with maximum value) with value $OPT(i',w')$. Let $\mathcal O_{i,w}$ be an optimal solution on items $1, \dots, i$ and weight limit $w$, with value $OPT(i,w)$. Let $w_i$ be the weight of item $i$ and let $v_i$ be the value of item $i$. If $w_i \gt w$, then no feasible solution can contain item $i$, so $OPT(i,w) = OPT(i-1,w)$ and the recurrence holds immediately; for the rest of the argument, assume $w_i \leq w$.

First, we argue that \[OPT(i,w) \leq \max\{OPT(i-1,w), v_i + OPT(i-1,w-w_i)\}.\] There are two cases, depending on whether or not $i$ is in $\mathcal O_{i,w}$. If $i \not\in \mathcal O_{i,w}$, then $\mathcal O_{i,w}$ contains only items from $1, \dots, i-1$, so it is a feasible choice of items from $1, \dots, i-1$ with weight limit $w$ and $OPT(i,w) \leq OPT(i-1,w)$. If $i \in \mathcal O_{i,w}$, then $\mathcal O_{i,w} \setminus \{i\}$ is a feasible choice of items from $1, \dots, i-1$ with weight limit $w - w_i$, so $OPT(i,w) - v_i \leq OPT(i-1,w-w_i)$.

Since we take the maximum over the two cases, the inequality is proved.

Next, we argue that \[OPT(i,w) \geq \max\{OPT(i-1,w), v_i + OPT(i-1,w-w_i)\}.\] There are two cases, but this time, we consider the value of the subproblems. The set $\mathcal O_{i-1,w}$ is also a feasible choice of items from $1, \dots, i$ with weight limit $w$, so $OPT(i-1,w) \leq OPT(i,w)$. Similarly, since $w_i \leq w$, the set $\mathcal O_{i-1,w-w_i} \cup \{i\}$ has total weight at most $(w - w_i) + w_i = w$, so it too is a feasible choice of items from $1, \dots, i$ with weight limit $w$, and therefore $v_i + OPT(i-1,w-w_i) \leq OPT(i,w)$.

Again, since we take the maximum over these two cases, the inequality is proved.

Together, both inequalities show that \[OPT(i,w) = \max\{OPT(i-1,w), v_i + OPT(i-1,w-w_i)\}\]


This recurrence leads to the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{knapsack}{$W,I = \{(w_i,v_i) \mid 1 \leq i \leq n\}$}
        \FOR{$w$ from $0$ to $W$}
            \STATE $V[0,w] \gets 0$
        \ENDFOR
        \FOR{$i$ from $1$ to $n$}
            \FOR{$w$ from $0$ to $W$}
                \IF{$w_i \gt w$}
                    \STATE $V[i,w] \gets V[i-1,w]$
                \ELSE
                    \STATE $V[i,w] \gets \max\{V[i-1,w],v_i + V[i-1,w-w_i]\}$
                \ENDIF
            \ENDFOR
        \ENDFOR
        \RETURN{$V[n,W]$}
    \ENDPROCEDURE
    \end{algorithmic}
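In Python, the same table-filling computation might look like the sketch below; the 1-indexing of `weights` and `values` via a dummy entry at position 0 is a choice of this sketch rather than anything in the pseudocode above.

    # 0-1 knapsack by dynamic programming: V[i][w] = OPT(i, w).
    # weights and values are 1-indexed (index 0 is a dummy entry).
    def knapsack(W, weights, values):
        n = len(weights) - 1
        V = [[0] * (W + 1) for _ in range(n + 1)]   # row 0 and column 0 are 0
        for i in range(1, n + 1):
            for w in range(1, W + 1):
                if weights[i] > w:                  # item i cannot fit
                    V[i][w] = V[i - 1][w]
                else:
                    V[i][w] = max(V[i - 1][w],
                                  values[i] + V[i - 1][w - weights[i]])
        return V[n][W]

On the example above, `knapsack(12, [0, 1, 3, 4, 7, 11], [0, 1, 4, 9, 12, 19])` returns 22, matching the table below.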

Recall Example 6.14 from above. Running this algorithm constructs the following table $V$.

$i$/$w$   0   1   2   3   4   5   6   7   8   9  10  11  12
  0       0   0   0   0   0   0   0   0   0   0   0   0   0
  1       0   1   1   1   1   1   1   1   1   1   1   1   1
  2       0   1   1   4   5   5   5   5   5   5   5   5   5
  3       0   1   1   4   9  10  10  13  14  14  14  14  14
  4       0   1   1   4   9  10  10  13  14  14  16  21  22
  5       0   1   1   4   9  10  10  13  14  14  16  21  22

Our solution, $OPT(n,W)$, is the entry $V[5,12] = 22$ in the bottom-right corner of the table.

knapsack computes $OPT(n,W)$.

We will show that $V[i,w] = OPT(i,w)$ for $i \leq n$ and $w \leq W$ by induction on $(i,w)$. First, we consider as our base cases $i = 0$ and $w = 0$. We have that $V[0,w] = 0 = OPT(0,w)$ for all $w \leq W$ and $V[i,0] = 0 = OPT(i,0)$ for all $i \leq n$.

Now consider $i,w \gt 0$ and assume that $V[i',w'] = OPT(i',w')$ for all entries $(i',w')$ computed before $(i,w)$. First, suppose $w_i \gt w$. Then by line 7, we have $V[i,w] = V[i-1,w]$. By our inductive hypothesis, $V[i-1,w] = OPT(i-1,w)$, and since the recurrence gives $OPT(i,w) = OPT(i-1,w)$ when $w_i \gt w$, we have $V[i,w] = OPT(i,w)$ as desired.

Then if $w_i \leq w$, by line 9, we have \[V[i,w] = \max(V[i-1,w], v_i + V[i-1,w-w_i])\] and by our inductive hypothesis, we have \[V[i,w] = \max\{OPT(i-1,w), v_i + OPT(i-1,w-w_i)\},\] so $V[i,w] = OPT(i,w)$.

knapsack has running time $O(nW)$.

The work is done in the nested loops. Each of the computations within the loops can be done in $O(1)$ time. The inner loop runs $O(W)$ times for each of the $n$ iterations of the outer loop. This gives $O(nW)$ time in total.

Now, at this point, some of you may remember my note on the solution to this problem, which said that there is no known polynomial-time algorithm for knapsack. But here, we have an algorithm that runs in $O(nW)$ time! What's up with that?

The key here is to remember that the complexity of an algorithm is a function of the size of the input, so it's worth stepping back to ask what the size of this problem instance is. One part is obvious: the number of items that we have. The other part is not so obvious: what is the size of $W$?

A common mistake, as in this case, is to take $W$ as the size of $W$. But $W$ is the input, a number. The size of $W$ is the size of its representation, which is the number of bits of $W$ (or digits if you prefer to use some other number system). Either way, for any representation that isn't unary, the size of $W$ is $O(\log W)$.

The issue here then becomes clear: $W$ is not polynomial in $\log W$, so our $O(nW)$ bound is not polynomial in the size of the input. Algorithms that are polynomial in the numeric value of the input but not the size of the input are said to be pseudopolynomial.
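To make this concrete, if $W$ is written using $b$ bits, then $W$ can be as large as $2^b - 1$, so a running time of $O(nW)$ can be as large as \[O(n \cdot 2^b),\] which is exponential in $b$, the size of the representation of $W$.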

Knapsack problems belong to a family of packing problems, where we are given a set of items with various properties and constraints and we are asked to choose a selection of items subject to those constraints while trying to maximize or minimize some quantity.

It becomes pretty clear that many of these problems are quite important, in that they can be repurposed very easily for various computational and real-life applications. However, they also happen to share the same unfortunate downside: we don't have efficient algorithms for many of them. This is where we begin to see intractability rear its ugly head.