CMSC 27200 — Lecture 5

Greedy algorithms

The first algorithm design paradigm that we'll look at is the greedy algorithm. Greedy algorithms are so called because they make their choices greedily, according to some simple rule or property. To see this, let's start with a simple example.

Suppose you're being asked to make change. How do you find the least number of coins to give back? There is an obvious way to do this: repeatedly take the largest coin that fits in the remaining amount, moving to the next largest denomination when it no longer fits. Eventually, we'll end up with the right amount. This is an example of a greedy algorithm, because we're optimizing around the rule "choose the largest coin denomination". An interesting question is: does this always give us the optimal outcome?

We can formalize this a bit more. Given a positive integer $n$, we want to find nonnegative integers $a,b,c,d$ such that $n = 25a + 10b + 5c + d$ with $a+b+c+d$ minimized.
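As a quick illustration, here is a minimal sketch of the greedy rule in Python; the function name and interface are my own, not part of the lecture.

    def greedy_change(n, denominations=(25, 10, 5, 1)):
        """Make change for n cents by repeatedly taking the largest coin."""
        counts = {}
        for coin in denominations:  # denominations listed in decreasing order
            counts[coin], n = divmod(n, coin)
        return counts

    # For example, greedy_change(67) == {25: 2, 10: 1, 5: 1, 1: 2}: six coins.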

To see that this is true for American coins, we can argue about what we'd need to replace the largest coins with. In any optimal solution, there can be at most 4 pennies (since 5 pennies can be replaced by a nickel), at most 1 nickel (since 2 nickels can be replaced by a dime), and at most 2 dimes, but not 2 dimes together with a nickel (since those can be replaced by a quarter). This means the coins smaller than a quarter can make up at most 24 cents, so whenever $n \geq 25$, an optimal solution must use a quarter, just as the greedy algorithm does. A similar argument works at each smaller denomination.

To prove this formally, we just need to wrap this up in an inductive argument and essentially argue that if we don't take the largest coin at each step, we end up with more coins than necessary.

We can play around with this for other countries' coin denominations. For example, another popular set of coin denominations is 100, 50, 20, 10, 5, 1. But does our greedy algorithm work for every set of coin denominations? The answer turns out to be no! For instance, with denominations 4, 3, 1, making change for 6 greedily gives $4 + 1 + 1$, using three coins, while the optimal solution $3 + 3$ uses only two.
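To see this concretely, here is a short sketch comparing the greedy count against the true optimum; the optimum is computed by dynamic programming, a technique we'll cover later in the course, and the function names are my own.

    def greedy_count(n, coins):
        """Number of coins the greedy rule uses for n cents."""
        total = 0
        for c in sorted(coins, reverse=True):  # largest denomination first
            total += n // c
            n %= c
        return total

    def optimal_count(n, coins):
        """Fewest coins needed for n cents, via a simple dynamic program."""
        best = [0] + [float("inf")] * n
        for amount in range(1, n + 1):
            for c in coins:
                if c <= amount:
                    best[amount] = min(best[amount], best[amount - c] + 1)
        return best[n]

    assert greedy_count(6, [4, 3, 1]) == 3   # greedy: 4 + 1 + 1
    assert optimal_count(6, [4, 3, 1]) == 2  # optimal: 3 + 3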

There are some other interesting questions that you can come up with for this kind of problem. For a given set of coin denominations, what is the average number of coins that you can expect to get from a transaction? And as a followup to this question, is there a new coin that we can introduce to a set of coin denominations to make the expected number of coins we get lower?

Interval scheduling

Suppose we have a resource to which we need to schedule some activities, each occupying an interval of time.

Some of the activities are in conflict, in that their time intervals overlap. We want to schedule these activities so as to complete as many as we can. If two activities don't overlap, we say that they are compatible.

The interval scheduling problem is: given a list of $n$ jobs $j_1, j_2, \dots, j_n$, where each job $j_i$ has a start time $s(j_i)$ and a finish time $f(j_i)$, find a largest subset of mutually compatible jobs.

Consider the following list of jobs, arranged on a timeline.

We need to choose as large a compatible subset of these jobs as possible. We list a few possibilities below.

What are some possible greedy strategies we can use to schedule the largest number of compatible jobs?

  1. Choose the job with the earliest start time.
  2. Choose the job with the earliest finish time.
  3. Choose the shortest job.
  4. Choose the job with the least number of conflicts/overlaps.

It turns out that (2) is the only one of these strategies guaranteed to give us the maximum number of jobs. A nice exercise is to come up with instances where strategies 1, 3, and 4 each fail.

First, let $J = \{(s_i,f_i) \mid 1 \leq i \leq n\}$ be a list of start and finish times for jobs 1 through $n$. Then, we have the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{earliest-finish}{$J$}
        \STATE Sort jobs by finish time and renumber them so that $f_1 \leq f_2 \leq \cdots \leq f_n$.
        \STATE Initialize $S = \emptyset$ to be the set of selected jobs.
        \FOR{$j$ from $1$ to $n$}
            \IF{job $j$ is compatible with $S$}
                \STATE add $j$ to $S$
            \ENDIF
        \ENDFOR
        \RETURN{$S$}
    \ENDPROCEDURE
    \end{algorithmic}

First, we can quickly analyze the algorithm's efficiency, since the algorithm is quite simple (as most greedy algorithms are).

The earliest finish time first algorithm has running time $O(n \log n)$.

The first step, sorting the list of jobs, takes $O(n \log n)$ time. Then we have a loop where we look at each job and check whether it is compatible with our selected set $S$. At first glance, it sounds like we might need to check the current job $j$ against every job in $S$, but because we consider jobs in order of finish time, it suffices to compare $j$ against the last job $j'$ added to $S$: job $j$ is compatible with all of $S$ exactly when $s(j) \geq f(j')$. This means we only need to remember one job and perform one comparison per iteration, which takes $O(1)$ time and space. Our algorithm is therefore dominated by the sorting step, giving it a running time of $O(n \log n)$.

Note that we take advantage of the fact that sorting a list can be performed in $O(n \log n)$ time. If you haven't seen this result yet, we will be seeing how to do this later in the course.
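Here is one way the algorithm might look in Python, using the observation above to track only the finish time of the last selected job. This is a sketch of my own, not official course code, and it treats a job starting exactly when another finishes as compatible.

    def earliest_finish(jobs):
        """Select a maximum set of mutually compatible jobs.

        jobs is a list of (start, finish) pairs; returns the selected pairs.
        """
        selected = []
        last_finish = float("-inf")  # finish time of the last job selected
        for start, finish in sorted(jobs, key=lambda job: job[1]):
            if start >= last_finish:  # compatible with everything in selected
                selected.append((start, finish))
                last_finish = finish
        return selected

    # For example:
    # earliest_finish([(0, 6), (1, 4), (3, 5), (5, 7), (5, 9), (8, 9)])
    # returns [(1, 4), (5, 7), (8, 9)].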

Now let's show that this algorithm actually works. That is, we want to show that this algorithm is guaranteed to give us an optimal solution: it schedules the largest possible number of jobs.

Let $S = \{j_1, \dots, j_k\}$ be the set of jobs that we get from our greedy algorithm and let $S^* = \{j_1^*, \dots, j_\ell^*\}$ be an optimal solution, with the jobs in each set listed in increasing order of finish time. We want to show that $k = \ell$. (Note that we can't hope to show $S = S^*$, since there may be more than one optimal solution.) First, we'll make the following observation.

Lemma 5.6. For all $i \leq k$, $f(j_i) \leq f(j_i^*)$.

We will prove this by induction. First, we have the base case $i = 1$: the greedy algorithm chooses the job with the earliest finish time overall, so $f(j_1) \leq f(j_1^*)$.

Now, for $i > 1$, we assume that $f(j_{i-1}) \leq f(j_{i-1}^*)$ and we want to show that $f(j_i) \leq f(j_i^*)$. Since $S^*$ is an optimal solution, we know that the jobs in $S^*$ are all compatible. Thus, we have $f(j_{i-1}^*) \leq s(j_i^*)$. Then by the inductive hypothesis, we have $f(j_{i-1}) \leq s(j_i^*)$.

This means that $j_i^*$ is compatible with jobs $j_1, \dots, j_{i-1}$ and could have been chosen by the greedy algorithm at step $i$. Since the greedy algorithm chooses the compatible job with the earliest finish time, we have $f(j_i) \leq f(j_i^*)$.

This will come in handy to show that $S$ is optimal.

The earliest finish time first algorithm is optimal.

We will prove this by contradiction. Suppose the greedy solution is not optimal, so that $|S^*| = \ell > k = |S|$.

By Lemma 5.6, we have $f(j_k) \leq f(j_k^*)$. Now, consider $j_{k+1}^*$, which must exist, since there are more jobs in $S^*$ than $S$. Since the jobs in $S^*$ must be compatible, we have $s(j_{k+1}^*) \geq f(j_k^*) \geq f(j_k)$.

But this means that $j_{k+1}^*$ is compatible with every job in $S$, so the greedy algorithm would have added it when it was considered. This contradicts the fact that $S$ is the set of jobs obtained from our greedy algorithm.

This proof gives us one common way to show that greedy algorithms are optimal, known as "greedy stays ahead". We essentially showed that at each step, the greedy algorithm's choice is at least as good as the corresponding choice in any optimal solution, so taking the greedy option never hurts us. In this case, the key result was Lemma 5.6.