MPCS 55001 — Lecture 2

Greedy algorithms

The first algorithm design paradigm that we'll look at is the greedy algorithm. Greedy algorithms are called such because they are greedy with respect to some simple rule or property. In a sense, we've already seen some examples from the algorithms for graph search: in both cases there is a simple "greedy" rule that we follow for choosing which vertices to visit next. However, usually, we consider algorithms "greedy" in the context of some optimization problem where we're trying to minimize or maximize some quantity.

Interval scheduling

Suppose we have a single resource on which we need to schedule some activities. For instance,

Some of the activities in the list are in conflict, in that their times overlap. We want to schedule as many non-conflicting activities as we can. If two activities don't overlap, we say that they are compatible.

The interval scheduling problem is: Given a list of $n$ jobs $j_1, j_2, \dots, j_n$, and each job $j_i$ has a start time $s(i)$ and a finish time $f(i)$, find the largest subset of mutually compatible jobs.

Consider the following list of jobs, arranged on a timeline.

We need to choose as large a compatible subset of these jobs as possible. We list a few possibilities below.

What are some possible greedy strategies we can use to schedule the largest number of compatible jobs?

  1. Choose the job with the earliest start time.
  2. Choose the job with the earliest finish time.
  3. Choose the shortest job.
  4. Choose the job with the least number of conflicts/overlaps.

It turns out (2) is the strategy that is guaranteed to give us the maximum number of jobs, and none of the other strategies do. A nice exercise is to come up with instances where strategies 1, 3, and 4 fail.

It's worth thinking a bit about why this metric should work. In a sense, prioritizing earliest finish time allows us to guarantee when our resource is free and that it's available at the earliest point possible.

First, let $J = \{(s_i,f_i) \mid 1 \leq i \leq n\}$ be a list of start and finish times for jobs 1 through $n$. Then, we have the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{Earliest}{J}
        \STATE Sort jobs by finish times and renumber the jobs so that $f_1 \leq f_2 \leq \cdots \leq f_n$.
        \STATE Initialize $S = \emptyset$ to be the set of selected jobs
        \FOR{$j$ from 1 to $n$}
            \IF{job $j$ is compatible with $S$}
            \STATE add $j$ to $S$
            \ENDIF
        \ENDFOR
    \RETURN{$S$}
    \ENDPROCEDURE
    \end{algorithmic}
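To make the pseudocode concrete, here is a minimal Python sketch of the same algorithm. The function name and the representation of jobs as (start, finish) pairs are illustrative choices, not part of the lecture.

    def earliest_finish_first(jobs):
        """jobs: a list of (start, finish) pairs.
        Returns a maximum-size set of mutually compatible jobs."""
        selected = []
        last_finish = float('-inf')
        for start, finish in sorted(jobs, key=lambda job: job[1]):  # sort by finish time
            if start >= last_finish:        # compatible with everything selected so far
                selected.append((start, finish))
                last_finish = finish        # only the most recent finish time is needed
        return selected

Note that the compatibility check only compares against the finish time of the last selected job, which is exactly the observation used in the running-time analysis below.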

First, we can do a quick running-time analysis, since the algorithm is quite simple (as most greedy algorithms are).

The earliest finish time first algorithm has running time $O(n \log n)$.

The first step of sorting our list of jobs takes $O(n \log n)$ time. Then we have a loop where we look at each job and check whether it is compatible with our selected jobs $S$. At first glance, this sounds like we might need to check our current job $j$ against every job in $S$, but because we add jobs in order of finish time, all we need to do is check against the last job $j'$ that was added to $S$: job $j$ is compatible with $S$ exactly when $s(j) \geq f(j')$. That means we only need to remember this one job and perform a single comparison, which takes $O(1)$ time and space. This means that our algorithm is dominated by the sorting step, giving it a running time of $O(n \log n)$.

Note that we take advantage of the fact that sorting a list can be performed in $O(n \log n)$ time. If you haven't seen this result yet, we will be seeing how to do this later in the course.

Now let's show that this algorithm actually works. That is, we want to show that this algorithm is guaranteed to give us the optimal solution, and we can schedule the largest number of jobs.

Let $S = \{j_1, \dots, j_k\}$ be the set of jobs that we get from our greedy algorithm and let $S^* = \{j_1^*, \dots, j_\ell^*\}$ be an optimal solution. We want to show that $S = S^*$. First, we'll make the following observation.

For all $i \leq k$, $f(j_i) \leq f(j_i^*)$.

We will prove this by induction. First, we have the base case, $i = 1$: since the greedy algorithm chooses as its first job the job with the earliest finish time overall, we have $f(j_1) \leq f(j_1^*)$.

Now, for $i \gt 1$, we assume that $f(j_{i-1}) \leq f(j_{i-1}^*)$ and we want to show that $f(j_i) \leq f(j_i^*)$. Since $S^*$ is an optimal solution, we know that the jobs in $S^*$ are all compatible. Thus, we have $f(j_{i-1}^*) \leq s(j_i^*)$. Then by the inductive hypothesis, we have $f(j_{i-1}) \leq s(j_i^*)$.

This means that $j_i^*$ is a job that's compatible with jobs $j_1, \dots, j_{i-1}$ and could have been chosen by the greedy algorithm. Since the greedy algorithm chooses the job with the earliest finish time, we have $f(j_i) \leq f(j_i^*)$.

This will come in handy to show that $S$ is optimal.

The earliest finish time first algorithm is optimal.

We will prove this by contradiction. Suppose the greedy strategy is not optimal and so $|S^*| = \ell \gt k = |S|$.

By Lemma 3.5, we have $f(j_k) \leq f(j_k^*)$. Now, consider $j_{k+1}^*$, which must exist, since there are more jobs in $S^*$ than $S$. Since the jobs in $S^*$ must be compatible, we have $s(j_{k+1}^*) \geq f(j_k^*) \geq f(j_k)$.

But this means that $j_{k+1}^*$ is a job that is available and compatible with $S$, so it should have been chosen by the greedy algorithm. This contradicts the fact that $S$ is the set of jobs obtained via our greedy algorithm.

This proof gives us one way to show that greedy algorithms are optimal, which is that "greedy stays ahead". We've essentially shown that at each step, the greedy choice finishes no later than the corresponding choice in any optimal solution, so the greedy algorithm never falls behind. In this case, the key result was Lemma 3.5.

Minimizing lateness

Because scheduling is such a general problem, we can come up with many different variants of the problem. Let's consider the following variant.

Suppose that instead of being given a start and finish time for jobs, we're simply given how long each job takes and a due date that we're trying to aim for. This would be like if you have four assignments due this week and you're really good at predicting how much time you'd need. Unfortunately, it may not be possible to meet all the due dates, but we should try our best to do so. What we would like is to schedule our jobs to minimize how late we are across all jobs.

The scheduling to minimize lateness problem is: Given a list of $n$ jobs $j_1, j_2, \dots, j_n$, where

  1. job $j_i$ requires processing time $t(i)$ and has deadline $d(i)$,
  2. a schedule assigns each job a start time $s(i)$ and finish time $f(i) = s(i) + t(i)$ on a single resource, with no two jobs overlapping, and
  3. the lateness of job $j_i$ is $\ell(i) = \max\{0, f(i) - d(i)\}$,

schedule all jobs to minimize maximum lateness $L = \max_{1 \leq i \leq n} \ell(i)$.

Below, we have a list of jobs together with their deadlines and a possible arrangement of these jobs.

The lateness of each late job is denoted in yellow. We take the largest of these to be the maximum lateness $L$ as defined in the problem.

Again, we have a number of metrics to choose from to attempt a greedy strategy.

  1. shortest processing time,
  2. earliest deadline,
  3. smallest slack ($d(i) - t(i)$)

The correct choice is strategy (2). Let $J = \{(t_i,d_i) \mid 1 \leq i \leq n\}$ be our list of jobs' processing times and deadlines. We have the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{Earliest}{J}
        \STATE Sort jobs by deadlines and renumber jobs so that $d_1 \leq d_2 \leq \cdots \leq d_n$.
        \STATE $t = 0$
        \FOR{$j$ from 1 to $n$}
            \STATE $s_j = t$
            \STATE $f_j = t+t_j$
            \STATE $t = t+t_j$
        \ENDFOR
    \RETURN{$\{(s_i,f_i) \mid 1 \leq i \leq n\}$}
    \ENDPROCEDURE
    \end{algorithmic}
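As with interval scheduling, a short Python sketch may help. The names below, and the choice to return the maximum lateness alongside the schedule, are illustrative and not part of the original pseudocode.

    def earliest_deadline_first(jobs):
        """jobs: a list of (processing_time, deadline) pairs.
        Returns (schedule, max_lateness), where schedule[i] = (start, finish) for job i."""
        order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])  # sort by deadline
        schedule = [None] * len(jobs)
        t = 0
        max_lateness = 0
        for i in order:
            processing, deadline = jobs[i]
            schedule[i] = (t, t + processing)
            max_lateness = max(max_lateness, t + processing - deadline)
            t += processing
        return schedule, max_lateness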

This algorithm is due to Jackson in 1955, which makes it one of the earliest examples of a greedy algorithm for an optimization problem. Now, you might be wondering what the heck they were trying to schedule back in 1955, and the answer is factory production lines.

Much like the earliest finish time algorithm, we can get a pretty simple running time analysis for the earliest deadline algorithm.

The earliest deadline first algorithm has running time $O(n \log n)$.

Our first step is sorting the deadlines, which takes $O(n \log n)$ time. Then we have a for loop which goes through each job and computes the start and finish times. These operations take $O(1)$ time, so the loop has running time $O(n)$ in total. So the algorithm is dominated by the sorting step, giving it $O(n \log n)$ running time.

One of the things we might notice about this algorithm is that there's no downtime between jobs. In fact, it's not too hard to convince yourself that you can take any optimal schedule that minimizes lateness and just get rid of any idle time in it and you would still have a schedule that minimizes lateness.

What this means is that we can restrict our attention to idle-free schedules: any schedule can be compacted so that the jobs run back to back, and we can move from one idle-free schedule to another simply by swapping jobs.

Consider our earliest-deadline schedule that is created by our algorithm. The jobs have been renumbered so that they are listed in order from 1 through $n$.

Given a schedule $S$, an inversion is a pair of jobs $i,j$ such that

  1. $i \lt j$, and
  2. $j$ is scheduled before $i$.

The idea of an inversion is more general than just schedules; we can apply it to permutations of lists. If we have a list $1, \dots, n$, then an inversion in a permutation $\pi$ is a pair $i,j$ with $i \lt j$ and $j$ appearing before $i$ in the permutation.

Here, we would like to consider inversions in schedules with respect to our earliest deadline schedule, which (up to the ordering of jobs with identical deadlines) is the only idle-free schedule that contains no inversions.

Suppose an idle-free schedule has an inversion. Then it contains two adjacent inverted jobs.

Let $(i,j)$ be an inversion whose two jobs are scheduled closest together, and suppose for contradiction that they are not adjacent. Let $k$ be the job scheduled immediately after $j$. If $j \gt k$, then $(k,j)$ is an inversion whose jobs are closer together (they are adjacent). If $j \lt k$, then $(i,k)$ is an inversion whose jobs are closer together, since $i \lt j \lt k$ and $k$ is scheduled after $j$ but before $i$. Either way, we contradict our choice of $(i,j)$, so the closest inversion must consist of two adjacent jobs.

Exchanging two adjacent inverted jobs $i$ and $j$ reduces the number of inversions by 1 and does not increase maximum lateness.

Let $\ell(i)$ denote the lateness of job $i$ in the schedule with the inversion and $\ell'(i)$ its lateness in the schedule after swapping the inversion.

For every job $k \neq i,j$, the lateness stays the same and for job $i$, we have $\ell'(i) \leq \ell(i)$, since $i$ gets moved to before $j$.

Now, consider job $j$. If $j$ is late after the swap, then \begin{align} \ell'(j) &= f'(j) - d(j) \\ &= f(i) - d(j) \\ &\leq f(i) - d(i) \\ &\leq \ell(i) \end{align} In other words, once swapped, $j$ is no later than $i$ was before the swap. For the second line, we observe that once we swap $i$ and $j$, the new finish time for $j$, $f'(j)$, is the old finish time for $i$. For the first inequality, recall that $i \lt j$ means that $d(i) \leq d(j)$, and the last inequality holds since $\ell(i) = \max\{0, f(i) - d(i)\} \geq f(i) - d(i)$. Ultimately, maximum lateness does not increase.

The earliest deadline first schedule $S$ is optimal.

Let $S^*$ be an optimal schedule with the fewest inversions. By our earlier observation, we may assume $S^*$ has no idle time. If $S^*$ has no inversions, then $S^* = S$.

So suppose that $S^*$ has an inversion. Then by Lemma 3.12, there exist two adjacent inverted jobs, say $i$ and $j$. By Lemma 3.13, we can exchange these two jobs to reduce the number of inversions by 1 without increasing the maximum lateness.

But this new schedule is an optimal schedule with fewer inversions, contradicting our assumption that $S^*$ contained the fewest number of inversions.

This proof takes a slightly different argument than "greedy stays ahead". Here, we show that we can transform our solution into the one found by following the greedy strategy, by swapping pieces. This is called an exchange argument.

Single-Source Shortest Paths

Now, we'll take a look at a graph problem. When we considered graph search, we showed that BFS gives us a shortest path to each reachable vertex in terms of the number of edges. However, an easy extension we can make is to consider graphs with edge weights. We will be considering the following problem.

The shortest path problem is: Given a directed graph $G = (V,E)$ with an edge weight function $w:E \to \mathbb N$, a source vertex $s \in V$ and a destination vertex $t \in V$, find a shortest (least weight) directed path from $s$ to $t$.

Consider the following graph. We want to get from vertex 1 to vertex 10.

Here, the shortest path from 1 to 10 is highlighted in yellow.

The applicability of finding shortest paths is almost entirely obvious.

One observation worth noting is that finding the shortest path from $s$ to $t$ turns out to be about the same amount of work as finding the shortest path from $s$ to every vertex.

There are several algorithms for solving the shortest path problem. The one we'll look at, which uses a greedy strategy, is due to Edsger Dijkstra in 1959.

Dijkstra's algorithm

The strategy is to remember a set of vertices for which we already know the shortest path from our source vertex $s$. At each step, we add the unexplored vertex that has the minimal shortest-path weight through the vertices we've already explored.

    \begin{algorithmic}
    \PROCEDURE{Dijkstra}{$G,s$}
        \STATE $S = \{s\}$
        \STATE $d[s] = 0$
        \STATE $\pi[s] = \varnothing$
        \WHILE{$S \neq V$}
            \STATE Choose $v \not \in S$ minimizing $r(v) = \min\limits_{u \in S,\, (u,v) \in E} \left(d[u] + w((u,v))\right)$, and let $u \in S$ be a vertex attaining this minimum
            \STATE $S = S \cup \{v\}$
            \STATE $d[v] = r(v)$
            \STATE $\pi[v] = u$
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}

We are given a source vertex $s \in V$ as part of the input, along with the graph $G$. We then need to keep track of the following information:

  1. the set $S$ of explored vertices, for which we already know shortest paths from $s$,
  2. for each vertex $v$, the length $d[v]$ of the shortest path from $s$ to $v$ found so far, and
  3. for each vertex $v$, its predecessor $\pi[v]$ on that path.

We begin with $S = \{s\}$, $d[s] = 0$, and $\pi[s] = \varnothing$. We then repeatedly explore the graph by visiting vertices $v \not \in S$. We add to $S$ the vertex $v$ that minimizes \[r(v) = \min_{u \in S,\, (u,v) \in E} \big(d[u] + w((u,v))\big),\] and set $d[v] = r(v)$ and $\pi[v] = u$, where $u$ is a vertex attaining the minimum.

Once we've explored the entire graph, we will have information that allows us to construct the shortest path from $s$ to every other vertex in $G$. To recover a shortest path from $s$ to $v$, we simply follow the sequence of predecessors from $\pi[v]$ until we get back to $s$.
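As a small illustration, here is one way this predecessor-following step might look in Python, assuming $\pi$ is stored as a dictionary pi mapping each vertex to its predecessor (the names are hypothetical):

    def reconstruct_path(pi, s, v):
        """Follow predecessor pointers back from v to s, then reverse to get the s -> v path."""
        path = [v]
        while path[-1] != s:
            path.append(pi[path[-1]])
        path.reverse()
        return path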

Consider again our example from above.

Here, we've highlighted in blue the edges that were selected by Dijkstra's algorithm.

The question we have to deal with is whether this strategy actually works. In other words, how can we be sure that once we've added a vertex to $S$ that we can conclude we found the shortest path to it from $s$?

For each $u \in S$, $d[u]$ is the length of a shortest path from $s$ to $u$.

We will show this by induction on $|S|$. First, we have $|S| = 1$, in which case $S = \{s\}$ and $d[s] = 0$.

Now suppose this holds for $|S| \geq 1$. Let $v$ be the next vertex added to $S$ and let $(u,v)$ be the edge used to add it, that is, the edge achieving the minimum in $r(v)$. Since $u \in S$, $d[u]$ is the length of a shortest path from $s$ to $u$, so there is a path of length $r(v) = d[u] + w((u,v))$ from $s$ to $v$. We want to show that this is a shortest path from $s$ to $v$.

Consider some other path $P$ from $s$ to $v$ and let $(x,y)$ be the first edge of $P$ that leaves our explored set $S$. Consider the length of the prefix of $P$ from $s$ to $y$. By our inductive hypothesis, the portion of $P$ from $s$ to $x$ has length at least $d[x]$, so this prefix has length at least $d[x] + w((x,y))$. By definition of $r(y)$, we have $r(y) \leq d[x] + w((x,y))$. But since $v$ was the vertex chosen by Dijkstra's algorithm, we have $r(v) \leq r(y)$. Since edge weights are nonnegative, the length of $P$ is at least the length of this prefix, so $P$ is no shorter than $r(v)$.

Since the algorithm adds vertices to $S$ at each step, this proposition essentially proves the correctness of the algorithm.

Efficiency

Unlike with our earlier algorithms, proving that this is efficient is a bit trickier. At a glance, it appears that there is some heavy computation going on in computing $r(v)$ for each $v \not \in S$. To do so, we would need to check each incoming edge to $v$, which involves something like $O(|E|)$ cost.

However, we can do something a bit more clever: precompute and keep track of $r(v)$. How do we do this? Consider when $r(v)$ can change: only when a new vertex $u$ is added to $S$, and only if going through $u$ gives a shorter path from $s$ to $v$. In other words, when $u$ is added, we take \[r(v) = \min \{r(v), d[u] + w((u,v))\}\] for each edge $(u,v)$. This takes care of the computation of $r(v)$: each time we add a vertex $u$ to $S$, we only need $O(n)$ work (one comparison per neighbor of $u$) to update all the $r(v)$'s.
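In code, the update is a single comparison per edge. Here is a sketch, assuming r is a dictionary of tentative distances, d is the dictionary of settled distances, and w_uv is the weight of the edge (u, v) (all hypothetical names):

    def update_r(r, d, u, v, w_uv):
        # called right after u is added to S (so d[u] is final): the only way
        # r(v) can improve is via the new edge (u, v), so one comparison suffices
        r[v] = min(r[v], d[u] + w_uv)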

Now, we are still left with how to choose the $v$ with the smallest $r(v)$. If we keep this information in an array, we can find the minimum in $O(n)$ time, which seems pretty good. But in fact, we can do better.

Fun with data structures

We will make use of a data structure (well, technically an abstract data type) called a priority queue.

A priority queue is an abstract data type that maintains a set $S$ of elements, where each element $v$ has an associated key $k$. A priority queue has the following operations:

  1. insert($S,v,k$): add element $v$ to $S$ with key $k$,
  2. extract-min($S$): remove and return the element of $S$ with the smallest key,
  3. update($S,v,k$): decrease the key of element $v$ to $k$.

These operations assume that we're prioritizing based on minimality; one can define a priority queue that takes the maximum key.

As a refresher, the difference between ADTs and data structures is that we don't necessarily know how ADTs are implemented. For instance, we could implement this just by keeping track of things in arrays. This turns out to be sub-optimal (a nice exercise is to figure out what the cost of each of the above operations would be in such an implementation).

For an efficient implementation, we will describe another data structure.

A (binary) heap is a binary tree which satisfies the heap property: for every element $i$ and its parent $j$, we have $j \leq i$. In other words, a parent is never larger than its children.

Again, this is a heap ordered based on minimality; there is a corresponding definition for heaps ordered based on maximality.

One thing to note is that the heap is similar to a binary search tree, in that it is a binary tree with some condition on how it stores elements imposed on it. This means that insertions and deletions from the heap need to maintain the heap property. The question is the cost of these operations.

Let $H$ be a heap with $n$ elements. Then inserting a new element into $H$ and extracting the minimum element of $H$ each take $O(\log n)$ time.

When viewed as a tree, it's not totally clear how these running time bounds can be achieved. The idea is that the heap is actually implemented as an array. Suppose we have an array $A$. Then the top of the heap, the root node, is $A[1]$. Then for each node $A[i]$, the left child is $A[2i]$ and the right child is $A[2i+1]$. The details of how this works and the analysis can be found in CLRS Chapter 6 or KT 2.5.
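Here is a minimal sketch of that array layout in Python, using the same 1-indexed convention (index 0 is unused padding). The helper names and the sift-down routine are illustrative, not a complete heap implementation:

    def parent(i):
        return i // 2

    def left(i):
        return 2 * i

    def right(i):
        return 2 * i + 1

    def sift_down(A, i, n):
        """Restore the min-heap property at index i of A[1..n],
        assuming both subtrees of i already satisfy it."""
        while True:
            smallest = i
            if left(i) <= n and A[left(i)] < A[smallest]:
                smallest = left(i)
            if right(i) <= n and A[right(i)] < A[smallest]:
                smallest = right(i)
            if smallest == i:
                return
            A[i], A[smallest] = A[smallest], A[i]
            i = smallest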

Here are two trees that hold the list $[2,3,5,8,13,21,34,56]$.

The tree on the left, in purple, satisfies the (min) heap property, while the tree on the right, in blue, does not.

Heaps were originally invented in 1963 by Williams, who used them to give an $O(n \log n)$ sorting algorithm based on heaps, called Heapsort. Once you work out the implementation, you pretty much get the sorting algorithm for free:

  1. Throw your $n$ elements into a heap in $O(n)$ time.
  2. Repeatedly extract the minimal element from the heap in $O(\log n)$ time and put it in an array in order.

Since you're taking $n$ elements out of the heap, each at a cost of $O(\log n)$, we get $O(n \log n)$.
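For instance, using Python's built-in heapq module (which implements a binary min-heap on a list), heapsort is just a few lines:

    import heapq

    def heapsort(items):
        heap = list(items)
        heapq.heapify(heap)                  # build a min-heap in O(n) time
        out = []
        while heap:
            out.append(heapq.heappop(heap))  # each extraction costs O(log n)
        return out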

The heap also almost gives us a priority queue implementation for free: we just maintain a heap based on the keys of the elements of our set.

Let $S$ be a priority queue with $n$ elements, implemented using a binary heap. Then insert, extract-min, and update each take $O(\log n)$ time.

Implementation

Now, we can revise our pseudocode for Dijkstra's algorithm to make explicit some of the implementation choices we made.

    \begin{algorithmic}
    \PROCEDURE{Dijkstra}{$G,s$}
        \STATE $d[s] = 0, \pi[s] = \varnothing$
        \FORALL{$v \neq s$}
            \STATE $d[v] = \infty, \pi[v] = \varnothing$
        \ENDFOR
        \STATE Create empty priority queue $P$
        \FORALL{$v \in V$}
            \STATE \CALL{insert}{$P,(v,d[v])$}
        \ENDFOR
        \WHILE{$P$ is not empty}
            \STATE $u = $ \CALL{extract-min}{$P$}
            \FORALL{$(u,v) \in E$}
                \IF{$d[v] \gt d[u] + w((u,v))$}
                    \STATE \CALL{update}{$P,v,d[u] + w((u,v))$}
                    \STATE $d[v] = d[u] + w((u,v))$
                    \STATE $\pi[v] = u$
                \ENDIF
            \ENDFOR
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}

Much of the additional logic is setting up all the data structures we need. The real question is what happens within the while loop.

Here, we use our priority queue $P$ to maintain the set of vertices that haven't been explored yet. On each iteration, while there are still unexplored vertices, we take the vertex $u$ with minimal $d[u]$ out of the priority queue. We then update the keys (distances) of $u$'s unexplored neighbors for the next iteration.
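Here is a runnable Python sketch of this implementation using heapq. Since heapq has no update operation, the sketch uses the common workaround of pushing a new entry and skipping stale ones on extraction; the adjacency-list representation adj is an assumption on my part.

    import heapq

    def dijkstra(adj, s):
        """adj: dict mapping each vertex to a list of (neighbor, weight) pairs."""
        d = {v: float('inf') for v in adj}
        pi = {v: None for v in adj}
        d[s] = 0
        pq = [(0, s)]                  # priority queue of (distance, vertex)
        explored = set()
        while pq:
            dist, u = heapq.heappop(pq)
            if u in explored:
                continue               # stale entry standing in for "update"
            explored.add(u)
            for v, w in adj[u]:
                if d[u] + w < d[v]:    # same test as in the pseudocode
                    d[v] = d[u] + w
                    pi[v] = u
                    heapq.heappush(pq, (d[v], v))
        return d, pi

The lazy-deletion variant can hold up to $m$ entries in the queue at once, but since $\log m = O(\log n)$, the $O(m \log n)$ bound from the analysis below is unchanged.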

Dijkstra's algorithm, when implemented with a binary heap priority queue, has time complexity $O(m \log n)$, where $m = |E|$ and $n = |V|$.

Everything before the while loop is setting up the data structures we need for the algorithm. This takes $O(n \log n)$ time, based on our discussions about arrays and priority queues.

Now, consider the while loop. We perform one iteration of the loop for each vertex in $V$. In each iteration, we remove a vertex from the priority queue, which has cost $O(\log n)$. We also examine a number of edges. It is important to note that since we only examine the outgoing edges of $u$, across all iterations of the while loop we examine each edge exactly once.

Putting that together, we have $n$ extract-min operations and at most $m$ updates to the priority queue, each costing $O(\log n)$. Including the setup, this gives us a total of $O(n \log n) + O(m \log n)$ time. Assuming every vertex is reachable from $s$ (so $m \geq n - 1$), the last term dominates, and Dijkstra's algorithm has time complexity $O(m \log n)$.

Notice that the complication with this analysis is in how we use the priority queue. However, we can do even better than this, by tinkering with the implementation of the priority queue itself. Recall that our implementation uses a binary heap to implement the priority queue, which has $O(\log n)$ time for all of its operations.

One can squeeze out additional efficiency by using a Fibonacci heap, invented by Fredman and Tarjan in 1984. Fibonacci heaps have an amortized running time of $O(1)$ for update(S,x,k), which means that our analysis of Dijkstra's algorithm improves to $O(m + n \log n)$. This is good if you have dense graphs, with many more edges than vertices. You can read about Fibonacci heaps in CLRS Chapter 19.