CMSC 27200 — Lecture 4

Single-Source Shortest Paths

Now, we'll take a look at a graph problem. Recall the definition of a graph.

Definition 4.1. A graph is a pair $G = (V,E)$ of a set $V$ of vertices and a set $E$ of edges. A graph is undirected if edges are unordered pairs of vertices. A graph is directed if edges are ordered pairs of vertices.

Informally, edges in undirected graphs don't have a direction associated with them, while direction matters in the edges of directed graphs. Graphs are fundamental structures for representing all sorts of models and networks.

You may recall the notion of paths in a graph. We will be considering the following problem.

Problem 4.2. The shortest path problem is: Given a directed graph $G = (V,E)$ with an edge weight function $w:E \to \mathbb N$, a source vertex $s \in V$ and a destination vertex $t \in V$, find a shortest (least weight) directed path from $s$ to $t$.

Example 4.3. Consider the following graph. We want to get from vertex 1 to vertex 10.

Here, the shortest path from 1 to 10 is highlighted in yellow.

The applicability of finding shortest paths is almost entirely obvious: route planning and GPS navigation, routing packets through communication networks, and, as we'll see later in this lecture, currency arbitrage.

One observation worth noting is that finding the shortest path from $s$ to $t$ turns out to be about the same amount of work as finding the shortest path from $s$ to every vertex.

There are several algorithms for solving the shortest path problem. The one we'll look at, which uses a greedy strategy, is due to Edsger Dijkstra in 1959.

Dijkstra's algorithm

The strategy is to maintain a set of vertices for which we already know the shortest path from our source vertex $s$. At each step, we choose the unexplored vertex whose shortest path through the explored vertices has minimal weight.

    \begin{algorithm}
    \caption{Dijkstra's algorithm}
    \begin{algorithmic}
    \PROCEDURE{Dijkstra}{$G,s$}
        \STATE $S = \{s\}$
        \STATE $d[s] = 0$
        \STATE $\pi[s] = \varnothing$
        \WHILE{$S \neq V$}
            \STATE Choose $v \not \in S$ minimizing $r(v) = \min_{u \in S,\, (u,v) \in E} d[u] + w((u,v))$, and let $u \in S$ attain this minimum
            \STATE $S = S \cup \{v\}$
            \STATE $d[v] = r(v)$
            \STATE $\pi[v] = u$
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}
    \end{algorithm}

We are given a source vertex $s \in V$ as part of the input, along with the graph $G$. We then need to keep track of the following information:

  1. the set $S$ of vertices we have explored so far,
  2. for each $u \in S$, the weight $d[u]$ of a shortest path from $s$ to $u$,
  3. for each $u \in S$, the predecessor $\pi[u]$ of $u$ on that path.

We begin with $S = \{s\}$, $d[s] = 0$, and $\pi[s] = \varnothing$. We then repeatedly explore the graph by visiting vertices $v \not \in S$. We add to $S$ the vertex $v$ that minimizes \[r(v) = \min_{u \in S,\, (u,v) \in E} d[u] + w((u,v)),\] and set $d[v] = r(v)$ and $\pi[v] = u$, where $u$ is the vertex attaining the minimum.

Once we've explored the entire graph, we will have information that allows us to construct the shortest path from $s$ to every other vertex in $G$. To recover a shortest path from $s$ to $v$, we simply follow the sequence of predecessors from $\pi[v]$ until we get back to $s$.
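To make this concrete, here is a minimal Python sketch of this version of the algorithm, together with the predecessor-following path reconstruction. The dict-of-dicts graph representation (graph[u][v] = w((u,v))) is an assumption made for illustration, not part of the problem statement.

    import math

    def dijkstra_naive(graph, s):
        # Grow the explored set S one vertex at a time, always adding the
        # unexplored vertex v minimizing r(v) = min_{u in S} d[u] + w((u,v)).
        # Assumes graph[u][v] = w((u, v)) >= 0.
        d = {s: 0}
        pi = {s: None}
        S = {s}
        vertices = set(graph)
        for out_edges in graph.values():
            vertices |= set(out_edges)
        while S != vertices:
            best_v, best_u, best_r = None, None, math.inf
            # Scan every edge (u, v) leaving the explored set S.
            for u in S:
                for v, w in graph.get(u, {}).items():
                    if v not in S and d[u] + w < best_r:
                        best_v, best_u, best_r = v, u, d[u] + w
            if best_v is None:  # remaining vertices are unreachable from s
                break
            S.add(best_v)
            d[best_v] = best_r
            pi[best_v] = best_u
        return d, pi

    def reconstruct_path(pi, s, v):
        # Follow predecessors from v back to s, then reverse.
        # Assumes v is reachable from s.
        path = [v]
        while path[-1] != s:
            path.append(pi[path[-1]])
        return path[::-1]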

Example 4.5. Consider again our example from above.

Here, we've highlighted in blue the edges that were selected by Dijkstra's algorithm.

The question we have to deal with is whether this strategy actually works. In other words, how can we be sure that once we've added a vertex to $S$ that we can conclude we found the shortest path to it from $s$?

Proposition 4.6. For each $u \in S$, $d[u]$ is the length of a shortest path from $s$ to $u$.

Proof. We will show this by induction on $|S|$. First, we have $|S| = 1$, in which case $S = \{s\}$ and $d[s] = 0$.

Now suppose this holds for $|S| \geq 1$. Let $v$ be the next vertex added to $S$ and let $(u,v)$ be the final edge. Since $u \in S$, $d[u]$ is the length of a shortest path from $s$ to $u$, so there is a path of length $r(v) = d[u] + w((u,v))$ from $s$ to $v$. We want to show that this is a shortest path from $s$ to $v$.

Consider some other path $P$ from $s$ to $v$ and let $(x,y)$ be the first edge of $P$ that leaves our explored set $S$. By our inductive hypothesis, $d[x]$ is the length of a shortest path from $s$ to $x$, so the portion of $P$ from $s$ to $y$ has length at least $d[x] + w((x,y))$. By definition of $r(y)$, we have $r(y) \leq d[x] + w((x,y))$. But since $v$ was the vertex chosen by Dijkstra's algorithm, we have $r(v) \leq r(y)$. Finally, since edge weights are nonnegative, $P$ is at least as long as its portion from $s$ to $y$. Therefore, $P$ is no shorter than $r(v)$. $$\tag*{$\Box$}$$

Since the algorithm adds a vertex to $S$ at each step until $S = V$, this proposition establishes the correctness of the algorithm.

There is one issue with this algorithm: what happens if we try to use negative edge weights? This question raises two points. The first is that the algorithm breaks when we have negative weight edges; notice that the proof of Proposition 4.6 explicitly uses the fact that edge weights are nonnegative. The second is another question: why the heck would we want a graph with negative weight edges?

This is where the mention of currency arbitrage earlier comes in. Suppose we have a graph whose vertices are currencies, with an edge from currency $u$ to currency $v$ weighted by the conversion rate. Then we might want to find a shortest path from one currency to another, which may or may not be better than the direct conversion rate.

Now, as we've formulated the problem, we can't do this, since we multiply conversion rates, not add them. But we can take care of that pretty easily: just use the logarithm of the rates (recall that $\log(mn) = \log m + \log n$).

Okay, so then what? Well, we would like a shortest path from, say USD back to USD. Ordinarily, this should be 0 (i.e. we don't move), but if we have negative weight edges, this isn't always necessarily the case.

Consider a cycle $e_1 \cdots e_n$. If we make money by trading around that cycle of currencies, we should have $c_1 \cdot c_2 \cdots c_n \gt 1$ for rates $c_1, \dots, c_n$. Then what should our edge weights be? If we take the logarithm, we have \[\log c_1 + \cdots + \log c_n \gt 0,\] which is still not quite what we want, but if we take the negative logarithm, then we have \[(-\log c_1) + \cdots + (-\log c_n) \lt 0.\] So we take $w(e_i) = - \log c_i$, find any negative weight cycles, and make tons of money, provided we can find a shortest paths algorithm that's not Dijkstra's and can handle negative weights...
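As a quick numerical sanity check of this transformation, here is a short Python sketch; the three conversion rates are hypothetical, chosen so that the cycle is profitable.

    import math

    # Hypothetical rates around a 3-currency cycle, e.g. USD -> EUR -> GBP -> USD.
    rates = [0.9, 1.2, 0.95]
    assert math.prod(rates) > 1             # 1.026 > 1, so the cycle is profitable

    # With weights w(e_i) = -log(c_i), the profitable cycle becomes a
    # negative weight cycle.
    weights = [-math.log(c) for c in rates]
    assert sum(weights) < 0                 # about -0.0257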

Efficiency

Unlike with our earlier algorithms, proving that this is efficient is a bit trickier.

First of all, how do we represent a graph? There are two main ways to do this (which you may have already covered in discrete math).

Definition 4.7. Let $G = (V,E)$ be a graph.

  1. An adjacency matrix for $G$ is a $|V| \times |V|$ matrix $A$ where the $(i,j)$th entry, $A[i,j]$ is a 1 if $(i,j)$ is an edge and 0 otherwise.
  2. An adjacency list for $G$ is an array $A$ of size $|V|$ where $A[i]$ is a list of vertices $j$ such that $(i,j)$ is an edge.

Both of these representations are robust, in that we can easily modify them for particular representations of graphs. For instance, if we have an undirected graph, then for the adjacency matrix, both $A[i,j]$ and $A[j,i]$ are 1, while for directed graphs, this depends on the direction. Similarly, for an adjacency list, we would have that $j$ is in $A[i]$ but $i$ not necessarily in $A[j]$ in a directed graph, while both would be true in an undirected graph.

The extension of these representations to include weights for edges is also quite simple. For an adjacency matrix, simply store the weight $w(i,j)$ instead of a 1 and a special $\infty$ symbol for edges that are not present. For adjacency lists, one can store the weight along with the corresponding edge.
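For concreteness, here is one way to build both weighted representations in Python for a small made-up directed graph; the vertices and weights are hypothetical.

    import math

    # A hypothetical weighted directed graph on vertices 0..3.
    edge_list = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
    n = 4

    # Weighted adjacency matrix: A[i][j] = w(i, j), or infinity when
    # (i, j) is not an edge.
    A = [[math.inf] * n for _ in range(n)]
    for u, v, w in edge_list:
        A[u][v] = w

    # Weighted adjacency list: adj[i] holds (j, w(i, j)) pairs.
    adj = [[] for _ in range(n)]
    for u, v, w in edge_list:
        adj[u].append((v, w))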

Both of these representations have efficiency tradeoffs that are evident from the definitions: an adjacency matrix takes $\Theta(|V|^2)$ space but supports $O(1)$ edge lookups, while an adjacency list takes $O(|V| + |E|)$ space but checking whether $(i,j)$ is an edge requires scanning the list $A[i]$.

Now, going back to Dijkstra's algorithm, at a glance, it appears that there is some heavy computation going on in computing $r(v)$ for each $v \not \in S$. To do so, we would need to check each incoming edge to $v$ from $S$, which across all candidates $v$ amounts to something like $O(|E|)$ cost per iteration.

However, we can do something a bit more clever: precompute and keep track of $r(v)$. How do we do this? Consider when $r(v)$ would change: when a new vertex is added to $S$. In this case, we would only need to change $r(v)$ if we get a shorter path from $s$ through $u$ to $v$. In other words, when $u$ is added to $S$, we take \[r(v) = \min \{r(v), d[u] + w((u,v))\}.\] This takes care of the computation of $r(v)$: for each $u$ added, we only need to do $O(n)$ work to update all the $r(v)$'s.
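In code, this bookkeeping is just the following update, applied after each vertex $u$ is added to $S$. This is a sketch using the same dict-of-dicts representation as above, where r maps each vertex to its current tentative distance (math.inf if no edge from $S$ to it has been seen yet).

    def relax_out_edges(graph, u, S, r, pi):
        # After adding u to S, refresh r(v) for each out-neighbor v of u.
        for v, w in graph.get(u, {}).items():
            if v not in S and r[u] + w < r[v]:
                r[v] = r[u] + w
                pi[v] = u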

Now, we are still left with how to choose $v$ with the smallest $r(v)$. If we keep this information in an array, we can find the minimum in $O(n)$ time, which seems pretty good. But in fact, we can do better.

Fun with data structures

We will make use of a data structure (well, technically an abstract data type) called a priority queue.

Definition 4.8. A priority queue is an abstract data type that maintains a set $S$ of elements, where each element $v$ has an associated key $k$. A priority queue has the following operations:

  1. insert($S$,$x$,$k$): add the element $x$ to $S$ with key $k$,
  2. extract_min($S$): remove and return the element of $S$ with the smallest key,
  3. update($S$,$x$,$k$): decrease the key of the element $x$ to $k$.

These operations assume that we're prioritizing based on minimality; one can define a priority queue that takes the maximum key.

As a refresher, the difference between ADTs and data structures is that we don't necessarily know how ADTs are implemented. For instance, we could implement this just by keeping track of things in arrays. This turns out to be sub-optimal (a nice exercise is to figure out what the cost of each of the above operations would be in such an implementation).

For an efficient implementation, we will describe another data structure.

Definition 4.9. A (binary) heap is a binary tree which satisfies the heap property: for each element $i$ with parent $j$, we have $j \leq i$.

Again, this is a heap ordered based on minimality; there is a corresponding definition for heaps ordered based on maximality.

One thing to note is that the heap is similar to a binary search tree, in that it is a binary tree with some condition on how it stores elements imposed on it. This means that insertions and deletions from the heap need to maintain the heap property. The question is the cost of these operations.

Proposition 4.10. Let $H$ be a heap with $n$ elements. Then

  1. the minimum element of $H$ can be found in $O(1)$ time,
  2. a new element can be inserted into $H$ in $O(\log n)$ time,
  3. the minimum element can be deleted from $H$ in $O(\log n)$ time.

The details of how this is done can be found in CLRS Chapter 6 or KT 2.5.
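To give a sense of where these bounds come from, here is a minimal array-based min-heap sketch, with the children of index i stored at indices 2i+1 and 2i+2; insertion and deletion each walk a single root-to-leaf path, which has length $O(\log n)$.

    def sift_up(h, i):
        # Restore the heap property when h[i] may be smaller than its parent.
        while i > 0 and h[(i - 1) // 2] > h[i]:
            h[i], h[(i - 1) // 2] = h[(i - 1) // 2], h[i]
            i = (i - 1) // 2

    def sift_down(h, i):
        # Restore the heap property when h[i] may be larger than a child.
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):   # children of i
                if c < len(h) and h[c] < h[smallest]:
                    smallest = c
            if smallest == i:
                return
            h[i], h[smallest] = h[smallest], h[i]
            i = smallest

    def heap_insert(h, x):
        # O(log n): append at the bottom, then sift up one path.
        h.append(x)
        sift_up(h, len(h) - 1)

    def heap_extract_min(h):
        # O(log n): assumes h is nonempty; swap the root to the end,
        # remove it, then sift the new root down one path.
        h[0], h[-1] = h[-1], h[0]
        x = h.pop()
        if h:
            sift_down(h, 0)
        return x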

Example 4.11. Here are two trees that hold the list $[2,3,5,8,13,21,34,56]$.

The tree on the left, in purple, satisfies the (min) heap property, while the tree on the right, in blue, does not.
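Using the same array encoding as in the sketch above, we can check the heap property mechanically:

    def is_min_heap(h):
        # Every element must be at least its parent.
        return all(h[(i - 1) // 2] <= h[i] for i in range(1, len(h)))

    assert is_min_heap([2, 3, 5, 8, 13, 21, 34, 56])       # sorted lists qualify
    assert not is_min_heap([3, 2, 5, 8, 13, 21, 34, 56])   # 2 is below its parent 3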

Heaps were originally invented in 1963 by Williams, who used them to give an $O(n \log n)$ sorting algorithm based on heaps, called Heapsort. Once you work out the implementation, you pretty much get the sorting algorithm for free:

  1. Throw your $n$ elements into a heap in $O(n)$ time.
  2. Repeatedly extract the minimal element from the heap in $O(\log n)$ time and put it in an array in order.

Since you're taking $n$ elements out of the heap, each at a cost of $O(\log n)$, we get $O(n \log n)$.
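Using Python's standard library heap (heapq), the whole algorithm is a few lines: heapify is the $O(n)$ build step, and each heappop costs $O(\log n)$.

    import heapq

    def heapsort(items):
        h = list(items)
        heapq.heapify(h)        # build a min-heap in O(n)
        # n extractions at O(log n) each: O(n log n) total.
        return [heapq.heappop(h) for _ in range(len(h))]

    assert heapsort([13, 2, 34, 5, 3, 56, 8, 21]) == [2, 3, 5, 8, 13, 21, 34, 56]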

The heap also almost gives us a priority queue implementation for free: we just maintain a heap based on the keys of the elements of our set.

Proposition 4.12. Let $S$ be a priority queue with $n$ elements, implemented using a binary heap. Then each of the operations insert($S$,$x$,$k$), extract_min($S$), and update($S$,$x$,$k$) takes $O(\log n)$ time.

Implementation

Now, we can revise our pseudocode for Dijkstra's algorithm to make explicit some of the implementation choices we made.

    \begin{algorithm}
    \caption{Dijkstra's algorithm with priority queue}
    \begin{algorithmic}
    \PROCEDURE{Dijkstra}{$G,s$}
        \STATE $d[s] = 0, \pi[s] = \varnothing$
        \FORALL{$v \neq s$}
            \STATE $d[v] = \infty, \pi[v] = \varnothing$
        \ENDFOR
        \STATE Create empty priority queue $P$
        \FORALL{$v \in V$}
            \STATE insert($P$,$v$,$d[v]$)
        \ENDFOR
        \WHILE{$P$ is not empty}
            \STATE $u = $ extract\_min($P$)
            \FORALL{$(u,v) \in E$}
                \IF{$d[v] \gt d[u] + w((u,v))$}
                    \STATE update($P$,$v$,$d[u] + w((u,v))$)
                    \STATE $d[v] = d[u] + w((u,v))$
                    \STATE $\pi[v] = u$
                \ENDIF
            \ENDFOR
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}
    \end{algorithm}

Much of the additional logic is setting up all the data structures we need. The real question is what happens within the while loop.

Here, we use our priority queue $P$ to maintain the set of vertices that aren't explored yet. On each iteration, while there are still unexplored vertices, we take the vertex $u$ with minimal $d[u]$ out of the priority queue. We then update the keys (distances) of $u$'s unexplored out-neighbors for the next iteration.
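Here is a Python sketch of this version. One caveat: Python's heapq does not support the update operation directly, so this sketch substitutes the standard "lazy deletion" workaround: push a fresh (distance, vertex) entry on each improvement and skip stale entries when they are popped. Since at most $m$ entries ever enter the heap, the $O(m \log n)$ bound below is unaffected.

    import heapq
    import math

    def dijkstra(graph, s):
        # Assumes graph[u][v] = w((u, v)) >= 0 (dict-of-dicts, as before).
        d = {s: 0}
        pi = {s: None}
        P = [(0, s)]            # priority queue of (d[v], v) pairs
        done = set()            # vertices whose distance is final
        while P:
            _, u = heapq.heappop(P)
            if u in done:       # stale entry: u was already extracted
                continue
            done.add(u)
            for v, w in graph.get(u, {}).items():
                if d[u] + w < d.get(v, math.inf):
                    d[v] = d[u] + w
                    pi[v] = u
                    # Stand-in for update(P, v, d[v]).
                    heapq.heappush(P, (d[v], v))
        return d, pi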

Proposition 4.14. Dijkstra's algorithm when implemented with a binary heap priority queue has time complexity $O(m \log n)$, where $m = |E|$ and $n = |V|$.

Proof. Everything before the while loop is setting up the data structures we need for the algorithm. This takes $O(n)$ time, based on our discussions about arrays and priority queues.

Now, consider the while loop. We perform one iteration of the loop for each vertex in $V$. In each iteration, we remove a vertex $u$ from the priority queue, which has cost $O(\log n)$. We also examine a number of edges. It is important to note that since we only examine the outgoing edges of $u$, across all iterations of the while loop, we examine each edge exactly once.

Putting that together, we have $n$ deletions from $P$ and at most $m$ updates to the priority queue. This gives us a total of $O(n) + O(n \log n) + O(m \log n)$ time. Therefore, Dijkstra's algorithm has time complexity $O(m \log n)$ (assuming $m \geq n$, as is the case when every vertex is reachable from $s$). $$\tag*{$\Box$}$$

Notice that the complication with this analysis is in how we use the priority queue. However, we can do even better than this, by tinkering with the implementation of the priority queue itself. Recall that our implementation uses a binary heap to implement the priority queue, which has $O(\log n)$ time for all of its operations.

One can squeeze out additional efficiency by using a Fibonacci heap, invented by Fredman and Tarjan in 1984. These heaps have an amortized running time of $O(1)$ for update($S$,$x$,$k$), which means that our analysis of Dijkstra's algorithm improves to $O(m + n \log n)$. This is good if you have dense graphs, with many more edges than vertices. You can read about Fibonacci heaps in CLRS Chapter 19.