CMSC 27200 — Lecture 8

Implementing Dijkstra's algorithm

Compared with our earlier algorithms, proving that Dijkstra's algorithm is efficient is a bit trickier.

First of all, how do we represent a graph? There are two main ways to do this (which you may have already seen).

Let $G = (V,E)$ be a graph.

  1. An adjacency matrix for $G$ is a $|V| \times |V|$ matrix $A$ where the $(i,j)$th entry, $A[i,j]$ is a 1 if $(i,j)$ is an edge and 0 otherwise.
  2. An adjacency list for $G$ is an array $A$ of size $|V|$ where $A[i]$ is a list of vertices $j$ such that $(i,j)$ is an edge.

Both of these representations are robust, in that we can easily modify them for particular kinds of graphs. For instance, if we have an undirected graph, then for the adjacency matrix, both $A[i,j]$ and $A[j,i]$ are 1, while for directed graphs, this depends on the direction. Similarly, for an adjacency list, in a directed graph we would have that $j$ is in $A[i]$ but $i$ is not necessarily in $A[j]$, while both would hold in an undirected graph.

The extension of these representations to include weights for edges is also quite simple. For an adjacency matrix, simply store the weight $w(i,j)$ instead of a 1 and a special $\infty$ symbol for edges that are not present. For adjacency lists, one can store the weight along with the corresponding edge.
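As a sketch of both weighted representations, consider a small directed graph on three vertices (Python, with illustrative names):

```python
# Directed graph with edges (0,1) of weight 4, (0,2) of weight 1,
# and (2,1) of weight 2; all names here are illustrative.
INF = float("inf")
n = 3
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2)]

# Adjacency matrix: an n x n table, with INF marking a missing edge.
matrix = [[INF] * n for _ in range(n)]
for (i, j, w) in edges:
    matrix[i][j] = w

# Adjacency list: adj[i] holds (j, w) pairs, one per edge (i, j).
adj = [[] for _ in range(n)]
for (i, j, w) in edges:
    adj[i].append((j, w))
```

The matrix answers "is $(i,j)$ an edge?" in $O(1)$ time but costs $O(|V|^2)$ space; the list uses $O(|V| + |E|)$ space and lets us walk a vertex's neighbors directly.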

Both of these representations have efficiency tradeoffs that are evident from the definitions.

Now, going back to Dijkstra's algorithm: at a glance, it appears that there is some heavy computation going on in computing $r(v)$ for each $v \not \in S$. To do so, we would need to check each incoming edge to $v$, which across all $v \not \in S$ involves something like $O(|E|)$ cost per iteration.

However, we can do something a bit more clever: precompute and keep track of $r(v)$. How do we do this? Consider when $r(v)$ would change: when a new vertex $u$ is added to $S$. In this case, we only need to change $r(v)$ if we get a shorter path from $s$ through $u$ to $v$. In other words, we take \[r(v) = \min \{r(v), r(u) + w((u,v))\}.\] This takes care of the computation of $r(v)$: for each $u$ added to $S$, we only need to do $O(n)$ work to update all the $r(v)$'s.
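This update rule can be sketched as a small helper (Python, names illustrative; `adj[u]` holds `(v, w)` pairs):

```python
INF = float("inf")

def relax_from(u, r, adj):
    # After adding u to S, apply r(v) = min(r(v), r(u) + w(u, v))
    # to each outgoing edge (u, v) of weight w.
    for (v, w) in adj[u]:
        if r[u] + w < r[v]:
            r[v] = r[u] + w
```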

Now, we are still left with how to choose the $v$ with the smallest $r(v)$. If we keep this information in an array, we can find the minimum in $O(n)$ time, which seems pretty good. But in fact, we can do better.

Fun with data structures

We will make use of a data structure (well, technically an abstract data type) called a priority queue.

A priority queue is an abstract data type that maintains a set $S$ of elements, where each element $v$ has an associated key $k$. A priority queue has the following operations:

  1. insert($S,v,k$): add element $v$ to $S$ with key $k$.
  2. extract-min($S$): remove and return the element of $S$ with the smallest key.
  3. update($S,v,k$): change the key of element $v$ in $S$ to $k$.

These operations assume that we're prioritizing based on minimality; one can define a priority queue that takes the maximum key.

As a refresher, the difference between ADTs and data structures is that we don't necessarily know how ADTs are implemented. For instance, we could implement this just by keeping track of things in arrays. This turns out to be sub-optimal (a nice exercise is to figure out what the cost of each of the above operations would be in such an implementation).

For an efficient implementation, we will describe another data structure.

A (binary) heap is a binary tree which satisfies the heap property: for each element $i$ with parent $j$, we have $j \leq i$.

Again, this is a heap ordered based on minimality; there is a corresponding definition for heaps ordered based on maximality.

One thing to note is that the heap is similar to a binary search tree, in that it is a binary tree with some condition on how it stores elements imposed on it. This means that insertions and deletions from the heap need to maintain the heap property. The question is the cost of these operations.

Let $H$ be a binary heap with $n$ elements. Then

  1. inserting a new element into $H$ takes $O(\log n)$ time;
  2. extracting the minimum element of $H$ takes $O(\log n)$ time;
  3. building $H$ from scratch out of $n$ elements takes $O(n)$ time.

The details of how this is done can be found in CLRS Chapter 6 or KT 2.5.

Here are two trees that hold the list $[2,3,5,8,13,21,34,56]$.

The tree on the left, in purple, satisfies the (min) heap property, while the tree on the right, in blue, does not.
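In the standard array encoding of a binary tree (the children of index $i$ sit at indices $2i+1$ and $2i+2$), the min-heap property is easy to check; a small Python sketch:

```python
def is_min_heap(a):
    # The parent of index i > 0 is (i - 1) // 2; the heap property
    # demands every parent be <= each of its children.
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))
```

For instance, the sorted list above stored in order satisfies the property, while a list with a larger element at the root does not.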

Heaps were originally invented in 1963 by Williams, who used them to give an $O(n \log n)$ sorting algorithm based on heaps, called Heapsort. Once you work out the heap implementation, you pretty much get the sorting algorithm for free:

  1. Throw your $n$ elements into a heap in $O(n)$ time.
  2. Repeatedly extract the minimal element from the heap in $O(\log n)$ time and put it in an array in order.

Since you're taking $n$ elements out of the heap, each at a cost of $O(\log n)$, we get $O(n \log n)$.
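Using Python's `heapq` module as the heap, the two steps above can be sketched as:

```python
import heapq

def heapsort(items):
    # Step 1: build a heap out of all n elements in O(n) time.
    h = list(items)
    heapq.heapify(h)
    # Step 2: extract the minimum n times, O(log n) each,
    # collecting the elements in sorted order.
    return [heapq.heappop(h) for _ in range(len(h))]
```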

The heap also almost gives us a priority queue implementation for free: we just maintain a heap based on the keys of the elements of our set.

Let $S$ be a priority queue with $n$ elements, implemented using a binary heap on the keys. Then insert, extract-min, and update each take $O(\log n)$ time.
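Python's standard library has no ready-made updatable priority queue, but one can be sketched on top of the `heapq` module; the class name `MinPQ` and the "lazy deletion" trick (push a fresh entry on update, discard stale entries on extraction) are illustrative choices, not the only implementation.

```python
import heapq

class MinPQ:
    """A minimal priority queue sketch over heapq. Since heapq has no
    way to change an entry's key in place, update simply pushes a new
    (key, item) pair; extract_min skips entries whose key is stale."""
    def __init__(self):
        self._heap = []
        self._key = {}          # the current key of each live item

    def insert(self, item, key):
        self._key[item] = key
        heapq.heappush(self._heap, (key, item))

    update = insert             # same mechanics: push the new key

    def extract_min(self):
        while True:
            key, item = heapq.heappop(self._heap)
            if self._key.get(item) == key:   # skip stale entries
                del self._key[item]
                return item

    def __len__(self):
        return len(self._key)
```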

Implementation

Now, we can revise our pseudocode for Dijkstra's algorithm to make explicit some of the implementation choices we made.

    \begin{algorithmic}
    \PROCEDURE{Dijkstra}{$G,s$}
        \STATE $d[s] = 0, \pi[s] = \varnothing$
        \FORALL{$v \neq s$}
            \STATE $d[v] = \infty, \pi[v] = \varnothing$
        \ENDFOR
        \STATE Create empty priority queue $P$
        \FORALL{$v \in V$}
            \STATE \CALL{insert}{$P,(v,d[v])$}
        \ENDFOR
        \WHILE{$P$ is not empty}
            \STATE $u = $ \CALL{extract-min}{$P$}
            \FORALL{$(u,v) \in E$}
                \IF{$d[v] > d[u] + w((u,v))$}
                    \STATE \CALL{update}{$P,v,d[u] + w((u,v))$}
                    \STATE $d[v] = d[u] + w((u,v))$
                    \STATE $\pi[v] = u$
                \ENDIF
            \ENDFOR
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}

Much of the additional logic is setting up all the data structures we need. The real question is what happens within the while loop.

Here, we use our priority queue $P$ to maintain the set of vertices that aren't explored yet. On each iteration, while there are still unexplored vertices, we take the vertex $u$ with minimal $d[u]$ out of the priority queue. We then update the keys (distances) of the remaining unexplored vertices for the next iteration.
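As a sanity check, the pseudocode above can be sketched in Python with the `heapq` module; since `heapq` has no update operation, this sketch pushes a fresh entry on each improvement and skips stale entries when they surface (the asymptotics are unchanged). Names are illustrative, and `adj[u]` holds `(v, w)` pairs.

```python
import heapq

def dijkstra(adj, s):
    n = len(adj)
    d = [float("inf")] * n      # tentative distances
    pi = [None] * n             # shortest-path tree parents
    d[s] = 0
    P = [(0, s)]                # priority queue of (d[v], v)
    done = [False] * n          # marks vertices already extracted
    while P:
        du, u = heapq.heappop(P)
        if done[u]:
            continue            # stale entry, skip
        done[u] = True
        for (v, w) in adj[u]:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(P, (d[v], v))
    return d, pi
```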

Dijkstra's algorithm, when implemented with a binary heap priority queue, has time complexity $O(m \log n)$, where $m = |E|$ and $n = |V|$.

Everything before the while loop is setting up the data structures we need for the algorithm. This takes $O(n)$ time, based on our discussions about arrays and priority queues.

Now, consider the while loop. We perform one iteration of the loop for each vertex in $V$. In each iteration, we remove a vertex from the priority queue, which costs $O(\log n)$, and we examine a number of edges. Importantly, since we only examine the outgoing edges of $u$, across all iterations of the while loop we examine each edge only once.

Putting that together, we have $n$ deletions from $P$ and at most $m$ updates to the priority queue, each costing $O(\log n)$. This gives us a total of $O(n) + O(n \log n) + O(m \log n)$ time. Therefore, assuming every vertex is reachable from $s$ (so that $m \geq n - 1$), Dijkstra's algorithm has time complexity $O(m \log n)$.

There is a caveat with this. One may wonder whether this is really better than using an array to hold the $r(v)$ values. Finding the minimum in this case requires $O(n)$ time to scan the array, which is clearly worse than $O(\log n)$. However, we can update an array entry in $O(1)$ time, which is better than $O(\log n)$. In this alternative implementation, across all iterations we require $O(n^2)$ time for finding the minimum $r(v)$ and $O(m)$ time for updating the $r(v)$'s. Then we have $O(n^2) + O(m)$, which is still $O(n^2)$, since $m \leq n^2$.

Hold on: if $m \leq n^2$, how does $O(n^2)$ compare with $O(m \log n)$? Wouldn't that mean that in the worst case we really have $O(n^2 \log n)$ time for our priority queue implementation? That is a very good question to ask, because it turns out that the answer depends on how many edges you have. If you have few edges and $m$ is not close to $n^2$ (i.e. $G$ is a sparse graph), then the priority queue implementation wins out. But if your graph contains many edges and $m$ is closer to $n^2$ (i.e. $G$ is a dense graph), the array implementation may be the better choice. Tricky!
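The $O(n^2)$ array alternative described above can be sketched as follows (Python, names illustrative; `adj[u]` holds `(v, w)` pairs):

```python
def dijkstra_array(adj, s):
    n = len(adj)
    INF = float("inf")
    d = [INF] * n
    d[s] = 0
    in_S = [False] * n
    for _ in range(n):
        # O(n) scan to find the unexplored vertex with minimal d.
        u = min((v for v in range(n) if not in_S[v]), key=lambda v: d[v])
        in_S[u] = True
        for (v, w) in adj[u]:
            if d[u] + w < d[v]:     # O(1) per-edge update
                d[v] = d[u] + w
    return d
```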

However, it turns out we can do even better than this, by tinkering with the implementation of the priority queue itself. Recall that our implementation uses a binary heap to implement the priority queue, which has $O(\log n)$ time for all of its operations. One can squeeze out additional efficiency by using a Fibonacci heap, which was invented by Fredman and Tarjan in 1984. These heaps have an amortized running time of $O(1)$ for update($S,x,k$), which means that our analysis of Dijkstra's algorithm improves to $O(m + n \log n)$. This implementation wins out no matter how many edges your graph has. You can read about Fibonacci heaps in CLRS Chapter 19.