CMSC 27100 — Lecture 20

Let's prove the remaining two properties of trees that were introduced last class.

Suppose $G$ is a tree. Then $G$ is connected and has no cycles. Consider a pair of vertices $u, v$. Since $G$ is connected, there is a path between $u$ and $v$. Since $G$ contains no cycles, this path must be unique. Otherwise, if there were a second path, we can form a cycle.

Now suppose that every pair of of vertices $u$ and $v$ in $G$ has a unique path between them. Then $G$ is connected. Now, suppose for contradiction that $G$ contains a cycle and $u$ and $v$ are part of this cycle. The cycle exhibits two paths between $u$ and $v$, contradicting our assumption that there is a unique path between $u$ and $v$. Thus, $G$ does not contain any cycles. Therefore, $G$ is a tree.

Consider two non-adjacent vertices $u,v$ in $T$. Since $T$ is a tree, there is a path from $u$ to $v$ in $T$. Now let $T' = T \cup (u,v)$, the graph with the edge $(u,v)$ added. There are now two distinct paths between $u$ and $v$: the original path in $T$ and the path consisting of the edge $(u,v)$. Therefore, $T'$ contains a cycle. (In fact, since $(u,v)$ is disjoint from the path, the path together with $(u,v)$ forms the unique cycle.)

We can apply the idea of a tree as a minimally connected graph to say something about connectivity in graphs in general. One clear application is finding a minimal connected subgraph of the graph such that every vertex is connected. One can see how this might be useful in something like road network where you're trying to figure out what the most important roads to clear after a snowfall are.

Suppose $G$ has a spanning tree $T$. Then there exists a path between every pair of vertices in $T$ and therefore, there exists a path between every pair of vertices in $G$. Then by definition, $G$ is connected.

Now, suppose $G$ is connected. Consider the connected spanning subgraph $H$ of $G$ with the minimal number of edges. Suppose $H$ contains a cycle. Then we can remove an edge from this cycle, since no edge in a cycle is a bridge. Therefore, $H$ contains no cycles and since $H$ is connected, it is a tree.

An obvious computational problem is to find or construct spanning trees for a given graph. Many of you already know the algorithms that do this—it's breadth and depth first search. We are already familiar with BFS and DFS as algorithms that solve the connectivity problem: given two vertices $s$ and $t$, is there a path from $s$ to $t$?

This may not be something that you've considered before, particularly since we typically teach these as algorithms on directed graphs. Of course, the same algorithms work perfectly fine on undirected graphs. But now we can try to reason that the traversal of the graphs that these algorithms perform gives us a tree.

The following presentation is due to Jeff Erickson, from Section 5.5 of his book, Algorithms. First, let's see breadth-first search.

    \begin{algorithmic}
    \Procedure{breadth-first-search}{$G,s$}
        \State Add vertex $s$ to queue $Q$
        \While{$Q$ is not empty} 
            \State Dequeue vertex $v$ from queue $Q$
            \If{vertex $v$ is not marked}
                \State Mark vertex $v$
                \ForAll{edges $(v,w) \in E$}
                    \State Add vertex $w$ to queue $Q$
                \EndFor
            \EndIf
        \EndWhile
    \EndProcedure
    \end{algorithmic}

Recall that a queue is a data structure for collections with constant-time ($O(1)$) controlled access—when we put items in the queue, we must remove the first ones we put in (the oldest ones). Similarly a stack is a data structure with constant-time access that works the other way: we must remove the last items we put in first (the newest ones).

While this algorithm is written for breadth-first search, we can make it into depth-first search by swapping out the queue for a stack. Intuitively, when we add neighbours to a queue, the next items we take from the queue are going to be the oldest ones—i.e., the ones closest to the start vertex $s$. If we use a stack instead, we are always going to taking the newest vertices when removing them from the stack—these are the vertices furthest away from $s$.

We claim that by adding a bit of information to the algorithm, namely the parents and traversed edges, we can get ourselves a tree.

    \begin{algorithmic}
    \Procedure{bfs-tree}{$G,s$}
        \State Add edge $(\varnothing, s)$ to queue $Q$
        \While{$Q$ is not empty} 
            \State Dequeue edge $(u,v)$ from queue $Q$
            \If{$v$ is not marked}
                \State Mark vertex $v$
                \State Assign vertex $u$ as the parent of $v$ and add edge $(u,v)$ to $T$
                \ForAll{edges $(v,w) \in E$}
                    \State Add edge $(v,w)$ to queue $Q$
                \EndFor
            \EndIf
        \EndWhile
    \EndProcedure
    \end{algorithmic}

Then we get the following property of breadth-first search, essentially saying that breadth-first search is correct—it visits every reachable vertex.

Breadth first search marks every vertex reachable from $s$ and only those vertices.

We can prove this by induction on the length of the shortest path from $s$ to $v$.

The base case is when the length of the path is 0. In this case, the only reachable vertex is $s$, which is marked.

Now, we assume that for arbitrary $k$ that all vertices reachable on paths of length at most $k$ are marked. Consider a vertex $v$ which is reached on a path of length $k+1$. We can write such a path as $s v_1 \cdots v_{k-1} u v$. We notice that there is a path $s \cdots u$ which is of length at most $k$, so by our inductive hypothesis, BFS marks $u$.

When $u$ is marked by BFS, it adds $(u,v)$ to the queue. Since the queue is not empty, the edge $(u,v)$ must be dequeued later, at which point BFS will mark $v$ unless it was already marked.

We then want to prove that breadth-first traversal necessarily traverses edges in such a way that the edges will form a tree.

The set $T$ of all pairs $(u,v)$, where $u \neq \varnothing$ is the breadth-first search assigned parent of $v$, is the edge set for the spanning tree of the connected component of $G$ containing $s$.

We observe that every $(u,v)$ pair where $u$ is the parent of $v$ is an edge of $G$, with the exception of $(\varnothing, s)$. We will first prove that for any vertex $v$, a path from $v$ to $s$ can be constructed via the "parent" edges that we collect in $T$, by induction on the order in which the vertices are marked.

Our base case in this case is $s$, which is marked immediately and has no parent but is reachable from $s$. We consider for some $k$ that the $k$th vertex to be marked by BFS and all those before it have a path back to $s$ via the parent edges in $T$. So we consider a $(k+1)$th marked vertex $v$.

Suppose the parent of $v$ is $u$. We know that in order to mark $v$, $u$ must have been marked by BFS before $v$. So we can apply the inductive hypothesis to see that there is a path from $u$ to $s$ using edges from $T$. We can then use this path to construct a path from $v$ to $s$ by adding $(u,v)$, which is also an edge from $T$.

Since there is a path from every marked vertex $v$ to $s$, this means that all of the marked vertices and the parent edges form a connected subgraph of $G$. We also note that every vertex has a unique parent except for $s$. This means that there is one fewer parent edge than there are vertices. In other words, this connected subgraph has $n$ vertices and $n-1$ edges, so the subgraph is a tree.

These two facts: that the algorithm marks every reachable vertex and that it constructs a spanning subgraph that's a tree allows us to conclude that it constructs a spanning tree for the component that contains $s$.