MPCS 50103 — Lecture 9

Walks, paths, cycles

A natural impulse when presented with a bunch of circles joined by lines is to see if we can get from one circle to another by following the lines. More formally, when we do this, what we're doing is checking if an object is connected to another object. It's not surprising to learn that this is a fundamental property of graphs that we'd like to work with.

Definition 18.1. A walk of length $k$ is a sequence of $k+1$ vertices $v_0, v_1, \cdots, v_k$ such that $v_{i-1}$ and $v_i$ are adjacent for $1 \leq i \leq k$. This walk has length $k$ because we traverse $k$ edges. A path is a walk without repeated vertices. A cycle or circuit of length $k$ is a walk of length $k$ with $v_0 = v_k$ and no other repeated vertices.

Example 18.2. Consider the following graphs.

In the first, the walk $a_0 a_1 a_4 a_2 a_1 a_3$ is highlighted in red. In the second, the path $b_0 b_4 b_2 b_3 b_1$ is highlighted in orange. In the third, the cycle $c_0 c_4 c_2 c_1 c_0$ is highlighted in yellow.

Note that the above definitions are not universal and you should be very careful when consulting other materials (e.g. sometimes they use path when we mean walk).

Before we proceed, here is a simple theorem about walks and paths, in the spirit of the idea that the fastest way between points $a$ and $b$ is a straight line.

Theorem 18.3. Let $G$ be a graph and $u,v \in V$ be vertices. If there is a walk from $u$ to $v$, then there is a path from $u$ to $v$.

Proof. Consider a walk $u = v_0 v_1 \cdots v_k = v$ from $u$ to $v$ of minimum length $k$. Suppose that a vertex is repeated. In other words, there exist $i, j$ with $i \lt j$ such that $v_i = v_j$. Then we can create a shorter walk $$u = v_0 v_1 \cdots v_{i-1} v_i v_{j+1} v_{j+2} \cdots v_k = v,$$ since $(v_i,v_{j+1}) = (v_j,v_{j+1})$ is an edge. But this contradicts the minimality of the length of our original walk. So there are no repeated vertices in our walk and therefore, it is a path by definition. $\tag*{$\Box$}$
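
To make this concrete, here is a small sketch in Python of the argument in the proof: given a walk as a list of vertices, it repeatedly cuts out the segment between repeated vertices. The function name and the representation of a walk as a list of vertices are just for illustration.

```python
def walk_to_path(walk):
    """Shorten a walk (a list of vertices) into a path with the same endpoints
    by cutting out the portion between repeated vertices, as in Theorem 18.3."""
    path = []
    position = {}              # vertex -> index of its occurrence in path
    for v in walk:
        if v in position:
            # v was already visited: drop everything after its first occurrence
            path = path[:position[v] + 1]
            position = {u: i for i, u in enumerate(path)}
        else:
            position[v] = len(path)
            path.append(v)
    return path


# the walk from Example 18.2 shortens to a path
print(walk_to_path(["a0", "a1", "a4", "a2", "a1", "a3"]))  # ['a0', 'a1', 'a3']
```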

This leads to a basic question: how do I know whether there's a walk from $u$ to $v$? Although it is probably overkill, especially if you're only interested in one particular pair of vertices, one way we can answer this is by making clever use of the adjacency matrix.

Theorem 18.4. Let $G = (V,E)$ be a graph with $V = \{1,\dots,n\}$ and adjacency matrix $A$. The number of walks of length $\ell$ from $i$ to $j$ is the $(i,j)$th entry of $A^\ell$.

Proof. We will prove this by induction on $\ell$.

Base case. We have $\ell = 1$. Then $A^1 = A$ is just the adjacency matrix and the $(i,j)$th entry corresponds to the edge $(i,j)$, which is a walk of length 1 from $i$ to $j$.

Inductive case. Now assume for $k \in \mathbb N$ that the $(i,j)$th entry of $A^k$ is the number of walks of length $k$ from $i$ to $j$. Consider $\ell = k+1$.

Since $A^{k+1} = A^kA$, the $(i,j)$th entry $c_{i,j}$ of $A^{k+1}$ is $$c_{i,j} = b_{i,1} a_{1,j} + b_{i,2} a_{2,j} + \cdots + b_{i,n} a_{n,j},$$ where $b_{i,m}$ is the $(i,m)$th entry of $A^k$ and $a_{m,j}$ is the $(m,j)$th entry of $A$. By the inductive hypothesis, $b_{i,m}$ is the number of walks of length $k$ from $i$ to $m$.

A walk of length $k$ from $i$ to $m$ extends to a walk of length $k+1$ from $i$ to $j$ exactly when $(m,j)$ is an edge, that is, when $a_{m,j} = 1$. So the term $b_{i,m} a_{m,j}$ counts the walks of length $k+1$ from $i$ to $j$ whose second-to-last vertex is $m$; when $(m,j)$ is not an edge, this term contributes nothing. Summing over all vertices $m$, the entry $c_{i,j}$ is the total number of walks of length $k+1$ from $i$ to $j$. $\tag*{$\Box$}$

Note that although it doesn't help in the proof above, the matrix $A^k$ counting $k$-length walks is also well-defined for $k = 0$, giving us $A^0 = I$, the identity matrix. This supposedly gives us all of the walks of length 0 from a vertex $i$ to $j$. Since the identity matrix has zeroes everywhere except for the $(i,i)$th entries, which are 1, we can interpret this as the single walk that starts at vertex $i$ and travels along 0 edges, which brings us to the vertex $i$ itself.

Example 18.5. Consider the following graph.

If we take the vertices of this graph in order $a_0, a_1, a_2, a_3, a_4$, then we have the adjacency matrix $A$, defined by $$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{pmatrix}$$ Then we have $$A^2 = \begin{pmatrix} 2 & 1 & 1 & 1 & 2 \\ 1 & 4 & 2 & 0 & 1 \\ 1 & 2 & 3 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \\ 2 & 1 & 1 & 1 & 2 \end{pmatrix}, A^3 = \begin{pmatrix} 2 & 6 & 5 & 1 & 2 \\ 6 & 4 & 6 & 4 & 6 \\ 5 & 6 & 4 & 2 & 5 \\ 1 & 4 & 2 & 0 & 1 \\ 2 & 6 & 5 & 1 & 2 \end{pmatrix}, A^4 = \begin{pmatrix} 11 & 10 & 10 & 6 & 11\\ 10 & 22 & 16 & 4 & 10\\ 10 & 16 & 16 & 6 & 10\\ 6 & 4 & 6 & 4 & 6\\ 11 & 10 & 10 & 6 & 11 \end{pmatrix}. $$
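
If you want to check these computations (or experiment with larger powers), a few lines of Python with numpy will do it; the ordering $a_0, \dots, a_4$ is assumed to correspond to indices 0 through 4.

```python
import numpy as np

# adjacency matrix of the graph in Example 18.5, vertices ordered a0..a4
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
])

for ell in (2, 3, 4):
    # the (i, j) entry of A^ell counts walks of length ell from i to j
    print(f"A^{ell} =\n{np.linalg.matrix_power(A, ell)}")

# e.g. the number of walks of length 3 from a0 to a1
print(np.linalg.matrix_power(A, 3)[0, 1])  # 6
```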

Now, this gives us some nice information, but it's not really the best way to determine whether there is a walk between two particular vertices, or between every pair of vertices. If you think about the cost of multiplying matrices together, just taking powers blindly is probably not the best approach. What are some better ways of doing this? This is something that you can look forward to thinking about in an algorithms class.

Eulerian and Hamiltonian paths

We can play around with the notions of paths and cycles a bit more. One of the first questions of this kind was asked by Euler concerning the bridges of Königsberg. The city was connected by a bunch of bridges over the rivers that passed through it. The question was: is there a way to walk through the city so that you only go across every bridge exactly once?

If we reframe this question, we can see that it's a graph question, where the bridges act as edges. Of course, this is a multigraph which is not what we're interested in, but we can ask the same question of the simple graph version of this graph.

Definition 18.6. Let $G$ be a graph. An Eulerian tour is a closed walk that traverses every edge of $G$ exactly once. An Eulerian path is a walk that traverses every edge of $G$ exactly once.

The following, due to Euler, gives us necessary and sufficient conditions for an Eulerian tour to exist in our graph.

Theorem 18.7 (Euler). A connected graph $G = (V,E)$ contains an Eulerian tour if and only if every vertex has even degree.

Proof. First, suppose $G$ contains an Eulerian tour. Each time the tour passes through a vertex, it traverses two edges incident to that vertex: one going in, and one going out. Since the tour uses every edge exactly once, every vertex in the interior of the tour has even degree. This leaves the starting vertex. However, the tour ends at the starting vertex, so the first and last edges pair up and the starting vertex also has even degree.

Now, for the other direction, we assume that every vertex of $G$ has even degree and choose a walk of maximal length that does not repeat any edge. This walk has to be a closed walk: if not, the last vertex of the walk has had an odd number of its incident edges traversed, so there must be an unused edge incident to it that we can add to our walk. Suppose for contradiction that our walk doesn't contain every edge in $G$. Since $G$ is connected, we can choose a vertex on our walk that has an unused edge incident to it. Starting from this vertex, we construct a new walk that doesn't contain any edges of our original walk, and this must also be a closed walk for the same reason as above. We can then join the two walks together to form a larger walk with no repeated edges, which contradicts our assumption that our walk was maximal. $\tag*{$\Box$}$

Example 18.8. Euler's result also holds for multigraphs, which is fortunate, since the original Königsberg bridges problem is formulated on a multigraph.

However, 1736 was a long time ago and Königsberg has changed since. For one thing, it's now Kaliningrad, which is the part of the Russian Federation that's not connected to the rest of it. The bridges have changed too: there are only five bridges connecting the different parts of the city. Luckily for us, this is no longer a multigraph.

However, examining the graph, we see that the modern arrangement of the bridges of Kaliningrad still doesn't have the Euler condition, and so we can conclude that there still is no Eulerian tour through the city's bridges.

Example 18.9. One question we can ask is how to find an Eulerian tour if we know our graph has one. The proof above actually gives us one! The algorithm is due to Carl Hierholzer in 1873. The algorithm on a graph $G = (V,E)$ is as follows:

  1. Choose a vertex $v \in V$.
  2. Repeat until we return to $v$: from the current vertex, traverse an edge that has not yet been used in the tour. If the current vertex has more than one unused incident edge, choose any of them, and remember that this vertex still has unused edges.
  3. If there are vertices with unused edges, choose one and set $v$ to be this vertex. Then repeat Step 2.
  4. The algorithm is finished once there are no unused edges (i.e. every edge is part of the tour).

There are two crucial properties that we rely on from Euler's proof above. The first is that we can't get stuck, since every vertex has even degree. The second is that we can make a larger tour by combining two cycles and we can do this until we exhaust every edge.

A quick analysis tells us that this algorithm takes $O(|E|)$ time, or linear in the number of edges: we traverse each edge exactly once.
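
Here is a sketch of Hierholzer's algorithm in Python, written iteratively with a stack rather than by splicing cycles together by hand; it assumes the input is a simple, connected graph given as a list of edges in which every vertex has even degree. The names and input format are just illustrative.

```python
from collections import defaultdict

def eulerian_tour(edges, start):
    """Return an Eulerian tour as a list of vertices, using Hierholzer's idea:
    walk until stuck (which can only happen back at the start, since degrees
    are even), then back up and splice in detours along unused edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    used = set()                 # edges already placed in the tour
    stack, tour = [start], []
    while stack:
        v = stack[-1]
        # skip over neighbours whose edge we have already traversed
        while adj[v] and frozenset((v, adj[v][-1])) in used:
            adj[v].pop()
        if adj[v]:
            u = adj[v].pop()
            used.add(frozenset((v, u)))
            stack.append(u)      # keep extending the current walk
        else:
            tour.append(stack.pop())   # vertex exhausted: add it to the tour
    return tour[::-1]


# a small example: the cycle graph on four vertices
print(eulerian_tour([(0, 1), (1, 2), (2, 3), (3, 0)], 0))  # e.g. [0, 3, 2, 1, 0]
```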

We can ask the same kind of questions but for vertices instead of edges. This question is due to William Rowan Hamilton. Hamilton was an Irish mathematician who invented the icosian game, which involves trying to find what we'll describe as a Hamiltonian cycle along the edges of a dodecahedron.

Definition 18.10. Let $G$ be a graph. A cycle is a Hamiltonian cycle if every vertex of $G$ is in the cycle. A path is a Hamiltonian path if every vertex of $G$ is in the path.

Example 18.11. The usual binary encoding of a number interprets each digit as a power of 2. Suppose we're interested in counting something and so we would proceed by counting using three bits in the following way 000, 001, 010, 011, 100, 101, 110, 111. Generally, this is not a problem because we think of these abstractly. However, it's worth thinking about the physical implementation of such a system.

Here, the big question is when we go from something like 011 to 100: how do we guarantee that the system perfectly switches every bit at the same time? The answer is that we really can't. For physical systems, you can imagine that this is not feasible, but even for electronic systems, this is not always possible. This is also a big problem if the switching of bits is expensive, as in quantum systems.

To deal with this, we need an encoding of numbers based on the idea that only one bit is changed at a time. These are called Gray codes, after Frank Gray who defined them in the late 1940s.

Gray codes have a lot of interesting mathematical properties, but we'll focus on graph theory. We can ask the question of whether we can construct a Gray code for all $n$. It seems obvious that we can get one for $n = 2$ and with some work, we can get $n = 3$. Is there a better way to do this than just by trial and error?

Recall that $Q_n$ is the graph of $n$-bit strings which are joined by edges if they differ in exactly one position. This sounds suspiciously like a Gray code. Then our question becomes: is there a path through $Q_n$ that hits every vertex exactly once? If there is, this gives us our Gray code.

Theorem 18.12. For every integer $n \geq 2$, the $n$-cube $Q_n$ has a Hamiltonian cycle.

Proof. We will prove this by induction on $n$.

Base case. We have $n = 2$. Then our graph looks like this.

There is clearly a Hamiltonian cycle in this graph.

Inductive case. Let $k \geq 2$ and assume that $Q_k$ contains a Hamiltonian cycle. Now, consider the graph $Q_{k+1}$.

We partition our vertex set into two sets, with one set consisting of strings that begin with 0 and the other with strings that begin with 1. We have $$V_0 = \{0x \mid x \in \{0,1\}^k\}, \qquad V_1 = \{1x \mid x \in \{0,1\}^k\}.$$ We observe that the subgraph induced by each of these sets is isomorphic to the $k$-cube $Q_k$, since their vertex sets are really just binary strings of length $k$. Then by our inductive hypothesis, both of these subgraphs have a Hamiltonian cycle. Let $C = v_1 v_2 \cdots v_{2^k}$ be a Hamiltonian cycle for $Q_k$. In other words, $v_1, \dots, v_{2^k}$ are strings of length $k$. Then we have the Hamiltonian cycles $C_0 = 0v_1, 0v_2, \dots, 0v_{2^k}, 0v_1$ in the subgraph induced by $V_0$ and $C_1 = 1v_1, 1v_2, \dots, 1v_{2^k}, 1v_1$ in the subgraph induced by $V_1$.

Then a Hamiltonian cycle for $Q_{k+1}$ is $$0v_1, 0v_2, \dots, 0v_{2^k}, 1v_{2^k}, 1v_{2^k-1}, \dots, 1v_2, 1v_1, 0v_1.$$ In other words, we remove the final edges of $C_0$ and $C_1$ so that they are paths. Then we add the edge $(0v_{2^k}, 1v_{2^k})$ to the cycle. This edge exists because these two strings differ only in the first position. We then add the edge $(1v_1, 0v_1)$, and again these two strings differ only in the first position. Then our cycle is traversed by following $C_0$ as a path, then crossing into $C_1$ and traversing it in reverse. $\tag*{$\Box$}$

This proof actually shows us a way to construct an $n$-bit Gray code. First, we take an $(n-1)$-bit Gray code. We make two copies of it: one with a 0 appended to the front of every string, and one with a 1 appended to the front of every string. Then we traverse the copy beginning with 0 in its original order, followed by the copy beginning with 1 in reverse order. It's because of this that the Gray code is also called a reflected code.
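
The reflection in the proof translates directly into a short recursive Python function; here is a sketch (the base case $n = 1$ is the code 0, 1, one step before the induction above starts at $n = 2$).

```python
def gray_code(n):
    """Return the n-bit reflected Gray code as a list of bit strings.
    Consecutive strings (and the last and first) differ in exactly one bit."""
    if n == 1:
        return ["0", "1"]
    prev = gray_code(n - 1)
    # prefix 0 to the (n-1)-bit code in order, then prefix 1 to it in reverse
    return ["0" + s for s in prev] + ["1" + s for s in reversed(prev)]


print(gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
```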

Just like with Eulerian cycles and paths, we can ask what conditions are needed for a graph to contain a Hamiltonian path or cycle. Unlike Eulerian cycles and paths, we know of no condition for Hamiltonian cycles or paths that is both necessary and sufficient, only sufficient ones. The following results are known, but we will not prove them.

Theorem 18.13 (Dirac). Let $G = (V,E)$ with $|V| \geq 3$. If $\deg(v) \geq \frac{|V|}2$ for all $v \in V$, then there exists a Hamiltonian cycle.

Theorem 18.14 (Ore). Let $G = (V,E)$ with $|V| \geq 3$. If for all pairs of nonadjacent vertices $u,v \in V$, $\deg(u) + \deg(v) \geq |V|$, then $G$ has a Hamiltonian cycle.

Basically, these two results say that as long as you have enough edges, there will be a Hamiltonian cycle. But, as mentioned, if a graph does have a Hamiltonian cycle, that doesn't mean it will satisfy the conditions in these theorems.

This gives us hints that the difficulty of finding Eulerian and Hamiltonian cycles/paths might also be different. And this is true: we already saw that there is a (very) efficient algorithm for finding Eulerian tours. However, there isn't any known efficient algorithm for Hamiltonian cycles/paths; if there were, we could solve the Traveling Salesman Problem efficiently too. This might be a bit surprising, but it serves as a preview of an important point of theoretical computer science: very similar-looking problems can be quite different in terms of their difficulty.

Connectedness

What does it mean if there's a walk between two vertices? Practically speaking, it means that we can reach one from the other. This idea leads to the following definition.

Definition 19.1. A graph $G$ is connected if there is a walk between every pair of vertices.

The graphs that we've seen so far have mostly been connected. However, we're not guaranteed to work with connected graphs, especially in real-world applications. In fact, we may want to test the connectedness of a graph. For that, we'll need to talk about the parts of a graph that are connected.

Definition 19.2. A connected component of a graph $G = (V,E)$ is a connected subgraph $G'$ of $G$ that is not a proper subgraph of another connected subgraph of $G$.

Example 19.3. Consider the following graph.

This graph has two connected components, the subgraphs induced by $a_0,\dots, a_5$ and $a_6,\dots,a_9$. No other subgraph can be a connected component, since it would either not be connected or be a proper subgraph of one of the two connected components we identified.

Now, we can make use of our earlier computation with the adjacency matrix to determine if a graph is connected, by determining whether there is at least one walk between every pair of vertices.

Theorem 19.4. Let $G = (V,E)$ be a graph with adjacency matrix $A$ and let $B$ be the matrix $$B = I + A + A^2 + \cdots + A^{|V|-1} = \sum_{i=0}^{|V|-1} A^i.$$ Then $G$ is connected if and only if every entry of $B$ is positive.

Proof. By Theorem 18.4, the $(i,j)$th entry of the matrix $A^k$ is the number of $k$-length walks from $i$ to $j$ in $G$. Then the sum of the matrices $I,A,A^2,\dots,A^{|V|-1}$ is a matrix whose $(i,j)$th entry is the total number of walks from $i$ to $j$ of length at most $|V|-1$.

If $G$ is connected, then there must be a walk between every pair of vertices. This means that every entry of $B$ must be positive. If every entry of $B$ is positive, then this means there is a walk between every pair of vertices and therefore, $G$ is connected.

It remains to show that we only need to check walks of length up to $|V|-1$. Suppose there exists a walk of length greater than $|V|-1$ between vertices $i$ and $j$. By Theorem 18.3, there exists a path between these two vertices. Any path in $G$ contains at most $|V|$ vertices and therefore, it must be of length at most $|V|-1$. Therefore, if any walk between $i$ and $j$ exists, there is one of length at most $|V|-1$, so checking walks up to this length suffices. $\tag*{$\Box$}$
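
As a sanity check of this criterion, here is a small Python sketch using numpy that computes $B$ and tests whether every entry is positive; the function name and the use of a dense adjacency matrix are just for illustration (in practice a graph search is a better way to test connectivity).

```python
import numpy as np

def is_connected(A):
    """Theorem 19.4: G is connected iff every entry of
    B = I + A + A^2 + ... + A^(n-1) is positive."""
    n = A.shape[0]
    B = np.zeros((n, n), dtype=int)
    power = np.eye(n, dtype=int)      # A^0
    for _ in range(n):
        B += power
        power = power @ A             # next power of A
    return bool((B > 0).all())


# a path on three vertices is connected
P3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(is_connected(P3))  # True
```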

Another question related to connectivity that we can ask is how fragile the graph is. For instance, if we imagine some sort of network (computer, transportation, social, etc.), this is the same as asking where the points of failure are in our network. Which vertices or edges do we need to take out to disconnect our graph?

Definition 19.5. A set $X \subseteq V$ is a vertex cut if the subgraph induced by $V \setminus X$ is disconnected. If $G$ is not complete, we define $\kappa(G)$ to be the size of the smallest vertex cut in $G$. If $G$ is complete, we define $\kappa(G) = |V|-1$.

A set $X \subseteq E$ is an edge cut if the subgraph $(V, E \setminus X)$ is not connected. We define $\lambda(G)$ to be the size of the smallest edge cut of $G$. If $G$ consists of a single vertex (and therefore has no edges), we define $\lambda(G) = 0$.

It is important to note that $G$ is not connected if and only if $\kappa(G) = \lambda(G) = 0$.

Example 19.6. Consider the following graph $G$.

This graph has $\kappa(G) = 1$, due to the vertex cut $\{c_2\}$ and $\lambda(G) = 1$ due to the edge cut $\{(a_3,c_2)\}$. Of course, these are only the minimal cut sets. We can choose a larger cut set that includes these vertices and edges if we wish.

We can relate the connectivity of a graph with the minimum degree of the graph.

Theorem 19.7. For all graphs $G = (V,E)$, $\kappa(G) \leq \lambda(G) \leq \min_{v \in V} \deg(v)$.

We won't prove this result. Two of the inequalities are easy to see: take any vertex $v$ of $G$ and delete all of its neighbours or all of its incident edges, and you've separated $v$ from the rest of the graph (or, in the extreme case, removed everything else). This gets us $\kappa(G), \lambda(G) \leq \deg(v)$ for every $v$, and in particular for a vertex of minimum degree. The hard part is proving $\kappa(G) \leq \lambda(G)$: one has to show how to turn a smallest edge cut into a vertex cut that is no larger, and the details are quite tricky.

Now let's consider the case where eliminating a single vertex or edge will disconnect the graph. Such vertices and edges also have special names.

Definition 19.8. Let $G = (V,E)$ be a graph and consider a vertex $v \in V$. If removing $v$ and its incident edges increases the number of connected components, then $v$ is a cut vertex or articulation point. Similarly, consider an edge $e \in E$. If removing $e$ increases the number of connected components, then $e$ is a cut edge or bridge.

In Example 19.6, each vertex of our minimal vertex cut is a cut vertex and every edge in our minimal edge cut is a bridge. We will primarily be concerned with bridges, for reasons we'll see shortly. First, let's prove an obvious lemma.

Lemma 19.9. Let $G$ be a connected graph. If $(u,v) \in E$ is a bridge, then $G - (u,v)$ (the subgraph of $G$ with only the edge $(u,v)$ deleted) has exactly two components and $u$ and $v$ are in different components.

Proof. Since $(u,v)$ is a bridge, we know that $G - (u,v)$ has at least two components. Let $X$ be the set of vertices that are in the same component as $u$. Choose some vertex $x \not \in X$. Then $x$ is not in the same component as $u$. Since $G$ was connected, $G$ contained a path between $u$ and $x$, and there is no such path in $G - (u,v)$. Thus, any path from $u$ to $x$ in $G$ must have contained the edge $(u,v)$. These paths must have the form $u v v_2 v_3 \cdots v_{n-1} x$.

Then $v v_2 \cdots v_{n-1} x$ is a path in $G - (u,v)$ and therefore $v$ and $x$ are in the same component. But recall that the choice of $x \not \in X$ was arbitrary. This means that there exists a path from $v$ to every vertex $x \not \in X$. Therefore, $G - (u,v)$ has exactly two components, and $u$ and $v$ are in different components. $\tag*{$\Box$}$

We can use this to get the following important characterization of bridges.

Theorem 19.10. Let $G$ be a graph. An edge $e$ of $G$ is a bridge if and only if it is not contained in any cycle of $G$.

Proof. For the forward direction, we assume that $e = (u,v)$ is a bridge. Suppose that $e$ is in a cycle, say $u v v_2 \cdots v_{n-1} u$. Now consider $G - e$. Note that $v v_2 \cdots v_{n-1} u$ is a path between $u$ and $v$, so $u$ and $v$ are in the same component. But this contradicts Lemma 19.9. Therefore, $e$ is not in any cycle of $G$.

For the reverse direction, we assume that $e = (u,v)$ is not in a cycle. Suppose that $e$ is not a bridge. Then $u$ and $v$ are in the same component in the graph $G - e$. Therefore, there is a path between $u$ and $v$, say $u v_2 \cdots v_{n-1} v$. However, this means that $u v_2 \cdots v_{n-1} vu$ is a cycle in $G$, which is a contradiction. Therefore, $e$ is a bridge. $\tag*{$\Box$}$
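
A direct way to test whether a given edge is a bridge is to delete it and check whether its endpoints can still reach each other; by Lemma 19.9 and the theorem above, they can if and only if the edge lies on a cycle. Here is a naive Python sketch of that check (linear-time algorithms exist, but this is just to illustrate the characterization); the graph is given as a dictionary of neighbour lists, and the names are illustrative.

```python
def is_bridge(adj, u, v):
    """Return True if edge (u, v) is a bridge: deleting it leaves u unable to
    reach v, which by Theorem 19.10 happens iff (u, v) lies on no cycle."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if {x, y} == {u, v}:      # skip the deleted edge (u, v)
                continue
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return v not in seen


# in a path 0-1-2 every edge is a bridge; in a triangle none is
print(is_bridge({0: [1], 1: [0, 2], 2: [1]}, 0, 1))        # True
print(is_bridge({0: [1, 2], 1: [0, 2], 2: [0, 1]}, 0, 1))  # False
```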

Corollary 19.11. Let $G = (V,E)$ be a graph and let $u,v \in V$. If there are two distinct paths between $u$ and $v$, then $G$ contains a cycle.

Proof. Consider two paths $P_1$ and $P_2$ between vertices $u$ and $v$. We can write $P_1 = a_0 a_1 \cdots a_m$ and $P_2 = b_0 b_1 \cdots b_n$, where $a_0 = b_0 = u$ and $a_m = b_n = v$, with $m$ not necessarily equal to $n$. Since the two paths are distinct, there is an edge that is in $P_1$ that is not in $P_2$. Let $(a_i,a_{i+1})$ be this edge.

Consider the graph $G - (a_i,a_{i+1})$. Now, consider the walk $$a_i a_{i-1} \cdots a_0 = u = b_0 b_1 \cdots b_n = v = a_m a_{m-1} \cdots a_{i+1}.$$ This is a walk from $a_i$ to $a_{i+1}$ which does not use the edge $(a_i, a_{i+1})$. Therefore, $a_i$ and $a_{i+1}$ are in the same component of $G - (a_i,a_{i+1})$, so $(a_i,a_{i+1})$ is not a bridge, and by Theorem 19.10 it is contained in a cycle. Therefore, $G$ contains a cycle. $\tag*{$\Box$}$

Trees

Trees are another special kind of graph with various interesting properties. This makes them interesting from a mathematical point of view (i.e. for studying) and these properties lend themselves to "useful" representations of data (i.e. for use in applications).

Typically, we think of trees when they're organized with the root at the top and growing downwards (apparently trees grow upside-down) as in family trees or organizational charts. However, the following definition gives us something more concrete to work with and we'll see that a lot of the results we've proven about cycles will come into play.

Definition 20.1. A graph $G$ is a tree if $G$ is connected and contains no cycles. A graph $G$ with no cycles is a forest.

Example 20.2. The following graphs are trees.

Theorem 20.3. A graph $G$ is a tree if and only if there is a unique path between any two vertices in $G$.

Proof. Suppose $G$ is a tree. Then $G$ is connected and has no cycles. Consider a pair of vertices $u, v$. Since $G$ is connected, there is a path between $u$ and $v$. Since $G$ contains no cycles, this path must be unique: if there were a second path, we could form a cycle by Corollary 19.11.

Now suppose that every pair of vertices $u$ and $v$ in $G$ has a unique path between them. Then $G$ is connected. Now, suppose for contradiction that $G$ contains a cycle and $u$ and $v$ are part of this cycle. The cycle exhibits two paths between $u$ and $v$, contradicting our assumption that there is a unique path between $u$ and $v$. Thus, $G$ does not contain any cycles. Therefore, $G$ is a tree. $\tag*{$\Box$}$

Theorem 20.4. A tree $T$ with $n$ vertices has $n-1$ edges.

Proof. We will show this by mathematical induction on $n$.

Base case. We have $n = 1$, and therefore, our graph contains $0 = 1 - 1$ edges, so our statement holds.

Inductive case. Let $k \in \mathbb N$ and assume that every tree $T'$ with $k$ vertices has $k-1$ edges. Now, consider a tree $T = (V,E)$ with $k+1$ vertices. Since $T$ has at least two vertices, it has a leaf: a vertex $v$ of degree 1 (for instance, an endpoint of a longest path in $T$). Let $u$ be the unique neighbour of $v$ and consider the subgraph $S = (V \setminus \{v\}, E \setminus \{(u,v)\})$ without $v$. Then $S$ is still connected and acyclic, so it is a tree with $k$ vertices. By the inductive hypothesis, $S$ has $k-1$ edges. Since we only removed one edge from $T$ to obtain $S$, $T$ must have $k = (k+1)-1$ edges as desired. $\tag*{$\Box$}$

Theorem 20.5. Let $G = (V,E)$ be a tree. Then $G$ is bipartite.

Proof. We will show this by induction on the size of the vertex set $n = |V|$.

Base case. We have $n = 1$. Then we can partition our vertex set into $X = \{v\}$ and $Y = \emptyset$.

Inductive case. Now let $k \in \mathbb N$ and suppose all trees with $|V| = k$ are bipartite. Consider a tree $G = (V,E)$ with $|V| = k+1$. As in the proof of Theorem 20.4, $G$ has a vertex $v \in V$ of degree 1; let $u$ be its unique neighbour. Consider the subgraph $G' = G - v$, with $v$ and $(u,v)$ removed. Then $G'$ is a tree with $k$ vertices.

By our inductive hypothesis, $G'$ is bipartite with bipartition $(X',Y')$. Without loss of generality, suppose that $u \in X'$. Then we can form the bipartition $(X,Y)$ on the graph $G$ by $X = X'$ and $Y = Y' \cup \{v\}$. $\tag*{$\Box$}$

Binary trees

Very often, in computer science, we are concerned with binary trees as fundamental structures. These trees have a specific form which gives rise to even more special properties.

Definition 20.6. A rooted tree is a tree in which one vertex has been designated as the root. Let $T = (V,E)$ be a rooted tree and let $r$ be the root of $T$. For a vertex $v \neq r$, the parent of $v$ is its neighbour on the unique path from $v$ to $r$, and the other neighbours of $v$ are its children. A vertex with no children is a leaf, and the subtree rooted at $v$ consists of $v$ together with all vertices whose path to $r$ passes through $v$. The height of $T$ is the number of vertices on a longest path from $r$ to a leaf. A binary tree is a rooted tree in which every vertex has at most two children.

For simplicity, we will restrict our attention to binary trees. However, many of these results generalize quite easily to trees where each vertex has at most $n \gt 2$ children. The notion of a subtree is quite useful for proving properties of rooted trees by induction.

Theorem 20.7. Let $G = (V,E)$ be a binary tree with height $h \in \mathbb N$. Then $|V| \leq 2^h - 1$.

Proof. We will prove this by strong induction on $h$, the height of the binary tree.

Base case. We have $h = 1$. In this case, the binary tree of height 1 consists only of a root. Then we have $|V| = 1 \leq 2^h - 1$.

Inductive case. Let $k \in \mathbb N$ and suppose all binary trees of height $j \leq k$ have at most $2^j - 1$ vertices. Consider a binary tree $G = (V,E)$ with height $k+1$.

Let $r$ be the root of $G$. Consider the subtrees rooted at the children of $r$, say $T_1 = (V_1,E_1)$ and $T_2 = (V_2,E_2)$, taking an empty subtree if $r$ has only one child. Both $T_1$ and $T_2$ have height at most $k$. Then by our inductive hypothesis (or trivially, for an empty subtree), $|V_1| \leq 2^k - 1$ and $|V_2| \leq 2^k - 1$. Then we get \begin{align*} |V| &= |V_1| + |V_2| + 1 \\ &\leq (2^k - 1) + (2^k - 1) + 1 \\ &= 2 \cdot (2^k - 1) + 1 \\ &= 2^{k+1} - 2 + 1 \\ &= 2^{k+1} - 1. \end{align*} $\tag*{$\Box$}$

Spanning trees

Now, we can apply trees to the notion of connectivity. We can think of this as finding a minimal connected subgraph of the graph such that every vertex is connected. One can see how this might be useful in something like a road network, where you're trying to figure out which roads are the most important to clear after a snowfall.

Definition 20.8. A spanning subgraph which is also a tree is called a spanning tree.

Theorem 20.9. A graph $G$ is connected if and only if it has a spanning tree.

Proof. Suppose $G$ has a spanning tree $T$. Then there exists a path between every pair of vertices in $T$ and therefore, there exists a path between every pair of vertices in $G$. Then by definition, $G$ is connected.

Now, suppose $G$ is connected. Consider a connected spanning subgraph $H$ of $G$ with the minimal number of edges. Suppose $H$ contains a cycle. Then we can remove an edge of this cycle and $H$ remains connected, since no edge in a cycle is a bridge; this contradicts the minimality of $H$. Therefore, $H$ contains no cycles and since $H$ is connected, it is a spanning tree. $\tag*{$\Box$}$

This result provides another way to determine the connectivity of a graph, assuming that we have an efficient way to determine the spanning tree for a given graph. Of course, that is an algorithmic question that you can look forward to in an algorithms class.
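
If you want a concrete (if naive) preview, a breadth-first search from any vertex builds a spanning tree of its component, and the graph is connected exactly when that tree reaches every vertex. The sketch below assumes the graph is given as a dictionary mapping each vertex to a list of its neighbours; the names are just illustrative.

```python
from collections import deque

def spanning_tree(adj, root):
    """Build a spanning tree by BFS from `root`. Returns the list of tree
    edges, or None if the graph is not connected (Theorem 20.9)."""
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:       # first time we reach v: keep edge (u, v)
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges if len(visited) == len(adj) else None


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(spanning_tree(adj, 0))  # e.g. [(0, 1), (0, 2), (2, 3)]
```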

Maximum matchings in bipartite graphs

So now that we know a bit of graph theory, we can go back and think about matchings again. Recall that maximal matchings are not necessarily maximum matchings. We already have Hall's condition for complete matchings, but what if our graph doesn't have a complete matching? When we find ourselves with a maximal matching, we would like a way to figure out whether a better matching exists and, if so, how to find it. We will make use of the following idea.

Definition 21.1. Let $G$ be a graph with a matching $M$. An augmenting path for $M$ is a path $v_0 v_1 \cdots v_k$ such that $v_0$ and $v_k$ are unmatched in $M$ (i.e. $v_0, v_k$ do not belong to any matched edges) and the edges of the path alternate between edges in $M$ and edges not in $M$.

Let's have a look at what an augmenting path looks like.

Example 21.2. Consider again Example 17.21:

We can see on the left, where we have a maximal but not maximum matching, there are many augmenting paths, while there are no such paths on the right. Now, observe that we can choose an augmenting path in the left-hand matching and if we flip all the edges in and out of the matching, we end up with the right-hand matching.

This gives us a possible algorithm: start with an unmatched vertex and construct a matching iteratively by finding augmenting paths and flipping them. This is the idea behind Berge's lemma.

Theorem 21.3 (Berge's Lemma). Let $G = (V,E)$ be a graph with a matching $M$. Then $M$ is a maximum matching if and only if there is no $M$-augmenting path.

However, there is one possible complication with this approach. Consider the following graph and matching.

Observe that if we attempt to find our augmenting path by starting from the leftmost vertex, we could end up getting stuck, depending on which way we travel on the cycle.

Now, you might say, hold on, this graph is not bipartite! And that's true, this is not a bipartite graph. But, if we look carefully at the definition of matchings and the statement of Berge's lemma, we'll find that neither of these actually depends on the fact that $G$ is bipartite!

However, this issue with our algorithm is only a problem because the cycle that's giving us the problem has odd length. Luckily, this is not a problem for bipartite graphs.

Theorem 21.4. A graph $G$ is bipartite if and only if $G$ does not contain a cycle of odd length.

Proof. First, we show the only if direction. Suppose $G$ is bipartite and consider a cycle in $G$. Then traversing an edge on this cycle means alternating between the two sets of the bipartition. Since we must return to the starting vertex, we must end up in the same set of the bipartition that we started in, which means that the cycle must have an even number of edges.

Now, we consider the if direction and suppose that $G$ does not contain a cycle of odd length. We assume without loss of generality that $G$ is connected, since $G$ is bipartite if and only if all of its connected components are bipartite.

Choose a vertex $v \in V$. We define two sets, \begin{align*} V_1 &= \{u \in V \mid \text{there exists an odd length path between $u$ and $v$}\}, \\ V_2 &= \{u \in V \mid \text{there exists an even length path between $u$ and $v$}\}. \\ \end{align*} We observe that $V_1 \cup V_2 = V$, since $G$ is connected, and so there must be a path between $v$ and each vertex in $G$.

We claim that $V_1 \cap V_2 = \emptyset$. Suppose otherwise and that there exists a vertex $u \in V_1 \cap V_2$. Then there exists a path of even length from $u$ to $v$ and a path of odd length from $v$ to $u$. These two paths together form a closed walk of odd length, and a closed walk of odd length must contain a cycle of odd length, which contradicts our assumption that no such cycles exist.

It remains to check that every edge joins a vertex of $V_1$ to a vertex of $V_2$. If some edge $(x,y)$ had both endpoints in the same set, then the paths from $v$ to $x$ and from $v$ to $y$ have the same parity, and together with $(x,y)$ they would form a closed walk of odd length, hence a cycle of odd length, a contradiction. Since $V_1$ and $V_2$ are disjoint, cover $V$, and every edge goes between them, they form a bipartition of $V$, and therefore, $G$ is bipartite. $\tag*{$\Box$}$
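
The proof also suggests an algorithm: pick a start vertex in each component, put vertices at even distance on one side and odd distance on the other, and check that no edge ends up inside a single side. Here is a Python sketch of that idea, with the graph given as a dictionary of neighbour lists (the names are illustrative).

```python
from collections import deque

def bipartition(adj):
    """Try to split the vertices into two sides by parity of distance from a
    start vertex, one component at a time. Returns (V1, V2), or None if some
    edge joins two vertices on the same side (i.e. there is an odd cycle)."""
    side = {}
    for start in adj:                    # handle every connected component
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None          # odd cycle: not bipartite
    V1 = {v for v, s in side.items() if s == 0}
    V2 = {v for v, s in side.items() if s == 1}
    return V1, V2


# a 4-cycle is bipartite, a triangle is not
print(bipartition({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}))  # ({0, 2}, {1, 3})
print(bipartition({0: [1, 2], 1: [0, 2], 2: [0, 1]}))             # None
```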

So for bipartite graphs, the algorithm that we've described works perfectly fine. But what about for general graphs? It takes a bit more work, but the same idea works once you can figure out how to deal with odd cycles. This result is due to Jack Edmonds in 1965, who gave the first efficient algorithm for computing maximum matchings in general graphs. Jack Edmonds was a faculty member at the University of Waterloo, where I did my undergrad, and he was a member of the Department of Combinatorics and Optimization. A fun fact about Edmonds is that he has a big rock on his front lawn that has $\mathrm{NP} \neq \mathrm{NP} \cap \mathrm{coNP} = \mathrm{P}$ inscribed on it.
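
Returning to the bipartite case, here is a minimal sketch in Python of the augmenting-path algorithm described above: repeatedly search for an augmenting path from each unmatched left vertex and flip it. The input format and names are illustrative only.

```python
def max_bipartite_matching(left, adj):
    """`left` is an iterable of left-side vertices; `adj` maps each left
    vertex to its neighbours on the right. Returns the matching size and a
    dictionary mapping each matched right vertex to its partner."""
    match_right = {}                       # right vertex -> matched left vertex

    def augment(u, seen):
        # try to extend the matching with an augmenting path starting at u
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be rematched elsewhere
            if v not in match_right or augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = sum(augment(u, set()) for u in left)
    return size, match_right


adj = {"a": ["x"], "b": ["x", "y"], "c": ["y", "z"]}
print(max_bipartite_matching(["a", "b", "c"], adj))
# (3, {'x': 'a', 'y': 'b', 'z': 'c'})
```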

Nowadays, the maximum matching problem for bipartite graphs is usually solved by reducing it to maximum flow and applying the Ford-Fulkerson algorithm, which gives a running time of $O(|V||E|)$; a faster algorithm for bipartite graphs, due to Hopcroft and Karp, runs in $O(\sqrt{|V|}|E|)$ time. For general graphs, one uses Edmonds' blossom algorithm and its later refinements.