MPCS 50103 — Lecture 8

Variance

The question of how near or far values of $X$ can be from the expectation leads us to the following notion.

Definition 16.1. Let $X$ be a random variable on the probability space $(\Omega,\Pr)$. The variance of $X$ is $$V(X) = \sum_{\omega \in \Omega} (X(\omega) - E(X))^2 \cdot \Pr(\omega).$$ The standard deviation of $X$ is $\sigma(X) = \sqrt{V(X)}$. Because of this, variance is sometimes denoted by $\sigma^2(X)$.

Again, the definition of variance is in terms of the probabilities of outcomes from $\Omega$, which is not ideal. Luckily, we can reformulate variance in terms of the expectation of $X$ only.

Theorem 16.2. If $X$ is a random variable, then $V(X) = E(X^2) - E(X)^2$.

Proof. \begin{align*} V(X) &= \sum_{\omega \in \Omega} (X(\omega) - E(X))^2 \cdot \Pr(\omega) \\ &= \sum_{\omega \in \Omega} (X(\omega)^2 - 2X(\omega)E(X) + E(X)^2) \cdot \Pr(\omega) \\ &= \sum_{\omega \in \Omega} X(\omega)^2 \cdot \Pr(\omega) - 2 E(X) \sum_{\omega \in \Omega} X(\omega) \cdot \Pr(\omega) + E(X)^2 \sum_{\omega \in \Omega} \Pr(\omega) \\ &= E(X^2) - 2E(X)^2 + E(X)^2 \\ &= E(X^2) - E(X)^2 \end{align*} $\tag*{$\Box$}$

Corollary 16.3. $V(X) = E((X - E(X))^2)$.

Proof. \begin{align*} E((X - E(X))^2) &= E(X^2 - 2 X E(X) + E(X)^2) \\ &= E(X^2) - E(2XE(X)) + E(E(X)^2) \\ &= E(X^2) - 2E(X)^2 + E(X)^2 \\ &= E(X^2) - E(X)^2 \\ &= V(X) \end{align*} $\tag*{$\Box$}$

Example 16.4. Recall the roll of a fair 6-sided die and that $E(X) = 3.5$. To compute the variance, we need $E(X^2)$. For $i = 1, 2, \dots, 6$, $X^2 = i^2$ with probability $\frac 1 6$. Then $$E(X^2) = \sum_{i=1}^6 \frac{i^2}{6} = \frac{91}6.$$ Then we can calculate $V(X)$ by $$V(X) = E(X^2) - E(X)^2 = \frac{91}{6} - \left( \frac 7 2 \right)^2 = \frac{35}{12} \approx 2.92.$$ From this, we can also get the standard deviation, $\sqrt{V(X)} \approx 1.71$.
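These computations are easy to sanity-check by machine. Here is a minimal Python sketch that evaluates $E(X)$, $E(X^2)$, $V(X)$, and $\sigma(X)$ for the fair die directly from the definitions, using exact fractions.

```python
from fractions import Fraction

# Fair 6-sided die: each outcome i = 1, ..., 6 has probability 1/6.
p = Fraction(1, 6)
outcomes = range(1, 7)

EX = sum(i * p for i in outcomes)       # E(X)   = 7/2
EX2 = sum(i * i * p for i in outcomes)  # E(X^2) = 91/6
VX = EX2 - EX ** 2                      # V(X)   = 35/12

print(EX, EX2, VX)       # 7/2 91/6 35/12
print(float(VX) ** 0.5)  # standard deviation, about 1.71
```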

Now, what if we wanted to consider the roll of more than one die? Recall that the results of multiple die rolls are independent, and so their corresponding random variables are independent. We can make use of this property via Bienaymé's formula. Irénée-Jules Bienaymé was a French mathematician who worked at around the same time as Markov and Chebyshev and was a friend and collaborator of Chebyshev.

Theorem 16.5 (Bienaymé's Formula). If $X$ and $Y$ are independent random variables, then $V(X+Y) = V(X) + V(Y)$.

Proof. We will make use of the claim that if $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$ (this is an exercise on this week's problem set). \begin{align*} V(X+Y) &= E((X+Y)^2) - E(X+Y)^2 \\ &= E(X^2) + 2E(XY) + E(Y^2) - E(X)^2 - 2E(X)E(Y) - E(Y)^2 \\ &= E(X^2) - E(X)^2 + E(Y^2) - E(Y)^2 + 2E(XY) - 2E(X)E(Y) \\ &= E(X^2) - E(X)^2 + E(Y^2) - E(Y)^2 \\ &= V(X) + V(Y) \end{align*} $\tag*{$\Box$}$

Example 16.6. Recall that the variance of a single die roll is $V(X) = \frac{35}{12}$. Then the variance of two die rolls is $V(X) + V(X) = \frac{35}{6} \approx 5.833$ and the standard deviation is about $2.42$.

Example 16.7. Consider a series of $n$ independent Bernoulli trials with probability of success $p$. What is the variance of the number of successes?

If we have a series of trials $(a_1, a_2, \dots, a_n)$, then we define the random variable $X_i$ to be 1 if $a_i$ is a success and 0 otherwise. Then we define $X = X_1 + \cdots + X_n$, so that $X$ counts the number of successes in the $n$ trials. Since the trials are independent, Bienaymé's formula gives $V(X) = V(X_1) + \cdots + V(X_n)$.

To compute this, we need the variance of a single Bernoulli trial. If $X_i$ is our random variable for the Bernoulli trial, we have that $X_i$ takes on a value of either 0 or 1. Therefore, we have $X_i^2 = X_i$ and $E(X_i) = 1 \cdot p + 0 \cdot (1-p) = p$. Then $$V(X_i) = E(X_i^2) - E(X_i)^2 = p - p^2 = p \cdot (1-p).$$ This gives us $V(X) = np(1-p)$.
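As a quick empirical check of the formula $V(X) = np(1-p)$, the following sketch simulates the number of successes many times and compares the sample variance against $np(1-p)$; the particular values of $n$, $p$, and the number of samples are arbitrary choices.

```python
import random

n, p, samples = 100, 0.3, 50_000

# Each sample: count the successes in n independent Bernoulli(p) trials.
counts = [sum(random.random() < p for _ in range(n)) for _ in range(samples)]

mean = sum(counts) / samples
var = sum((c - mean) ** 2 for c in counts) / samples

print(var)              # should be close to...
print(n * p * (1 - p))  # ...the exact value, 21.0
```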

Now, we will revisit the question of how likely it is that a random variable takes a value that strays far from its expectation. Much like Markov's inequality, the following result will give us a bound on how much a random variable differs from its expected value. Chebyshev's inequality is named for Pafnuty Chebyshev, whom we've mentioned before.

Theorem 16.8 (Chebyshev's Inequality). Let $X$ be a random variable. For all $a \gt 0$, $$\Pr(|X - E(X)| \geq a) \leq \frac{V(X)}{a^2}.$$

Proof. Let $Y = (X - E(X))^2$ be a random variable. Then $E(Y) = V(X)$ by Corollary 16.3. Since $Y$ is non-negative, we can apply Markov's inequality to get $$\Pr(Y \geq a^2) \leq \frac{E(Y)}{a^2}.$$ Since $Y = (X-E(X))^2$, we have $Y \geq a^2$ if and only if $|X - E(X)| \geq a$. Then, together with $E(Y) = V(X)$, we substitute into the above to get $$\Pr(|X-E(X)| \geq a) \leq \frac{V(X)}{a^2}.$$ $\tag*{$\Box$}$

Example 16.9. Let $X$ be the random variable for the number of heads when a fair coin is tossed $n$ times. Recall that this is just the number of successful outcomes for $n$ independent Bernoulli trials with $p = \frac 1 2$. One can calculate that $E(X) = \frac n 2$ and from the previous example, we have that $V(X) = \frac n 4$ (since $p = (1-p) = \frac 1 2$).

First, let's consider an estimate by Markov's inequality. Recall that we have $\Pr(X \geq a) \leq \frac{E(X)}{a}$. If we set $a = 60$ and $n = 100$, we get an upper bound on the probability that we get at least 60 heads in 100 coin flips. This gives us $$\Pr(X \geq 60) \leq \frac{\frac{100}2}{60} = \frac 5 6.$$

Now, we apply Chebyshev's inequality, using $a = \sqrt n$ to get $$\Pr\left( \left|X - \frac n 2 \right| \geq \sqrt n \right) \leq \frac{V(X)}{\sqrt n^2} = \frac n 4 \cdot \frac 1 n = \frac 1 4.$$

What this says is that there is at most a $\frac 1 4$ probability that the number of heads that we get differs from the expected number ($\frac n 2$) by $\sqrt n$ or more. So, if we make 100 flips, then we have $$\Pr(|X - 50| \geq 10) \leq \frac 1 4.$$ In other words, the probability that we get at least 60 heads or at most 40 heads is at most $\frac 1 4$, which is significantly better than our estimate from Markov's inequality. Of course, the choice of $a$ (or $a^2$) is ours, depending on what kind of threshold we want to know the probability for.
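For comparison, we can also compute the exact tail probability from the binomial distribution. This sketch tallies $\Pr(X \leq 40) + \Pr(X \geq 60)$ for $n = 100$ fair flips; the exact answer is far below both bounds, a reminder that Markov and Chebyshev trade precision for generality.

```python
from math import comb

n = 100
# Exact probability that 100 fair flips give at least 60 heads
# or at most 40 heads.
tail = sum(comb(n, k) for k in list(range(0, 41)) + list(range(60, n + 1)))
exact = tail / 2 ** n

print(exact)  # about 0.0569
print(5 / 6)  # Markov's bound on Pr(X >= 60)
print(1 / 4)  # Chebyshev's bound on Pr(|X - 50| >= 10)
```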

Graph Theory

Something that we've run into, despite my best efforts to shield you from it until now, is the notion of graphs. You may be familiar with graphs as either graphical charts of data or from calculus in the form of visualizing functions in some 2D or 3D space. Neither of these are the graphs that are discussed in graph theory.

We saw two examples where graphs snuck in. The first is from our application of the pigeonhole principle to a problem about whether two people know each other or don't know each other at a party. The second is from the example of our ambitious travelling salesperson, who wants to visit as many cities as possible with the least cost.

Both of these examples model relationships between sets of objects. In the first case, we are modelling what's basically a social network. Our objects are people and the relationships are described as: two people are connected by either a "knows each other" relationship or a "doesn't know each other" relationship. In the second case, our objects are cities and their relationship is described not only by a connection, but by a weight or cost on that connection, representing the cost to travel between the two cities.

Definition 17.1. A graph $G = (V,E)$ is a pair with a non-empty set of vertices $V$ together with a set of edges $E$. An edge $e \in E$ is an unordered pair $(u,v)$ of vertices $u,v \in V$.

This graph is called the Petersen graph. The Petersen graph is a particularly interesting graph, not just because it looks neat, but because it happens to be a common counterexample for many results in graph theory.

Note that, since edges are unordered, we really should be writing them as $\{u,v\}$, but we don't.

One can define graphs more generally and then restrict consideration to our particular definition of graphs, which would usually be then called simple graphs. The more general definition of graphs allows for things like multiple edges between vertices or loops. Other enhancements may include asymmetric edges (which we call directed edges), equipping edges with weights, and even generalizing edges to hyperedges that relate more than two vertices. Instead, what we do is start with the most basic version of a graph, from which we can augment our definition with the more fancy features if necessary (it will not be necessary in this course).

Definition 17.2. Two vertices $u$ and $v$ of a graph $G$ are adjacent if they are joined by an edge $(u,v)$. We say the edge $(u,v)$ is incident to $u$ and $v$.

Definition 17.3. Two vertices $u$ and $v$ are neighbours if they are adjacent. The neighbourhood of a vertex $v$ is the set of neighbours of $v$ $$N(v) = \{u \in V \mid (u,v) \in E\}.$$ We can extend this definition to sets of vertices $A \subseteq V$ by $$N(A) = \bigcup_{v \in A} N(v).$$

Definition 17.4. The degree of a vertex $v$ in $G$, denoted $\deg(v)$, is the number of its neighbours. A graph $G$ is $k$-regular if every vertex in $G$ has degree $k$.

Observe that every vertex in the Petersen graph has degree 3, so the Petersen graph is 3-regular, or cubic.

The following are among the first results in graph theory, proved by Leonhard Euler in 1736.

Theorem 17.5 (Handshaking Lemma). Let $G = (V,E)$ be an undirected graph. Then $$\sum_{v \in V} \deg(v) = 2 \cdot |E|.$$

Proof. Every edge $(u,v)$ is incident to exactly two vertices: $u$ and $v$. Therefore, each edge contributes 2 to the sum of the degrees of the vertices in the graph. $\tag*{$\Box$}$
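As a concrete check of the lemma, this sketch builds the Petersen graph via one standard construction (vertices are the 2-element subsets of $\{0,\dots,4\}$, with edges between disjoint subsets) and verifies that the degree sum equals $2|E|$.

```python
from itertools import combinations

# One standard construction of the Petersen graph: vertices are the
# 2-element subsets of {0, ..., 4}; two vertices are adjacent exactly
# when the subsets are disjoint.
V = [frozenset(s) for s in combinations(range(5), 2)]
E = [(u, v) for u, v in combinations(V, 2) if not (u & v)]

deg = {v: sum(v in e for e in E) for v in V}

print(len(V), len(E))                     # 10 15
print(all(d == 3 for d in deg.values()))  # True: the graph is 3-regular
print(sum(deg.values()) == 2 * len(E))    # True: the handshaking lemma
```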

Corollary 17.6. Let $G$ be a graph. Then the number of vertices of odd degree in $G$ is even.

Proof. Let $V_1, V_2 \subseteq V$ be the sets of vertices of $G$ of odd and even degree, respectively. Then $$2|E| = \sum_{v \in V_1} \deg(v) + \sum_{v \in V_2} \deg(v).$$ We observe that since $\deg(v)$ for $v \in V_2$ is even, the sum $\sum_{v \in V_2} \deg(v)$ is even. Then since $2|E|$ is even, this must mean that the sum $\sum_{v \in V_1} \deg(v)$ is even. But since $\deg(v)$ is odd for $v \in V_1$, this implies that $|V_1|$ must be even. $\tag*{$\Box$}$

It's the corollary that gives the name for the handshaking lemma. When rephrased as a bunch of people shaking hands, the lemma says that there must be an even number of people who have shaken an odd number of hands.

Definition 17.7. The complete graph on $n$ vertices is the graph $K_n = (V,E)$, where $|V| = n$ and $$E = \{(u,v) \mid u,v \in V, u \neq v\}.$$ That is, every pair of distinct vertices are neighbours.

Definition 17.8. The cycle graph on $n$ vertices is the graph $C_n = (V,E)$ where $V = \{v_1, v_2, \dots, v_n\}$ and $$E = \{(v_i, v_{i+1}) \mid 1 \leq i \lt n\} \cup \{(v_n, v_1)\}.$$ It is so named because the most obvious way to draw the graph is with the vertices arranged in order, on a circle.

Definition 17.9. The $n$-(hyper)cube graph is the graph $Q_n = (V,E)$, where $V = \{0,1\}^n$ (i.e., binary strings of length $n$) and $(u,v)$ is an edge if, for $u = a_1 a_2 \cdots a_n$ and $v = b_1 b_2 \cdots b_n$, there exists an index $i, 1 \leq i \leq n$ such that $a_i \neq b_i$ and $a_j = b_j$ for all $j \neq i$. In other words, $u$ and $v$ differ in exactly one symbol.

It's called a cube (or hypercube) because one of the ways to draw it is to arrange the vertices and edges so that they form an $n$-dimensional cube.
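To make these families concrete, here is a sketch that constructs the edge sets of $K_n$, $C_n$, and $Q_n$ directly from the definitions above.

```python
from itertools import combinations, product

def complete(n):
    """Edge set of K_n on vertices 0, ..., n-1."""
    return list(combinations(range(n), 2))

def cycle(n):
    """Edge set of C_n: vertex i is adjacent to vertex (i+1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]

def hypercube(n):
    """Edge set of Q_n: binary strings adjacent iff they differ in one bit."""
    V = [''.join(bits) for bits in product('01', repeat=n)]
    return [(u, v) for u, v in combinations(V, 2)
            if sum(a != b for a, b in zip(u, v)) == 1]

print(len(complete(5)), len(cycle(6)), len(hypercube(3)))  # 10 6 12
```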

Graph isomorphism

Consider the following graphs.

These two graphs happen to be the "same". Furthermore, it turns out that these are also basically the same graph as $Q_3$, or at least, we feel that it should be. This means we don't need to necessarily draw a $3$-cube like a cube (and we certainly wouldn't expect this for $n \gt 3$).

Although the visual representation of the graph is very convenient, it is sometimes misleading, because we're limited to 2D drawings. In fact, there is a vast area of research devoted to graph drawing and the mathematics of embedding graphs in two dimensions. The point of this is that two graphs may look very different, but turn out to be the same.

But even more fundamental than that, we intuitively understand that two graphs that differ only in how their vertices are named should still be considered equivalent. So how do we discuss whether two graphs are basically the same, just with different names or drawings?

Definition 17.10. An isomorphism between two graphs $G_1 = (V_1,E_1)$ and $G_2 = (V_2,E_2)$ is a bijection $f:V_1 \to V_2$ which preserves adjacency. That is, for all vertices $u,v \in V_1$, $u$ is adjacent to $v$ in $G_1$ if and only if $f(u)$ and $f(v)$ are adjacent in $G_2$. Two graphs $G_1$ and $G_2$ are isomorphic if there exists an isomorphism between them, denoted $G_1 \cong G_2$.

In other words, we consider two graphs to be the "same" or equivalent if we can map every vertex to the other graph such that the adjacency relationship is preserved. In practice, this amounts to a "renaming" of the vertices, which is what we typically mean when we say that two objects are isomorphic. Of course, this really means that the edges are preserved.

Example 17.11. Consider the following graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$.

We will show that these two graphs are isomorphic by defining a bijective function $f:V_1 \to V_2$ which preserves adjacency: \begin{align*} f(a_0) &= u_0 & f(a_1) &= v_3 & f(a_2) &= u_2 & f(a_3) &= v_1 \\ f(b_0) &= v_2 & f(b_1) &= u_1 & f(b_2) &= v_0 & f(b_3) &= u_3 \\ \end{align*} This is a bijection, since every vertex in $V_1$ is paired with a vertex in $V_2$. We then verify that this bijection preserves adjacency by checking each edge under the bijection: \begin{align*} (f(a_0),f(b_1)) &= (u_0,u_1) & (f(a_0),f(b_2)) &= (u_0,v_0) & (f(a_0),f(b_3)) &= (u_0,u_3) \\ (f(a_1),f(b_0)) &= (v_3,v_2) & (f(a_1),f(b_2)) &= (v_3,v_0) & (f(a_1),f(b_3)) &= (v_3,u_3) \\ (f(a_2),f(b_0)) &= (u_2,v_2) & (f(a_2),f(b_1)) &= (u_2,u_1) & (f(a_2),f(b_3)) &= (u_2,u_3) \\ (f(a_3),f(b_0)) &= (v_1,v_2) & (f(a_3),f(b_1)) &= (v_1,u_1) & (f(a_3),f(b_2)) &= (v_1,v_0) \end{align*}
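Checking adjacency preservation by hand is tedious, but it is entirely mechanical. This sketch encodes the edge lists from the example (in $G_1$, $a_i$ is adjacent to $b_j$ exactly when $i \neq j$) and verifies that applying $f$ to the edges of $G_1$ yields exactly the edges of $G_2$.

```python
# Edge lists read off from the example above.
E1 = {frozenset({f'a{i}', f'b{j}'}) for i in range(4) for j in range(4) if i != j}
E2 = {frozenset(e) for e in [
    ('u0', 'u1'), ('u0', 'v0'), ('u0', 'u3'),
    ('v3', 'v2'), ('v3', 'v0'), ('v3', 'u3'),
    ('u2', 'v2'), ('u2', 'u1'), ('u2', 'u3'),
    ('v1', 'v2'), ('v1', 'u1'), ('v1', 'v0')]}

f = {'a0': 'u0', 'a1': 'v3', 'a2': 'u2', 'a3': 'v1',
     'b0': 'v2', 'b1': 'u1', 'b2': 'v0', 'b3': 'u3'}

# f is an isomorphism iff mapping every edge of G1 gives exactly E2.
image = {frozenset({f[u], f[v]}) for u, v in E1}
print(image == E2)  # True
```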

We can take the idea that the vertices and their adjacency define a graph and come up with a way to represent a graph using only this information.

Definition 17.12. The adjacency matrix $A$ of a graph $G = (V,E)$ with $V = \{v_1, v_2, \dots, v_n\}$ is an $n \times n$ matrix such that $$A_{i,j} = \begin{cases} 1 &\text{if $(v_i, v_j) \in E$,} \\ 0 &\text{otherwise.} \end{cases}$$

So in the above graph $G_1$, we arbitrarily assign rows/columns 1 through 8 by $a_0, a_1, a_2, a_3, b_0, b_1, b_2, b_3$, and we get the following adjacency matrix, $$\begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$ Then, if we apply our bijection $f$, we end up with the same adjacency matrix for $G_2$.
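A sketch of building this adjacency matrix from the edge list, under the same arbitrary ordering of rows and columns:

```python
# Adjacency matrix of G1 under the ordering a0, a1, a2, a3, b0, b1, b2, b3.
order = [f'a{i}' for i in range(4)] + [f'b{i}' for i in range(4)]
idx = {v: i for i, v in enumerate(order)}

# G1: a_i is adjacent to b_j exactly when i != j.
edges = [(f'a{i}', f'b{j}') for i in range(4) for j in range(4) if i != j]

A = [[0] * 8 for _ in range(8)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1  # undirected, so symmetric

for row in A:
    print(row)  # matches the matrix above
```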

What is neat about the adjacency matrix is that it's easily extended to fancier definitions of graphs if we wish. For instance, if we want to consider graphs with multiple edges, we can make this a matrix with integer entries instead of just 0 and 1. Or if we want to represent a directed graph, then we can make the $(i,j)$th entry correspond to the edge $(i,j)$, which would not have to be symmetric with the $(j,i)$th entry.

So an obvious and fundamental problem that arises from this definition is: Given two graphs $G_1$ and $G_2$, are they isomorphic? The obvious answer is that we just need to come up with an isomorphism or show that one doesn't exist. Unfortunately, there are $n!$ possible such mappings we would need to check.

There are a few ways to show that two graphs are not isomorphic, via properties that we call isomorphism invariants. These are properties of a graph that can't change under an isomorphism, like the number of vertices or edges in a graph.

Definition 17.13. A $k$-colouring of a graph $G = (V,E)$ is a function $c : V \to \{1,2,\dots,k\}$ such that if $(u,v) \in E$, then $c(u) \neq c(v)$. That is, adjacent vertices receive different colours. A graph $G$ is $k$-colourable if there exists a $k$-colouring of $G$.

We will use $k$-colourability as an invariant to show that two graphs are not isomorphic.

Example 17.14. Consider the following two graphs, $G_1$ and $G_2$.

First, we observe that both graphs have 6 vertices and 9 edges and both graphs are 3-regular. However, we will show that $G_1$ is 2-colourable and $G_2$ is not. To see that $G_1$ is 2-colourable, we observe that the vertices along the top row can be assigned the same colour because they are not adjacent to each other. Similarly, the vertices along the bottom can be assigned a second colour.

However, $G_2$ contains two triangles and, by definition, the vertices belonging to the same triangle must be coloured with 3 different colours, so $G_2$ is not 2-colourable. Since an isomorphism preserves adjacency, it would carry a 2-colouring of $G_1$ to a 2-colouring of $G_2$; therefore, $G_1$ and $G_2$ cannot be isomorphic.
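Deciding 2-colourability is computationally easy: a breadth-first search can alternate colours and report a conflict if one arises. Here is a minimal sketch.

```python
from collections import deque

def two_colour(vertices, edges):
    """Return a 2-colouring as a dict, or None if the graph has none."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colour = {}
    for start in vertices:  # outer loop handles disconnected graphs
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]  # alternate the colour
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return None                # adjacent vertices clash
    return colour

print(two_colour('abc', [('a', 'b'), ('b', 'c'), ('a', 'c')]))  # None (triangle)
print(two_colour('abcd', [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]))
```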

What we would like is some easy-to-compute graph invariant that tells us that we do have an isomorphism, so that we don't have to search over $n!$ different possible mappings. Unfortunately, we don't know of any that exist, so graph isomorphism remains difficult to solve. Exactly how difficult a problem it is remains open. First, let's define the notion of a subgraph.

Definition 17.15. A graph $G' = (V',E')$ is a subgraph of a graph $G = (V,E)$ if $V' \subseteq V$ and $E' \subseteq E$. The subgraph induced by a subset $V' \subseteq V$ is the largest subgraph of $G$ with vertex set $V'$. We say that $G'$ is a spanning subgraph if $V' = V$.

Example 17.16. Consider the graphs from Example 17.14. One graph contains $K_3$ as a subgraph (or alternatively contains a subgraph isomorphic to $K_3$), while the other does not.

The subgraph isomorphism problem asks, given graphs $G$ and $H$, whether there is a subgraph of $G$ that is isomorphic to $H$. This problem is known to be hard to solve: it's NP-complete, which means that if we are given a candidate solution, we can check it efficiently, but we have no efficient algorithms for finding one. It's not hard to see that graph isomorphism is a special case of subgraph isomorphism.

Bipartite graphs

Consider again our graphs from Example 17.14. Since we showed that $G_1$ was 2-colourable, this amounted to dividing the vertices up into two sets (or a partition) so that all of the edges of the graph only travel between these two sets. This gives us another special type of graph.

Definition 17.17. A graph $G = (V,E)$ is bipartite if $V$ is the disjoint union of two sets $V_1$ and $V_2$ such that all edges in $E$ connect a vertex in $V_1$ with a vertex in $V_2$. We say that $(V_1, V_2)$ is a bipartition of $V$.

We get the following characterization of bipartite graphs for free from the definition of 2-colourability.

Theorem 17.18. A graph $G$ is bipartite if and only if it is 2-colourable.

Similar to the idea of complete graphs, we can define complete bipartite graphs.

Definition 17.19. The complete bipartite graph on $m$ and $n$ vertices is the graph $K_{m,n} = (V,E)$, where $(V_m,V_n)$ is a bipartition of $V$ with $|V_m| = m$ and $|V_n| = n$, and $$E = \{(u,v) \mid u \in V_m, v \in V_n \}.$$

The bipartite graph from Example 17.14 is $K_{3,3}$.

Bipartite graphs come up naturally in a lot of applications because a bipartite graph models relationships between two groups of things: job applicants and potential employers, PhD students and PhD advisors, pigeons and pigeonholes, etc. In many of these applications, we would like to assign or match an object from the first set with an object from the second. This leads to the following notion.

Definition 17.20. A matching in a graph $G = (V,E)$ is a subset of edges $M \subseteq E$ such that no two edges in $M$ share an incident vertex. A vertex that is incident to an edge in a matching $M$ is matched in $M$; otherwise, it is unmatched. A maximal matching is a matching that cannot be made larger; in other words, every edge in $E \setminus M$ is incident to a vertex matched in $M$. A maximum matching is a matching with the maximum number of edges. A matching $M$ in a bipartite graph $G = (V,E)$ with bipartition $(V_1, V_2)$ is a complete matching from $V_1$ to $V_2$ if every vertex in $V_1$ is matched in $M$.

The above definitions distinguish between maximal and maximum matchings. Just because you have a matching that can't be expanded doesn't mean that you necessarily have the largest possible matching!

Example 17.21. Consider the following graph and the highlighted matchings.

On the left, we have a maximal matching, since no edges can be added to the matching without violating the matching condition. However, this graph admits a larger matching, shown on the right. Since every vertex on the bottom is matched, this is a complete matching, and therefore, it is also a maximum matching. However, other maximum matchings exist for this graph.
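The distinction also matters algorithmically: greedily keeping every edge whose endpoints are both still free always yields a maximal matching, but, depending on the order in which the edges are considered, not necessarily a maximum one. A small sketch illustrating this on a path:

```python
def greedy_matching(edges):
    """One pass over the edges, keeping each edge whose endpoints are free.

    The result is always a maximal matching, but whether it is maximum
    can depend on the order in which the edges are scanned.
    """
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

# A path a-b-c-d: taking the middle edge first leaves a maximal matching
# of size 1, but the maximum matching has size 2.
print(greedy_matching([('b', 'c'), ('a', 'b'), ('c', 'd')]))  # [('b', 'c')]
print(greedy_matching([('a', 'b'), ('b', 'c'), ('c', 'd')]))  # size 2
```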

The following result gives us a necessary and sufficient condition for a bipartite graph to have a complete matching and is due to Philip Hall, who was a British group theorist and worked at Bletchley Park during the Second World War.

Theorem 17.22 (Hall's Matching Theorem). Let $G = (V,E)$ be a bipartite graph with bipartition $(V_1,V_2)$. Then $G$ has a complete matching from $V_1$ to $V_2$ if and only if $|N(A)| \geq |A|$ for all subsets $A \subseteq V_1$.

Proof. First, consider the only if direction. Here, we suppose there is a complete matching $M$ from $V_1$ to $V_2$. Consider a subset $A \subseteq V_1$. Since $M$ is a complete matching, every vertex $v \in A$ is matched to some neighbour in $V_2$, and since $M$ is a matching, distinct vertices of $A$ are matched to distinct vertices of $V_2$. Therefore, $|N(A)| \geq |A|$.

The if direction is proved by strong induction on $|V_1|$.

Base case. We have $|V_1| = 1$, so $V_1 = \{v\}$. Since $|N(V_1)| \geq |V_1|$, there exists an edge $(v,u)$ with $u \in V_2$. The complete matching consists of this edge.

Inductive case. Let $k \geq 1$ and assume for a graph $G = (V,E)$ with bipartition $(V_1,V_2)$ that if $|V_1| = j \leq k$, then there is a complete matching from $V_1$ to $V_2$ if $|N(A)| \geq |A|$ for all subsets $A \subseteq V_1$.

Consider a graph $G' = (V',E')$ with bipartition $(V_1',V_2')$ and $|V_1'| = k+1$. There are two cases to consider.

In the first case, we have that for all subsets $A \subseteq V_1'$ of size $j$ with $1 \leq j \leq k$, $|N(A)| \gt |A| = j$. In this case, we choose a vertex $v \in V_1'$ and a vertex $u \in V_2'$ that is adjacent to $v$. Consider the subgraph $H$ that results when we delete $v$ and $u$ and all edges incident to them. This graph has a bipartition $(V_1' \setminus \{v\}, V_2' \setminus \{u\})$. Deleting $u$ removes at most one neighbour from any subset $A \subseteq V_1' \setminus \{v\}$, so $|N_H(A)| \geq |N(A)| - 1 \geq |A|$, and the condition of the inductive hypothesis holds in $H$. Since $|V_1' \setminus \{v\}| = k$, by the inductive hypothesis, $H$ has a complete matching $M$. Then we can construct a complete matching $M'$ for $G'$ by taking $M' = M \cup \{(v,u)\}$.

In the second case, there exists a subset $A \subseteq V_1'$ of size $j$ with $1 \leq j \leq k$ such that $|N(A)| = |A| = j$. Let $B = N(A)$. By the inductive hypothesis, there is a complete matching $M_1$ from $A$ to $B$. Now consider the subgraph $H$ that results from deleting $A$, $B$, and all edges incident to them.

The graph $H$ has a bipartition $(V_1' \setminus A, V_2' \setminus B)$. We claim that for all subsets $S \subseteq V_1' \setminus A$, we have $|N_H(S)| \geq |S|$. Suppose to the contrary that there exists a subset $T \subseteq V_1' \setminus A$ of size $t$ with $1 \leq t \leq k+1-j$ such that $|N_H(T)| \lt |T| = t$. Since $N(A \cup T) \subseteq B \cup N_H(T)$ and $|B| = j$, this implies that $|N(A \cup T)| \lt j + t = |A \cup T|$, which contradicts our original assumption.

Therefore, $|N_H(S)| \geq |S|$ for all subsets $S \subseteq V_1' \setminus A$. Since $|V_1' \setminus A| \leq k$, by the inductive hypothesis there exists a complete matching $M_2$ from $V_1' \setminus A$ to $V_2' \setminus B$. Then, we can construct a complete matching $M'$ for $G'$ by taking $M' = M_1 \cup M_2$.

Thus, in both cases, we have constructed a complete matching. $\tag*{$\Box$}$
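For small graphs, Hall's condition can be verified by brute force over all subsets of $V_1$; this takes exponential time in $|V_1|$, so the sketch below is a sanity check rather than a practical algorithm. The vertex names (applicants and jobs) are made up for illustration.

```python
from itertools import combinations

def halls_condition(V1, edges):
    """Check |N(A)| >= |A| for every non-empty subset A of V1.

    Exponential time in |V1|; fine for tiny examples only.
    """
    N = {v: set() for v in V1}
    for u, v in edges:  # edges given as (V1-vertex, V2-vertex) pairs
        N[u].add(v)
    for r in range(1, len(V1) + 1):
        for A in combinations(V1, r):
            if len(set().union(*(N[v] for v in A))) < r:
                return False
    return True

# Two applicants qualified for only one job between them fail the
# condition; with distinct jobs, the condition holds.
print(halls_condition(['p1', 'p2'], [('p1', 'j1'), ('p2', 'j1')]))  # False
print(halls_condition(['p1', 'p2'], [('p1', 'j1'), ('p2', 'j2')]))  # True
```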

Obviously, not every bipartite graph will contain a complete matching. However, perhaps we would like to try to find a maximum matching. But if we just go and choose a matching, we may end up with a maximal matching. Can we be sure that we have a maximum matching and not just a maximal matching? We'll return to this soon.