MPCS 55001 — Lecture 16

Satisfiability

So you might say, okay, it makes sense to reduce Set Cover to Vertex Cover because they're both about covers and you can kind of generalize things to graphs. But what if we considered something that wasn't about covering at all?

Here, we will need to recall some definitions from propositional logic.

A boolean variable is a variable that can be assigned a value from $\{0,1\}$ (or $\{T,F\}$). A literal is a boolean variable or its negation. A clause is a disjunction of literals. We say a boolean formula $\varphi$ is in conjunctive normal form (CNF) if $\varphi$ is a conjunction of clauses.

Consider the formula \[\varphi = (\neg x_1 \vee x_2 \vee x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee x_2 \vee x_4).\] This formula is in conjunctive normal form and has

Recall that propositional formulas are logical statements that assert the truth of some statement. Usually this is mathematical in nature. We are interested in determining whether such a statement is ever true. In other words, is there a way for us to assign the variables in such a way that the formula becomes true? This is called the satisfiability problem.

The satisfiability (SAT) problem is: Given a propositional formula $\varphi$, does it have a satisfying truth assignment? The 3-SAT problem is: Given a propositional formula $\varphi$ in conjunctive normal form, where each clause contains exactly three literals, does it have a satisfying truth assignment?

Satisfiability is a basic problem in mathematical logic and has obvious applications in things like circuit design. However, as mentioned before, there are a number of mathematical statements that one can express in propositional logic, so being able to solve this would mean being able to determine the satisfiability of any statement that can be expressed as a propositional statement.

We will show something that is maybe a bit surprising.

3-SAT $\leq_P$ Independent Set.

Given an instance $\varphi$ of 3-SAT, where $\varphi$ has $k$ clauses, we will construct an instance $(G,k)$ of Independent set that has an independent set of size $k$ if and only if $\varphi$ is satisfiable.

Our construction is as follows. For each clause $C = x_i \vee x_j \vee x_k$ in $\varphi$, we construct a triangle. Then, we add an edge between each literal and its negations in the other clauses.

Consider the following graph, constructed by using the formula from Example 18.6. Note that each triangle corresponds to a clause and that the literals $x_1$ and $x_2$ have edges to their negations, $\neg x_1$ and $\neg x_2$.

We will show that $\varphi$ is satisfiable if and only if $G$ contains an independent set of size $k$, where $k$ is the number of clauses in $\varphi$.

First, suppose $\varphi$ is satisfiable and consider a satisfying assignment. This means that in every clause, at least one of the literals is true. Then we choose one of these literals from each clause and these form an independent set $S$.

To see that they do, it is clear that only one vertex can be chosen from each triangle. Then all that remains is that there are no edges joining two vertices in different triangles. But the only way this can happen is between two literals $x_i$ and $\neg x_i$, which can't be the case in a satisfying assignment. therefore, $S$ is an independent set.

Next, suppose that $G$ contains an independent set $S$ of size $k$. Since $S$ is independent, it must contain exactly one vertex from each triangle. We set the literals corresponding to these vertices to True, as long as they are consistent. Then this means every clause evaluates to true, so $\varphi$ is satisfied.

With that we have the following relationships among the problems we've discussed so far: \[\mathrm{3SAT} \leq_P \mathrm{Independent\ Set} \leq_P \mathrm{Vertex\ Cover} \leq_P \mathrm{Set\ Cover}.\]

Now, you might say that this seems like cheating: after all, we've just shown that a special version of boolean satisfiability can be turned into a graph problem. So to answer that question, we'll show that any boolean formula $\phi$ can be transformed into an equivalent formula in 3CNF.

Satisfiability $\leq_P$ 3-SAT.

We need to transform an arbitrary boolean formula $\varphi$ into one that is in CNF and has at most three literals per clause. To do this, we will go through a series of transformations, each of which preserves the truth of the original formula.

First, we will parenthesize the formula into what is sometimes called a well-formed formula. Essentially, this means rewriting the formula so that the order of operations are explicitly defined by the parentheses. It is clear that this rewriting doesn't change the formula.

Once we have such a formula, we can very easily construct a parse tree for the formula. For example, for the formula $\varphi = (((x_1 \to x_2) \wedge (x_3 \wedge \neg x_1)) \vee (x_2 \to \neg x_4)) \vee x_3$, we have the following parse tree.

We observe that in such a tree, the leaves are all literals and the internal nodes of the tree are all connectives. We observe that every connective has exactly two children. What we will do is, for each connective $(x \circ y)$, construct a clause $(z \leftrightarrow x \circ y)$, where $z$ is a new variable and replace the subformula with $z$. Since all connectives are binary, we have a set of clauses with three literals of the form $(z \leftrightarrow x \circ y)$.

For each of these clauses, we can rewrite them using only $\vee$ and $\wedge$ by using a truth table and writing the result in disjunctive normal form (ors of ands). Since all of our clauses are of the form $C = (z \leftrightarrow (x \circ y))$, we have a truth table

$x$ $y$ $z$ $C$
T T T ?
T T F ?
T F T ?
T F F ?
F T T ?
F T F ?
F F T ?
F F F ?

Then we can construct a formula in disjunctive normal form for $\neg C$ by taking each row that evaluates to $F$, $\wedge$ing the variables together and $\vee$ing those clauses.

For example, suppose $C = (z \leftrightarrow (x \wedge \neg y))$. Then

$x$ $y$ $z$ $(z \leftrightarrow (x \wedge y))$
T T T T
T T F F
T F T F
T F F T
F T T F
F T F T
F F T F
F F F T

This gives us the formula \[\neg C = (x \wedge y \wedge \neg z) \vee (x \wedge \neg y \wedge z) \vee (\neg x \wedge y \wedge z) \vee (\neg x \wedge \neg y \wedge z).\]

We then apply De Morgan's laws on our formula for $\neg C$ to turn the result into a formula for $C$ in conjunctive normal form. Again, this does not change the satisfiability of the formula.

This leaves us with clauses with three, two, or one literals. We can stuff the clauses with only one or two literals with dummy variables as follows:

where $p$ and $q$ are new variables. We use the same variables $p$ and $q$ for all clauses that need them. It is clear that adding these variables does not affect the satisfiability of the formula.

It remains to show that the construction can be done in polynomial time. It is clear that parenthesizing the formula can be done in one pass of the formula. Then what is the size of the parse tree? This looks like it could be exponential, but it's actually linear in the length of the formula: every node corresponds to either a literal or a connective in the formula. This is important, because we use the parse tree as the basis of our transformation.

Roughly speaking, we define a clause for each connective in our parse tree. This corresponds to one clause per internal node of the parse tree, which is linear in the length of the formula. Each clause increases the size of the instance by a constant amount. Then rewriting each clause also requires a constant amount of time and space: in the very worst case, we get a formula of eight clauses in DNF per clause we start out with.

In total, performing this transformation only increases the size of the formula linearly in the length of the original formula.

Hamiltonian Cycle

Now, we'll prove something about a simple, but important graph problem. The following notion was due to the Irish mathematician William Rowan Hamilton in the early 1800s.

Let $G$ be a graph. A cycle is a Hamiltonian cycle if every vertex of $G$ is in the cycle. A path is a Hamiltonian path if every vertex of $G$ is in the path.

The following graph contains a Hamiltonian cycle that is highlighted.

Hamiltonian cycles and paths have some very interesting applications. One is a connection to Gray codes, which are a binary code on $n$ digits that are defined such that the successor of a number only has a one digit difference from the current number.

The obvious question that we can ask about Hamiltonian cycles is whether our graph contains one.

The Hamiltonian Cycle problem is: Given a graph $G$, does $G$ contain a Hamiltonian cycle?

There is are directed and undirected flavours of this problem. Let's start with a simple reduction.

Directed Hamiltonian Cycle $\leq_P$ Undirected Hamiltonian Cycle.

We'll describe the reduction informally. The big question is how to transform a directed graph into an undirected graph while maintaining the "directedness" of the edges. What we can do is split a vertex $v$ into three: $v_{in}, v, v_{out}$.

We join $v_{in}$ and $v$ by an edge and $v$ and $v_{out}$ by an edge. Then for all edges $(u,v)$, we join $u_{out}$ and $v_{in}$. It remains to show that there is a Hamiltonian cycle in one graph if and only if there is one in the other.

That was not so bad. Now, let's do something a bit more surprising: we'll show that 3-SAT reduces to Hamiltonian Cycle.

3-SAT $\leq_P$ Hamiltonian Cycle.

Let $\varphi$ be an instance of 3-SAT with $n$ variables and $k$ clauses. We will construct an instance of Hamiltonian Cycle. What we will do is try to construct a graph so that each possible assignment of variables corresponds to a Hamiltonian cycle. We then modify the graph by adding paths through clauses in such a way that we'll have a Hamiltonian cycle for those assignments that are satisfying.

The idea is that we need a way to tell whether we're setting a variable $x_i$ to True or False. So we create a series of vertices so that a path goes through the vertices from left to right if we set $x_i$ to True and we set it to False if the path goes the other way.

This gives us a way to define Hamiltonian paths through our graph that correspond to a particular assignment of variables. We then need to use this to indicate whether a clause has been satisfied by a particular assignment of variables it contains. We do this by hooking up vertices representing clauses to each of the paths for the corresponding variables.

If a literal $x_i$ is in a clause $C_j$, then the edges are such that $C_j$ is entered via the left from row $i$. Similarly, if the literal $\neg x_i$ is in a clause $C_k$, the edges are such that $C_k$ is entered via the right on row $i$.

These clause vertices are connected to the variable paths according to the literal. In order to space out these paths, we need $3k+3$ vertices in each variable path.

We need to argue that there is a satisfying assignment for $\varphi$ if and only if this graph $G$ we constructed has a Hamiltonian cycle.

Suppose $\varphi$ has a satisfying assignment. We can define our Hamiltonian cycle $\mathcal C$ by the following. If variable $x_i$ has been set to True, then the path moves through row $i$ from left to right; if $x_i$ is set to False, the path moves through row $i$ from right to left. Then for each clause $C_j$, there is at least one row for which the cycle can visit $C_j$. This only needs to happen once per clause, so the path only goes through $C_j$ once.

Now, suppose that $G$ has a Hamiltonian cycle. We need to recover an assignment for $\varphi$. For each row $i$, if the cycle goes from left to right, then $x_i$ is set to True; if it goes from right to left, we set $x_i$ to False. This assignment is satisfying because each clause vertex is visited by the Hamiltonian cycle. This vertex can only be visited by entering and exiting via vertices on the same row. This row corresponds to the literal that satisfies the clause. Since every clause is satisfied, $\varphi$ is satisfied.

It remains to show that this construction can be performed in polynomial time. We observe that the graph $G$ has $n(3k+3) + 2$ vertices, so the size of the graph is polynomial in the length of $\varphi$.