A brief introduction to the logic of graphs

If you're used to writing mathematics, but haven't paid much attention to model theory, you probably think a fully-quantified mathematical sentence is generally either true or false. Fermat's last theorem, for instance, can be written as the following sentence: For all positive integers \( a \), \( b \), \( c \), and \( n \), if \( n\gt 2 \), then \( a^n+b^n\ne c^n \). This sentence is fully quantified: the four variables \( a \), \( b \), \( c \), and \( n \) are all covered by the quantifier "for all positive integers". It's one of the true ones, if difficult to prove.

But when we're working with the logic of graphs, a (fully-quantified) sentence is itself just another mathematical object, and its truth is relative: it might be true for some graphs and false for others. Consider, for instance, the following sentence about an undirected graph: "There exist vertices \( v \) and \( w \) such that \( v \) and \( w \) are adjacent, and for all vertices \( u \), if \( u \) and \( v \) are adjacent, then \( u \) equals \( w \)." It can be satisfied only when \( v \) is a vertex whose degree is exactly one, and \( w \) is its unique neighbor. We can write this more concisely using a notation in which adjacency is written using a tilde:

\[ \exists v,w: (v\sim w)\wedge (\forall u: (u\sim v) \rightarrow (u = w)) \]

Let's give this sentence a name, \( D_1 \). Then \( D_1 \) is true for a graph that has a degree-one vertex, such as the complete bipartite graph \( K_{1,4} \). But it's false for a graph that doesn't have such a vertex, such as the complete graph \( K_4 \). If a sentence is true for a graph, we say that the graph "models" the sentence, and we can also write that in mathematical notation:

\[ K_{1,4}\models D_{1} \]

This kind of logic, in which the only things we can talk about are vertices and their adjacencies, is called the first order logic of graphs, and it's kind of weak. Each of its sentences is equivalent to an algorithm that can contain nested loops over vertices, if-then-else logic involving adjacency tests and equality, and the ability to return Boolean values, but nothing else. For instance:

def d1(G):
   for v in G:
       for w in G:
           if G.adjacent(v,w):
               for u in G:
                   if G.adjacent(u,v):
                       if u != w:
                           break
               else:
                   return True
    return False

This is good enough to recognize some families of graphs (such as the ones with a finite set of forbidden induced subgraphs) but not many others. For instance, I don't know how to describe the distance-hereditary graphs in this way. They can be described by forbidden induced subgraphs, but infinitely many of them, and we're not allowed to write infinitely-long sentences.

On the other hand, the weakness of first-order logic means that we can prove interesting facts about it. For instance, every first-order sentence defines a family of graphs that can be recognized in polynomial time. Also, we have the 0-1 law: if \( S \) is any sentence in first-order logic then the probability that a graph chosen uniformly at random among all \( n \)-vertex graphs models \( S \) is either zero or one in the limit as \( n \) goes to infinity. Using the 0-1 law, even though we can't describe the distance-hereditary graphs precisely in first-order logic, we can get an approximate description that's good enough to prove that almost all random graphs are not distance-hereditary. A distance-hereditary graph either has a degree-one vertex (it models \( D_1 \)) or it has two twin vertices, vertices whose sets of neighbors (not counting the two twins themselves) are identical. The existence of twins can also be described by a first-order sentence \( T \):

\[ \exists u,v: (u\ne v)\wedge\forall w: (u=w)\vee(v=w)\vee((u\sim w)\leftrightarrow(v\sim w)) \]

But for a uniformly random graph, both the expected number of degree-one vertices and the expected number of twin pairs, can be calculated directly from these formulas, and are exponentially small in the number \( n \) of vertices. So almost all graphs do not model \( D_1 \), do not model \( T \), and therefore cannot be distance-hereditary.

The name "first order" should be a hint that there's another kind of logic, "second order logic", and there is. In second order logic, the variables can represent complicated structures built out of \( k \)-ary relations (for instance, entire graphs), the quantifiers quantify over these structures, and we need more primitives to be able to look at what's inside these structures. The idea of using second order logic seems to be somewhat controversial in mathematics, in part because there's not a uniquely-defined way of assigning meanings to statements in this logic, but there's a restricted subset of the second order logic of graphs, called monadic second order logic, where these problems do not arise. Or actually there are two such subsets: MSO₁ and MSO₂.

MSO₁ is what you get from the first order logic described above when you add another type of variable for sets of vertices (usually written with capital letters) and you allow quantification over sets of vertices. The only other feature beyond first order logic that's necessary to define this logic is the ability to test whether a vertex belongs to a set. It's convenient to write formulas using more complicated tests such as whether one set is a subset of another, but those can be broken down into membership tests. We can also get the effect of using these sets as variable types that can be quantified over, by instead quantifying over all vertices but then testing whether the results of the quantification belong to the given set. For instance we can write sentences \( D_1[X] \) and \( T[X] \) that have the same form as \( D_1 \) and \( T \), but restrict all their variables to be in \( X \). The effect of this restriction would be to test whether the subgraph induced by \( X \) has a degree-one vertex or has twins. A distance-hereditary graph is a graph in which every induced subgraph of two or more vertices has a degree-zero vertex, a degree-one vertex or twins, and this logic allows us to express this definition in a sentence DH:

\[ \forall X: (\exists u,v: u\in X\wedge v\in X\wedge u\ne v)\rightarrow(D_0[X]\vee D_1[X]\vee T[X]) \]

A graph \( G \) models DH if and only if \( G \) is distance-hereditary. MSO₂ is similar, but allows four types of variables: vertices, edges, and sets of vertices and edges. The ability to represent sets of edges allows it to express some properties (such as the property of having a Hamiltonian cycle) that cannot be expressed in MSO₁.

Unlike first-order logic, we don't necessarily get efficient algorithms out of MSO expressions. Simulating the formula directly would involve an exponential-time search over all possible subsets of vertices or edges in a given graph. But that's not the only way to turn one of these formulas into an algorithm. In particular, we can apply Courcelle's theorem, which says that every MSO₂ formula can be translated into a fixed-parameter tractable algorithm on graphs parameterized by their treewidth, and that every MSO₁ formula can be translated into an FPT algorithm on graphs parameterized by their clique-width. In the example of the distance-hereditary graphs, we also know that all such graphs have bounded clique-width. So applying Courcelle and plugging in the fixed bound on the clique-width of these graphs immediately tells us that there's a polynomial time algorithm for recognizing distance-hereditary graphs.

All of this is, I think, completely constructive: it's not just that an algorithm exists, but in principle we could automatically translate the formula into the algorithm. It's also completely useless in practice because the dependence on the parameter is ridiculously high (some kind of tower of powers). When an algorithm is found in this way, additional research is needed to find a more direct algorithm that reduces this dependence to something more reasonable like single-exponential with a small base, or even removes it to get a polynomial time algorithm. In the case of the distance-hereditary graphs, there's an easy polynomial algorithm: look for degree one vertices or twins, and whenever one of these patterns is found use it to reduce the size of the graph by one vertex. With a little more care one can even achieve linear time for distance-hereditary graph recognition.

My latest preprint, "Crossing minimization for 1-page and 2-page drawings of graphs with bounded treewidth" (arXiv:1408.6321, with Michael Bannister, to appear at Graph Drawing), uses this same logical approach to attack some problems related to book embedding. We had a paper with Joe Simons in GD 2013 that showed that, for graphs formed from trees by adding a very small number of edges, we can find 1-page and 2-page book drawings with a minimum number of crossings in FPT time. In the new paper, we characterize these drawings using second order logic and apply Courcelle's theorem, allowing us to generalize these algorithms to the graphs of low treewidth, a much larger class of inputs. But because we use Courcelle's theorem, our algorithms are completely impractical. More research is needed to find a way to minimize crossings in practice.

Comments:

None:
2014-08-28T07:07:40Z
I think the python code is slightly wrong: in the innermost "return False" you should conclude only that the pair (v,w) is wrong and go back to the next iteration of the w-loop.

11011110:
2014-08-28T07:33:04Z
Oops, fixed with a for...else. Thanks.

None:
2014-08-28T11:16:37Z
There is actually an implementation of MSO on graphs of bounded treewidth; apparently, it works quite okay for treewidth up to 10 or so with simple formulas.

11011110:
2014-08-28T16:13:12Z
Thanks for the pointer. Perhaps I should have said that the time bounds we get are completely impractical. I guess that's different than the algorithms themselves.

None:
2014-08-28T22:51:58Z
http://arxiv.org/pdf/1308.3654v2.pdf contains some useful information related to this topic

11011110:
2014-08-28T23:30:40Z
Thanks for the pointer. The link is "Connection Matrices and the Definability of Graph Parameters" (arXiv:1308.3654, Tomer Kotek and Johann A. Makowsky) and it appears to be about methods for proving that certain graph properties are not definable in certain graph logics.