The linear algebra of edge sets of graphs

This quarter, in my advanced algorithms class, I've been going through Vazirani's Approximation Algorithms book chapter-by-chapter, and learning lots of interesting material that I didn't already know myself in the process.

One of the things I recently learned (in covering chapter 6 on feedback vertex set approximation)^* is that, although all the students have taken some form of linear algebra, many of them have never seen a vector space in which the base field is not the real numbers or in which the elements of the vector space are not tuples of real coordinates. So instead of discussing the details of that algorithm I ended up spending much of the lecture reviewing the theory of binary vector spaces. These are very important in algebraic graph theory, so I thought it might be helpful to write a very gentle introduction to this material here.

First of all we need the concept of a field. This is just a system of elements in which we can perform the usual arithmetic operations (addition, subtraction, multiplication, and division) and expect them to behave like the familiar real number arithmetic: addition and multiplication are associative and commutative, there are special values 0 and 1 that are the identities for addition and multiplication respectively, subtraction is inverse to addition, division is inverse to multiplication by anything other than zero, and multiplication distributes over addition. The field that's important for this material is a particularly simple one, GF(2), in which the required special values 0 and 1 are the only elements. The arithmetic of these two values can be described as ordinary integer arithmetic mod 2, or equivalently it can be described by saying that addition is Boolean xor and multiplication is Boolean and. Subtraction turns out to be the same as addition, and division by 1 (the only value that it's possible to divide by) is just the identity operation. It's not hard to verify that these operations have all the desired properties of a field, and doing so maybe makes a useful exercise (Exercise 1).

Next, a vector space is a collection of elements that can be added to each other and multiplied by scalars from a field. (One can generalize the same concept to other kinds of arithmetic than fields but then one gets modules instead of vector spaces.) The vector addition operation must be commutative and invertible; this implies that it has an identity element, and this element (whatever it happens to be) is called the zero vector. Additionally, scalar-scalar-vector multiplications must be associative, scalar multiplication by the special element 1 of the field must be the identity operation, and scalar multiplication must be distributive over both vector and field addition.

One easy way to construct vector spaces over a field \( \mathbb{F} \) is to make its elements be \( k \)-tuples of elements of \( \mathbb{F} \) with the addition and scalar multiplication operations acting independently on each coordinate, but it's not the only way. For the vector spaces used in this chapter, a different construction is more natural: we let the elements of the vector space be sets in some family of sets, and the vector addition operation be the symmetric difference of sets. The symmetric difference \( S \bigtriangleup T \) of two sets \( S \) and \( T \) is the set of elements that occur in one but not both of \( S \) and \( T. \) This operation is associative, commutative, and invertible, where the inverse of a set is the same set itself: \( S \bigtriangleup T \bigtriangleup T = S \) regardless of which order you use to perform the symmetric difference operations. If a nonempty family of sets has the property that the symmetric difference of every two sets in the family stays in the family, then these sets can be interpreted as the elements of a vector space over GF(2) in which the vector addition operation is symmetric difference, the zero vector is the empty set (necessarily in the family because it's the symmetric difference of any other set with itself), scalar multiplication by 0 takes every set to the empty set, and scalar multiplication by 1 takes every set to itself. One has to verify that these addition and multiplication operations are distributive, but again this is a not-very-difficult exercise (Exercise 2).

As with other kinds of vector spaces, these vector spaces of sets have bases, collections of vectors such that everything in the vector space has a unique representation as a sum of scalar products of basis vectors. Every two bases have the same number of vectors as each other (Exercise 3: prove this), and this number is called the dimension of the vector space. If the dimension is \( d, \) the total number of vectors in the vector space is always exactly \( 2^d, \) because that is the number of different ways that you can choose a scalar multiple (0 or 1) for each basis vector.

The families of sets that are needed for this chapter are subsets of edges of a given undirected graph. These can also be interpreted as subgraphs of the graph, but they're not quite the same because the usual definition of a subgraph also allows you to specify a subset of the vertices (as long as all edges in the edge subset have endpoints in the vertex subset), and we won't be doing that. Every graph has three important vector spaces of this type associated with it, the edge space, the cycle space, and the cut space. The edge space is the family of all subsets of edges (including the set of all edges of the given graph and the empty set). That is, it is the power set of the set of all edges; it has a natural basis in which the basis vectors are the one-edge sets, and its dimension is the number of edges in the graph.

The cycle space is the family of all subsets of edges that have even degree at all of the vertices of the graph (Exercise 4: prove that this family is closed under symmetric difference operations). So it includes the simple cycles of the graph, but it also includes other subgraphs; for instance in the graph of an octahedron (a six-vertex graph with four edges at each vertex) the set of all edges is in the cycle space, as are the sets of edges formed by pairs of triangles that touch each other at a single vertex and the sets complementary to triangles or 4-cycles. It's always possible to find a basis for the cycle space in which the basis elements are themselves simple cycles; such a basis is called a cycle basis. For instance you can form a "fundamental cycle basis" by choosing a spanning forest of the given graph and then finding all cycles that have one edge \( e \) outside this forest and that include also the edges of the unique path in the forest that connects the endpoints of \( e. \) Or, for a planar graph, you can form a cycle basis by choosing one cycle for each bounded face of a planar embedding of the graph. There are lots of interesting algorithmic problems associated with the cycle space and its cycle bases, but for this chapter the main thing that's needed is to compute its dimension, which has the nice formula \( |E|-|V|+c, \) where \( E \) is the edge set of the given graph, \( V \) is the vertex set, and \( c \) is the number of connected components. One name for this dimension is the cyclomatic number of the graph, and the book chapter denotes it as \( \operatorname{cyc}(G). \) (It's also possible to interpret it topologically as the first Betti number of the graph but for students who don't already know about binary vector spaces that would probably be more confusing than helpful.)

The cut space of the graph doesn't take part in this chapter, but can be defined similarly as the set of all cut-sets of the graph. A cut of a graph is a partition of its vertices into two disjoint subsets; in some contexts we require the subsets to both be nonempty but we don't do that here, so the partition into an empty set and the set of all vertices is one of the allowed cuts. The corresponding cut-set is the set of edges that have one endpoint in each of the two subsets. The family of cut-sets is closed under symmetric difference (Exercise 5) so it forms a vector space, the edge space. If the edges are all given positive weights and the graph is connected, then the minimum weight basis of the edge space can be represented by a tree on the vertices of the graph, in which each tree edge determines a cut (the partition of the tree into two subtrees formed by deleting that edges) and has an associated number (the weight of its cut). This tree is called the Gomory–Hu tree of the graph and it came up (stripped of its linear-algebra origin) earlier, in an approximation for \( k \)-cuts in chapter 4. I also have a recent preprint on computing this basis and this tree for graphs that can be embedded onto low-genus surfaces: see arXiv:1411.7055.

^*Unrelatedly, in preparing to cover this topic, I was confused for a long time by a typo in this chapter. On page 56 it states that, for a minimal feedback set, "clearly" the sum over feedback vertices of the number of components formed by deleting that one vertex equals the number of feedback vertices plus the number of components that are formed by deleting the whole feedback set but that touch only one vertex in the set. This is not true. What is true, and what is needed for the later argument, is that the left hand side is greater than or equal to the right hand side.

Comments:

None:
2015-01-22T23:59:47Z
At this stage a gentle introduction to Matroid theory may be mentioned and interested students can look up the topic and read more.

11011110:
2015-01-23T00:05:43Z
Matroid theory is also relevant (especially in the algorithms for finding minimum weight bases for these spaces) but if I'd tried including that as well I wouldn't have had time for any of the approximation algorithms.

asher63:
2015-01-23T01:25:30Z
This is mostly way over my level, but it's interesting. I remember learning the definition of a field in cryptography, where they're useful for modular arithmetic and discrete logarithms. I'm taking beginning linear algebra now, so that aspect of it caught my eye as well.

11011110:
2015-01-23T08:15:37Z
Linear algebra over GF(2) is also important in crypto, for instance in the Gaussian elimination stages of the quadratic sieve and general number field sieve. In those cases the sets that you're treating as vectors are sets of prime factors that occur an odd number of times in a relation, and multiplying two relations corresponds to taking symmetric differences of these sets.

mathsandcrafts:
2015-01-23T07:47:04Z
It is nice to see that CS students can digest math. I admire the way you serve it too.

11011110:
2015-01-23T08:19:00Z
You say that as if you weren't once a CS student...anyway, thanks. And happy birthday Saturday!

mathsandcrafts:
2015-01-23T09:28:53Z
My master's was in maths. Thanks! CS students at TU/e needed a calculator to evaluate whether log x for a given x was above 1. So CS students who can do linear algebra seem almost SF.