Rapid mixing for 3-colorings

Remember my TCS exchange question on whether 3-colorings of cycle graphs rapidly mix? Well, it turns out that they do. I got the following solution from Dana Randall (who has otherwise been hard at work finishing off the selection of SODA papers) so any good ideas in it are hers and any mistakes were most likely introduced by me in writing all this up.

To restate the question again: we're trying to generate random 3-colorings of a cycle graph, by taking a random walk in a big state space of different 3-colorings. If we walk long enough, we'll eventually become so lost that the coloring is in a uniformly random state (or so close to uniform as makes no difference). The question is, how long is long enough? We'd like it to be polynomial in the length of the cycle, rather than polynomial in the number of different colorings, because the length of the cycle is exponentially smaller than the number of colorings. If so, then the random walk is said to be "rapidly mixing".

The short version of Dana's solution is: instead of using the Markov chain I described, in which potentially-large two-colored regions are recolored, use a "heat bath" in which we repeatedly remove the colors of two vertices and then choose a valid coloring for them at random. It's not hard to show that, if this chain mixes, then the other one does as well. But a standard path coupling argument turns out to work to show that the heat bath does indeed mix.

Using a different Markov chain:

First, the easy part. The heat bath is almost the same as the modified Wang–Swendsen–Kotecký dynamics I described here, in which we only change the colors of a 2-colored region in the cases when it consists of one or two vertices; larger regions stay fixed. The two Markov chains both have almost the same set of transitions: there's one case of a step in the heat bath that takes two WSK steps to perform, but that's not a big problem, and the transition probabilities are different, but only by constant factors, so that's also not a problem. And adding the full set of Wang–Swendsen–Kotecký moves doesn't hurt anything, so if the heat bath mixes well then so does the Wang–Swendsen–Kotecký dynamics.

Coupling:

The basic idea of coupling is to run two random walks in parallel: one, starting from a nonrandom state, is the one you want to prove mixes quickly, while the other one starts from a random state and therefore stays random at each step. However, the two random walks are not independent of each other, but rather have dependencies that are carefully chosen in order to make them become more similar over time. Eventually, if you perform the two coupled random walks long enough (where "long enough" has to be defined independently of the actual states of the two walks) they will be so close that the initially-nonrandom one has nearly the same uniform distribution as the initially random one. (There's also a related technique called coupling from the past that performs the two walks backwards rather than forwards, and has a stopping rule that is not independent of the states of the chains, but still somehow can be guaranteed to generate uniformly random samples).

Coupling at Hamming distance one:

As a warm-up to the full coupling argument, suppose we have two 3-colorings \( K \) and \( L \) of the \( n \)-cycle that only differ from each other in the coloring of a single vertex. We want to define a way of performing a heat bath step on both colorings simultaneously, in such a way that they are likely to become closer together. To do so, we pick a vertex \( v \) of the cycle uniformly at random, and look at the chain of four consecutive vertices \( u \)-\( v \)-\( w \)-\( x \). We'll uncolor and then recolor the same two vertices \( v \) and \( w \) in both \( K \) and \( L \), but we may not be able to do so in such a way that they both end up with the same color. There are four cases:

If the position at which \( K \) and \( L \) differ is outside the chain \( u \)-\( v \)-\( w \)-\( x \), then \( u \) has the same color in both \( K \) and \( L \), as does x. In this case, we may choose a random coloring for \( v \) and \( w \) consistent with the colors of \( u \) and \( x \), and use the same new coloring in both \( K \) and \( L \). The distance between the two colorings remains unchanged by a step of this type, and (when we look at either \( K \) or \( L \) alone, ignoring what happens in the other coloring) this step correctly reproduces the transition probabilities of the heat bath on a single coloring. It happens with probability \( (n-4)/n \).
If the position at which \( K \) and \( L \) differ is at \( v \) or \( w \), then again we may choose a single random coloring for \( v \) and \( w \) and use the same colors in both \( K \) and \( L \). This type of step happens with probability \( 2/n \), and whenever it happens the distance between the two colorings decreases by one.
In the third case, the position at which \( K \) and \( L \) differs is at \( u \) or \( x \). In each of \( K \) and \( L \), there may be either two or three possible new colorings of \( v \) and \( w \) to choose from, but it is not always possible to use the same recoloring for both \( K \) and \( L \). There are two subcases, up to symmetries (reversing the four-vertex chain, swapping the roles of \( K \) and \( L \), and permuting the colors):
1. \( u \)-\( v \)-\( w \)-\( x \) are colored R-B-R-G in \( K \), and colored R-B-R-B in \( L \). With probability \( 1/3 \), we choose the same colorings again, with probability \( 1/3 \) we choose the colorings R-G-R-G / R-G-R-B, and with probability \( 1/3 \) we choose the colorings R-G-B-G / R-B-G-B. So, if this subcase happens, then with conditional probability \( 2/3 \) the distance stays the same and with conditional probability \( 1/3 \) it increases by two units. After a step like this the expected increase in distance is
  \[ \frac{1}{3}\times 0+\frac{1}{3}\times 0+\frac{1}{3}\times 2 = \frac{2}{3}. \]
2. \( u \)-\( v \)-\( w \)-\( x \) are colored R-B-G-R in \( K \), and colored R-B-G-B in \( L \). We choose the same colorings R-B-G-R / R-B-G-B with probability 1/3, the colorings R-G-B-R / R-G-R-B with probability \( 1/3 \), the colorings R-B-G-R / R-B-R-B with probability \( 1/6 \), and the colorings R-G-B-R / R-B-R-B with probability \( 1/6 \). After a step like this the expected increase in distance is
  \[ \frac{1}{3}\times 0+\frac{1}{3}\times 1+\frac{1}{6}\times 1+\frac{1}{6}\times 2 = \frac{5}{6}. \]

We can't control which of the two subcases of case three happens, but we know that case 3 happens with probability \( 2/n \), and that whenever it happens the expected distance between the two colorings increases by at most \( 5/6 \). Therefore, putting all of these cases together, when two colorings are at distance one we can couple them in such a way that the expected distance after a random walk step is at most \[ \frac{n-4}{n}\times 1+\frac{2}{n}\times 0+\frac{2}{n}\times\frac{11}{6} = 1 − \frac{1}{3n}, \] a tiny bit closer together in expectation.

Coupling at Hamming distance two:

Ok, what about when the two colorings \( K \) and \( L \) are farther apart than that? The next simpler case is when \( K \) and \( L \) differ at exactly two positions, and those two positions are consecutive.

With probability \( (n-5)/n \) the positions \( v \) and \( w \) that we choose are far from the positions where the sequences differ, and the update doesn't cause the distance to change.

With probability \( 1/n \) we get an update in which \( v \) and \( w \) are exactly the two positions where the sequences differ. In the recoloring step, the colorings of \( K \) and \( L \) can be chosen the same as each other, reducing the distance by two.

With probability \( 2/n \) we get an update in which one of \( u \) and \( x \) has a different coloring in \( K \) than in \( L \), and the other one is the same in both colorings. This is exactly the same as the analysis for Hamming distance one, case 3, in which the expected distance might increase by as much as \( 5/6 \).

With probability \( 2/n \) we get an update in which \( u \) and \( v \) (or symmetrically \( w \) and \( x \)) are the two positions at which \( K \) and \( L \) differ. But this case also has exactly the same analysis as case 3 of the Hamming distance one case, except that we've already started out with one more unit of distance. So the next effect is to decrease the expected distance by \( 5/6 \) or better.

Thus, the expected distance after a random walk step is at most \( 2(1-1/4n) \), again a tiny bit closer together than \( K \) and \( L \) were before the step.

Path coupling:

Path coupling is a technique for analyzing Markov chains that was introduced by Bubley and Dyer at FOCS 1997. It allows us to look only at simple cases such as the ones above and let the messier cases take care of themselves, by performing simple coupling steps simultaneously on all edges of a shortest path in state space.

When we start out our two coupled random walks on colorings \( K \) and \( L \), they're not very likely to be as close together as in the case analysis above. But in general (because the state space is connected by heat bath moves) it will be possible to find a path in the space of all 3-colorings of the cycle, having \( K \) and \( L \) as endpoints of the path and having valid colorings in between them, with the colorings in consecutive states along the path differing either in the color of a single vertex or in the colors of two consecutive vertices. Among all such paths, we choose (arbitrarily in case of ties) a shortest one, measuring the length of an edge between two adjacent colorings as the Hamming distance of the two colorings.

We imagine all of the colorings along this shortest path as simultaneously performing heat bath steps, in such a way that consecutive pairs of colorings along the path are coupled exactly as in the two cases above. There's some ambiguity in how to jointly couple all of the colorings' random walks in order to achieve this pairwise coupling, but it doesn't matter how this ambiguity is resolved. We can then perform the heat bath steps on \( K \) and \( L \), with probabilities determined by the way that they are pairwise coupled by this big messy joint coupling.

After a coupled step of this type, the expected distance between any two adjacent pairs of the path will go down by a factor of \( (1-1/4n) \) or better. Therefore, the expected distance of the walk in state space from \( K \) to \( L \) formed by concatenating a sequence shortest paths between the transformed versions of each of the colorings along the path is at most the old distance between \( K \) and \( L \) times this same \( (1-1/4n) \) factor. This walk may not be a shortest path itself, but the fact that it exists and has short expected length nevertheless implies that the new expected distance between \( K \) and \( L \) is at most the old expected distance times \( (1-1/4n) \).

In order to complete the coupling argument, we'll perform this same coupling step many times. At each step, the way that \( K \) and \( L \) are coupled will depend only on their joint state, so this is a Markov process, and if we look at either \( K \) and \( L \) alone we get exactly the heat bath dynamics.

Rapid mixing:

A standard definition for the mixing time of a Markov process is the time it takes to achieve total variation distance at most \( \varepsilon \) from uniform, for some \( \varepsilon \). In this case, the analysis is easy: we simply perform \( 4n\ln(n/\varepsilon) \) steps of the heat bath.

Initially, \( K \) is nonrandom and \( L \) is uniformly random; \( K \) is at most \( n \) steps away from \( L \) in Hamming distance. After each step, the expected distance decreases by at least a \( (1-1/4n) \) factor, so after the given number of steps, the expected distance will be at most \( (1-1/4n) ^{4n\ln(n/\varepsilon)}\le\varepsilon \). And by Markov's inequality, this means that with probability at least \( 1-\varepsilon \), \( K \) and \( L \) now have the same coloring, and with probability at most \( \varepsilon \) their colorings differ. But \( L \) is still uniformly random, so the probability distribution on \( K \) is a mixture of \( 1-\varepsilon \) probability of being uniformly random and \( \varepsilon \) probability of something else, which is essentially the definition of being within total variation distance \( \varepsilon \) of uniform.

That is: the mixing time of the heat bath is \( O(n\log n) \). The full Wang–Swendsen–Kotecký dynamics includes steps that perform the same recolorings as the heat bath, with probabilities that are within a constant factor the same as the heat bath probabilities, as well as some other steps that don't hurt anything, and it has the same uniform limiting distribution as the heat bath, so its mixing time is also \( O(n\log n) \).