Theory vs implementation in clique listing algorithms

The list of accepted papers to the Symposium on Experimental Algorithms is out, and it includes one of mine: Listing All Maximal Cliques in Large Sparse Real-World Graphs, with student co-author Darren Strash. Claire Mathieu recently asked whether FPT algorithms can lead to implementations; this paper provides an example, although I'm sure very far from the first one.

There's an old and standard algorithm for listing all maximal cliques, the Bron–Kerbosch algorithm, which works pretty well in practice. However, until recently the only theoretical bound on its running time was O(3^n/3). There are graphs that have this many cliques (the Moon–Moser graphs) so nothing better in terms of n alone can be proven, but this bound didn't really explain the algorithm's good performance in practice. So in a paper at ISAAC, Darren, Maarten Löffler and I had attempted to bridge this gap between theory and practice using FPT analysis techniques. Specifically, we showed that a simple variant of the algorithm (in which we used a degeneracy ordering of the input graph to order the recursive subproblems) could be made to run in the improved time bound O(dn3^d/3), where d is the degeneracy of the graph, a measure of its sparsity. What this means is that for large but sparse graphs, the algorithm takes linear time, and moreover that the dependence of the time bound on the sparseness of the graph is as tight as possible.

So then (as I wrote here) we thought that we should implement the algorithm, to see whether ordering the subproblems is actually useful in practice or whether it's just a trick that makes the analysis work. Another reason for implementing it was that we're part of a large project on the analysis of social networks, and our co-workers on the project needed a better implementation of a clique-finding algorithm than the one they had been using. The new SEA paper reports on our experiments using some very large real-world networks as test data.

The upshot is that the new algorithm does indeed work very well: it's not always better than a previous version of the Bron–Kerbosch algorithm that didn't use our subproblem ordering idea, but it's often better, always close, and sometimes far better. An additional advantage is that the new algorithm can work well with sparse representations of the input graph, whereas the older versions that we compared our algorithm against work best with dense graph representations that use a quadratic amount of memory, preventing them from being applied to very large graphs. Some of the largest graphs that we successfully tested the new algorithm on have millions of vertices and degeneracies ranging into the low three digits. In fact, this raises an issue, not unlike the one we started with: the running times of the implementation are far better than even our improved FPT analysis would predict, so again there's a big gap between theory and practice. Maybe we're not using the right parameter, and a tighter parameterized analysis with a different parameter would better show why these graphs are not so hard? Or maybe it's time to switch from worst-case to average case analysis, come up with a plausible but mathematically tractable model of random large sparse real-world graphs, and show that the expected time of the algorithm on such graphs is small?

One unanticipated outcome of writing a new implementation was that it uncovered several bugs in other existing implementations. Unfortunately this also meant that we had to badmouth some of our colleagues, after the SEA reviewers asked us to include another comparison of our code against someone else's software that we hadn't previously tried and we found that it didn't produce the correct results. But if this leads to fixes in code that others use, it's also a good thing in the long run.

The code is not yet public, but our plan is to include it in statnet, allowing it to be used within R.

Comments:

mcfnord:
2011-03-08T23:23:09Z
well go you!

mcfnord:
2011-03-08T23:24:57Z
is livejournal a large sparse graph?

11011110:
2011-03-15T00:43:48Z

I imagine so. It has millions of users and typical users have at most a few hundred friend links. That counts as sparse for our purposes. If you looked at the whole graph at once, it would be maybe ten times larger than the largest ones we tried, but not really any more dense, so probably it would only take ten times as much time to run.

(Sorry for the delayed response. I imagine it has something to do with this.)

mcfnord:
2011-03-15T00:48:37Z
let me know if you want it!

11011110:
2011-03-15T17:56:02Z
Thanks — my coauthor says he already has a copy. It was too big to fit into memory on the machine we were using on our tests but could probably be handled by a machine with more memory.