As in past years, here's a roundup of what's been happening in the data structures and algorithms (cs.DS) section of arXiv.org. Over the last year, there were 1340 new algorithms preprints on arXiv; that's up about 13% from 1182 in 2014. The explosive growth rate of this section has been gradually diminishing over the years, but this time it didn't: it's higher than 2014's 10% growth rate. Statistics on arXiv submissions more generally are also available.

I picked out the following ten papers (listed chronologically) as being particularly interesting to me. I'm not going to claim that they're in any objective sense the best of the year. Nevertheless I hope they're interesting to others as well.

  • "A \( (1+\epsilon) \)-embedding of low highway dimension graphs into bounded treewidth graphs", arXiv:1502.04588, by Feldmann, Fung, Könemann, and Post, ICALP 2015. The highway dimension of a graph models a natural property of road networks according to which, if you go far enough from some starting point, there are only a few different ways that your initial path can go. For instance, I used to live in Santa Barbara, where there are only three ways out of town: east or west along the coast on Highway 101, or north over the mountains on San Marcos Pass Road. (One rainy year, all three were blocked simultaneously.) The same phenomenon happens both on smaller and larger scales. This paper connects this theory with deep results in metric embedding and graph structure theory, allowing many more graph problems to be approximated efficiently on low-highway-dimension graphs. It appears closely related to Feldmann's second ICALP 2015 paper which used metric embeddings as part of approximation algorithms for graphs of low highway dimension.

  • "Clustered integer 3SUM via additive combinatorics", arXiv:1502.05204, by Chan and Lewenstein, STOC 2015. The 3SUM problem is the following: you're given a collection of numbers and you want to test whether some triple of them sums to zero. A naive algorithm would take cubic time but with a little care it can be solved in quadratic time instead; for a long time that was believed to be optimal, and this assumption was used to show lower bounds on many other algorithmic problems. The quadratic time bound was broken recently by Williams at STOC'14, but the improvement was in a lower-order term, not the quadratic main exponent of the time bound. This paper gives a bigger break, with exponents bounded below two, although only for some special cases: small integer values, or subsets of a preprocessed set of integers.

  • "Faster 64-bit universal hashing using carry-less multiplications", arXiv:1503.03465, by Lemire and Kaser. I already posted briefly about this, but the basic idea is that modern CPUs now include instructions for doing arithmetic in GF2[x] (the ring of polynomials over the binary field), and this can be used to make a high-quality hash function very fast.

  • "If the current clique algorithms are optimal, so is Valiant's parser", arXiv:1504.01431, by Abboud, Backurs, and Williams, FOCS 2015. Clique-finding and context-free grammar parsing are both among the problems that can be sped up using fast matrix multiplication: a \( k \)-clique in an \( n \)-vertex graph can be found in time \( O(n^{k\omega/3}), \) and a grammar of size \( g \) with an \( n \)-symbol input string can be parsed in time \( O(n^{\omega}), \) where ω is the exponent of fast matrix multiplication. Now this paper shows that the two problems are related: any additional speedup of grammar parsing would also speed up clique-finding, even for grammars of constant size. This is a big improvement on previous lower bounds for context-free parsing which were conditional and required non-constant grammars.

  • "A faster pseudopolynomial time algorithm for subset sum", arXiv:1507.02318, by Koiliaris and Xu. The textbook algorithm for subset sum takes time \( O(nK) \) where \( n \) is the number of input items (assumed to be positive integers) and \( K \) is the sum to be achieved. This paper reduces the dependence on \( n \) to the square root.

  • "Tight bounds for subgraph isomorphism and graph homomorphism", arXiv:1507.03738, by Fomin, Golovnev, Kulikov, and Mihajlin, to appear next week at SODA 2016. Subgraph isomorphism is the problem of finding one graph as a subgraph of another. It includes as special cases \( \mathsf{NP} \)-hard problems such as finding cliques or Hamiltonian cycles, but when the subgraph to be found has a small number \( k \) of vertices, it can be solved in time \( n^{O(k)}. \) This paper proves a lower bound of the same form, conditional on the exponential time hypothesis, the assumption that there is no subexponential algorithm for Boolean satisfiability.

  • "The 4/3 additive spanner exponent is tight", arXiv:1511.00700, by Abboud and Godwin. This is part of a line of research on approximating distances in arbitrary unweighted graphs by sparse graphs, so accurately that the error is only additive rather than multiplicative. It seemed that there was a tradeoff between the number of edges in the sparse graph and the accuracy of approximation: you could decrease the exponent of the number of edges (relative to the number of vertices) at the expense of a bigger addiitive error. But the best result of this type known was that with error at most 6 you could get a spanner with only \( O(n^{4/3}) \) edges. Now it seems that the tradeoff stops here: fewer edges will necessarily cause an additive error that grows as a power of \( n \) rather than staying constant.

  • "Which regular expression patterns are hard to match?", arXiv:1511.07070, by Backurs and Indyk. The obvious answer is "none of them" because regular-expression matching has a low polynomial time bound. But it's quadratic (the product of the expression length and the input length), while some special cases such as matching a collection of dictionary words can be solved much more quickly (e.g. by building a DFA and then running it). This paper proves a strong dichotomy (assuming the exponential time hypothesis) between expressions that require near-quadratic time and expressions that take only near-linear time.

  • "Optimal dynamic strings", arXiv:1511.02612, by Gawrychowski, Karczmarz, Kociumaka, Łącki, and Sankowski. The usual ways of representing strings take time linear in the string length to form new strings by concatenating or splitting the existing ones, and linear time to compare two strings or find the first position at which they differ. This paper gives a data structure for the same operations that takes logarithmic time per update and constant time per query.

  • "Graph isomorphism in quasipolynomial time", arXiv:1512.03547, by Babai. So much has already been written about this one. Need I say more? But there is more: I've seen some statistics on most-downloaded papers on the arXiv (too rough to link to), and this is the cs.DS representative on the list, with tens of thousands of accesses. So obviously it's getting read far beyond its own specialized research community.





Comments:

None: CV
2016-01-06T12:16:29Z
Some Computer Vision and Machine Learning conferences require submission to be put on arxiv and available for public commenting for improvement and the authors are expected to address the public comments and improve their paper before the final decision about them are made. So essentially every paper ends up on arxiv and gets discussed and improved before the conference. At conference they have a few main talks and a small number of 20min talks and some poster spotlights where authors of interesting posters do a sales speech for their poster in less than 5min. People go to the conference not for papers but mainly for networking. Things that theory might also want to experiment with.
11011110: RE: CV
2016-01-06T20:10:59Z
A larger number of the heavily-downloaded arXiv papers were in cs.CV and cs.LG — what you describe may have something to do with why.
ext_3462100: The Subgraph Isomorphism paper
2016-01-06T20:35:38Z
The Subgraph Isomorphism paper of Fomin, Golovnev, Kulikov, and Mihajlin got merged at SODA with a paper of Cygan, Pachocki, and Socała, which is also on arxiv: http://arxiv.org/abs/1504.02876 . As far as I know, the two groups worked independently and there is an alternating sequence of preprints on arxiv that improve one another. The story eventually ended in a merge at SODA that settles the complexity once and for all.
11011110: RE: The Subgraph Isomorphism paper
2016-01-06T21:18:17Z
Interesting — thanks for the info.
ext_3462162: A faster pseudopolynomial time algorithm for subset sum
2016-01-06T21:32:35Z
Glad to read that our work caught your eye David! I just wanted to say we are currently working on an updated draft with cleaner writing, simpler algorithm presentation and easier analysis.