Happy New Year, everyone! It's time once again to give a status report on the cs.DS (data structures and algorithms) section of the arXiv. The arXiv as a whole just hit a big milestone, one million preprints uploaded. cs.DS forms a small fraction of that, but still, last year there were 1182 new preprints, up a little from the previous year.

There has been some talk recently about possible changes to the system, including replacing authors' choices of subarea within cs by the results of an automated text classification system (which already exists, but is used only in an advisory role for now), and allowing moderators to reject papers that they deem unscientific or of no plausible interest to readers (as already happens in the physics and math parts of the arXiv). I think any actual change is likely to happen only very slowly, but it's worth thinking about what the arXiv does well and what it might be able to do better.

With so many preprints, it's hard to choose among them (I guess that's what we have conference program committees for). Still, here's a selection of ten(-ish) I found personally interesting, excluding my own papers and a few I wrote about earlier. I'm sure I missed some other good ones, so feel free to leave your own favorites in the comments.

  • Popular conjectures imply strong lower bounds for dynamic problems, Amir Abboud and Virginia Vassilevska Williams, arXiv:1402.0054 and FOCS 2014. The exponential time hypothesis is the unproven but widely-believed conjecture that certain \( \mathsf{NP} \)-complete problems require exponential time. We already know how to scale it down, showing that (if ETH is true) certain known polynomial-time or fixed-parameter-tractable algorithms for static problems are optimally fast. This paper extends these results to scaled-down dynamic graph algorithms for basic problems such as reachability, showing that ETH explains the fact that we don't have subpolynomial update times for these problems.
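
To make the question concrete, here is a toy baseline of my own (nothing from the paper): a dynamic reachability structure that just stores the graph and answers each query with a fresh DFS. The conditional lower bounds concern beating this kind of tradeoff, saying that no data structure can make both updates and queries truly subpolynomial.

```python
# Hypothetical naive baseline for dynamic reachability, for illustration
# only: updates are trivial, and each query pays O(n + m) for a DFS.

class NaiveDynamicReachability:
    def __init__(self, n):
        self.adj = [set() for _ in range(n)]

    def insert_edge(self, u, v):
        self.adj[u].add(v)          # O(1) update

    def delete_edge(self, u, v):
        self.adj[u].discard(v)      # O(1) update

    def reachable(self, s, t):
        # Plain DFS from s: O(n + m) per query.
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for w in self.adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
```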

  • Shortest paths in intersection graphs of unit disks, Sergio Cabello and Miha Jejčič, arXiv:1402.4855 and CGTA 2014. Unit disk graphs can have a quadratic number of edges, so algorithms whose running time is subquadratic have to use the geometric structure rather than just constructing the graph and using a general-purpose graph algorithm. This paper shows that unweighted shortest paths can be found in \( O(n\log n) \) time (essentially optimal) and that the weighted problem can be solved only a little slower.
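
As a toy illustration of how the geometry can substitute for the edge list (my own sketch, not the paper's algorithm, and it does not achieve the \( O(n\log n) \) bound in the worst case): bucket the points into a unit grid, and during a BFS look for neighbors only in the 3×3 block of cells around each dequeued point, deleting points from their cells as soon as they are reached.

```python
# Toy BFS on a unit disk graph (edge iff Euclidean distance <= 1)
# without materializing the quadratic edge set: candidate neighbors of
# a point can only live in the 3x3 block of grid cells around it, and
# points are removed from their cells once visited. Illustration only;
# the paper needs more machinery to guarantee O(n log n).

from collections import defaultdict, deque
from math import floor, dist

def unit_disk_bfs(points, source):
    cells = defaultdict(set)
    for i, (x, y) in enumerate(points):
        cells[(floor(x), floor(y))].add(i)

    sx, sy = points[source]
    cells[(floor(sx), floor(sy))].discard(source)
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        cx, cy = floor(points[u][0]), floor(points[u][1])
        for gx in range(cx - 1, cx + 2):
            for gy in range(cy - 1, cy + 2):
                near = [v for v in cells[(gx, gy)]
                        if dist(points[u], points[v]) <= 1.0]
                for v in near:
                    cells[(gx, gy)].discard(v)
                    depth[v] = depth[u] + 1
                    queue.append(v)
    return depth    # BFS hop counts within the source's component
```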

  • The complexity of the simplex method, John Fearnley and Rahul Savani, arXiv:1404.0605. On simplex pivoting rules and complexity theory, Ilan Adler, Christos Papadimitriou, and Aviad Rubinstein, arXiv:1404.3320 and IPCO 2014. A long line of algorithms and discrete geometry research is rooted in the phenomenon that the simplex method can be used to find linear program solutions in practice, but in theory most variants of it can be forced to take an exponential number of steps. These two papers look at the solution trajectories found by this method, and show that (as well as being long) they have high computational complexity: it can be \( \mathsf{PSPACE} \)-complete to tell whether a point is part of the trajectory, or (for degenerate problems) which solution the simplex method will end up at.
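
To make "trajectory" concrete, here is a bare-bones tableau simplex using Dantzig's most-negative-reduced-cost rule, recording the sequence of bases it visits. This is a hedged sketch, assuming the form max c·x subject to Ax ≤ b, x ≥ 0 with b ≥ 0 (so the all-slack basis is feasible) and with no anti-cycling safeguards; the hardness results say that predicting where such a trajectory ends up can be PSPACE-complete.

```python
# Minimal tableau simplex with Dantzig's pivot rule, recording the
# basis trajectory. Sketch only: assumes b >= 0 and ignores cycling.

def simplex_trajectory(A, b, c, eps=1e-9):
    m, n = len(A), len(c)
    # Tableau: m constraint rows (original vars + slacks + rhs), then
    # the objective row with negated costs.
    T = [A[i][:] + [1.0 if k == i else 0.0 for k in range(m)] + [b[i]]
         for i in range(m)]
    T.append([-cj for cj in c] + [0.0] * (m + 1))
    basis = list(range(n, n + m))        # start at the all-slack vertex
    trajectory = [tuple(basis)]
    while True:
        # Dantzig's rule: enter the most negative reduced cost.
        col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][col] >= -eps:
            return T[-1][-1], trajectory     # optimal value and path
        rows = [i for i in range(m) if T[i][col] > eps]
        if not rows:
            raise ValueError("unbounded LP")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])  # ratio test
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col]:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        basis[row] = col
        trajectory.append(tuple(basis))

# e.g. simplex_trajectory([[1, 1], [1, 3]], [4, 6], [3, 2]) returns
# (12.0, [(2, 3), (0, 3)]): one pivot from the slack basis to optimum.
```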

  • Parameterized streaming algorithms for vertex cover, Rajesh Chitnis, Graham Cormode, MohammadTaghi Hajiaghayi, and Morteza Monemizadeh, arXiv:1405.0093 and SODA 2015. Streaming kernelization, Stefan Fafianie and Stefan Kratsch, arXiv:1405.1356 and MFCS 2014. Streaming meets parameterized complexity: many parameterized algorithms take time linear in their input size but exponential or worse in some other parameter, so it makes sense to ask whether that linear time can scale even to inputs too big to fit into main memory.
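
To give the flavor in a deliberately weakened form (a folklore warm-up, not the papers' kernels): a single pass over the edge stream can maintain a maximal matching in O(k) space, answering no early if the matching exceeds k edges, since any vertex cover needs an endpoint from each matching edge. The papers store more, roughly k² edges, to preserve exact answers.

```python
# One-pass streaming warm-up for vertex cover with parameter k: keep a
# greedy maximal matching. More than k matching edges certify that no
# vertex cover of size <= k exists; otherwise the matched vertices form
# a cover of size at most 2k. (Folklore sketch, not the papers' kernel.)

def stream_vertex_cover(edge_stream, k):
    matched = set()                 # endpoints of matching edges
    size = 0
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.update((u, v))
            size += 1
            if size > k:
                return "no", None   # cover must exceed k vertices
    return "maybe", matched         # a vertex cover of size <= 2k
```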

  • Flip distance is in FPT time \( O(n+k\cdot c^k) \), Iyad Kanj and Ge Xia, arXiv:1407.1525 and STACS 2015. The version of flip distance considered here is for triangulations of geometric point sets; a flip replaces two triangles that form a convex quadrilateral by the other two triangles for the same quadrilateral, and the flip distance is the minimum number of flips needed to change one triangulation into another. This variant is known to be \( \mathsf{NP} \)-hard, and even the special case of convex polygons (whose complexity remains open) is interesting. You might think that an FPT algorithm is obvious: just keep the parts of the triangulation that are already correct, and flip the rest. But it's more complicated than that, because sometimes you might want to flip things that are already correct to get them out of the way, and then flip them back again later; a sketch of the basic flip operation appears below. (This makes a convenient point to repeat my standard warning that formulas in paper titles are a bad idea.)
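
Here's a minimal sketch of the flip primitive itself (just the operation, nothing like the paper's algorithm), storing a triangulation as a set of frozensets of point indices:

```python
# Flipping edge (a, b) replaces its two incident triangles abc and abd
# by cda and cdb; this is legal only when the segments ab and cd
# properly cross, i.e. the quadrilateral acbd is strictly convex.

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)

def flip(triangulation, pts, a, b):
    sides = [t for t in triangulation if {a, b} <= t]
    if len(sides) != 2:
        return False                 # boundary edge, or not an edge at all
    (c,), (d,) = (t - {a, b} for t in sides)
    if (orient(pts[a], pts[b], pts[c]) * orient(pts[a], pts[b], pts[d]) >= 0
            or orient(pts[c], pts[d], pts[a]) * orient(pts[c], pts[d], pts[b]) >= 0):
        return False                 # quadrilateral is not strictly convex
    triangulation -= set(sides)
    triangulation |= {frozenset((c, d, a)), frozenset((c, d, b))}
    return True
```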

  • Generating \( k \)-independent variables in constant time, Tobias Christiani and Rasmus Pagh, arXiv:1408.2157 and FOCS 2014. From the title, this looks like it might be about hash functions, but it's not. \( k \)-wise independence is an important assumption in that context, allowing practical hashing algorithms to be used with limited randomness, but there's a lower bound showing that computing such hash functions in constant time requires a large amount of memory. On the other hand, in some other algorithms you might just need some pseudorandom values rather than a function that you can call later with the same input and get the same output. So \( k \)-wise independent random generation should be easier than \( k \)-wise independent hashing, and this paper shows that it actually is. Specifically, they show how to generate pseudorandom values over a finite field in constant time per value with a number of bits that's linear in the independence parameter and logarithmic in everything else.
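
For contrast, here's the classic construction that the paper's generator is competing against: evaluate a uniformly random polynomial of degree less than \( k \) over a prime field at successive points. Any \( k \) of the outputs are uniform and independent, but each output costs \( O(k) \) field operations rather than the paper's constant.

```python
# Classic k-wise independent generator (the O(k)-per-value baseline,
# not the paper's constant-time construction): a uniformly random
# polynomial of degree < k over Z_P, evaluated at 0, 1, 2, ...

import random

P = (1 << 61) - 1                    # Mersenne prime, so Z_P is a field

def kwise_values(k, count, seed=None):
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(k)]   # the random seed
    for x in range(count):
        v = 0
        for a in reversed(coeffs):   # Horner evaluation: O(k) per value
            v = (v * x + a) % P
        yield v

# Any k of the generated values are jointly uniform over Z_P.
```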

  • Dynamic integer sets with optimal rank, select, and predecessor search, Mihai Pătrașcu and Mikkel Thorup, arXiv:1408.3045 and FOCS 2014. Mihai's last paper? This uses word-RAM operations to provide a constant-time data structure for the named operations on sets whose size is polynomial in the word length. This is almost like the atomic sets in fusion trees, but better because it can be updated more quickly.
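
As a toy, single-word analogue of the operations in question (far from the actual data structure, which packs many elements of a polynomially larger universe into each word), a set of small integers stored as a bitmask supports rank, select, and predecessor with a few word operations:

```python
# A set of integers in [0, w) as a bitmask, with rank and predecessor
# by masking, popcount, and bit length. Illustration only; requires
# Python >= 3.10 for int.bit_count().

class WordSet:
    def __init__(self):
        self.mask = 0

    def insert(self, x):
        self.mask |= 1 << x

    def delete(self, x):
        self.mask &= ~(1 << x)

    def rank(self, x):
        # Number of elements strictly below x.
        return (self.mask & ((1 << x) - 1)).bit_count()

    def predecessor(self, x):
        below = self.mask & ((1 << x) - 1)
        return below.bit_length() - 1 if below else None

    def select(self, i):
        # i-th smallest element (0-indexed); a loop here, but doable
        # in O(1) with the kind of word tricks the paper builds on.
        m = self.mask
        while m:
            if i == 0:
                return (m & -m).bit_length() - 1
            m &= m - 1               # clear the lowest set bit
            i -= 1
        return None
```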

  • Computing classic closeness centrality, at scale, Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck, arXiv:1409.0035 and COSN 2014. The closeness centrality of a node in a network is, essentially, the average distance to all the other nodes. Nodes with smaller average distance are more central; it is in this sense that Paul Erdős is central to the mathematical collaboration network or Kevin Bacon is central to the acting co-star network. I had a paper at SODA 2001 on approximating centrality, but it depended on an assumption that the network has low diameter. This paper makes no such assumption but nevertheless manages to estimate the centrality of all nodes accurately in near-linear time.
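
The generic pivot-sampling idea is easy to sketch (this is the flavor of my older approach, not the new paper's algorithm, and its accuracy guarantee is exactly what needed the low-diameter assumption): run BFS from a few random pivots and average the distances. Here `adj` is assumed to be a dict mapping each node of a connected undirected graph to a set of neighbors.

```python
# Pivot-sampling estimate of average distances: BFS from a few random
# pivots and average each node's distances to them. Assumes `adj`
# describes a connected undirected graph.

import random
from collections import deque

def bfs_distances(adj, s):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def estimate_avg_distance(adj, num_pivots=32, seed=None):
    nodes = list(adj)
    pivots = random.Random(seed).sample(nodes, min(num_pivots, len(nodes)))
    total = dict.fromkeys(nodes, 0)
    for p in pivots:
        for v, d in bfs_distances(adj, p).items():
            total[v] += d
    # Closeness centrality is, up to convention, the reciprocal of this.
    return {v: total[v] / len(pivots) for v in nodes}
```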

  • Simple PTAS's for families of graphs excluding a minor, Sergio Cabello and David Gajser, arXiv:1410.5778. Approximation algorithms for planar graphs and their generalizations are nothing new, but they generally involve the algorithm knowing a lot about the graph and its structure (for instance using a separator decomposition). This paper shows that even very simple methods (just a local search) can give good approximations, with all the graph structure showing up in the analysis rather than in the algorithm itself.
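
Here is roughly what such a local search looks like, instantiated (hypothetically) for maximum independent set: repeatedly trade up to t vertices of the current solution for t + 1 others, as long as doing so keeps the set independent. Note that the code never inspects the global structure of the graph; the excluded minor enters only into the analysis showing that bounded-size trades already give a good approximation.

```python
# Structure-oblivious local search for maximum independent set, the
# style of algorithm analyzed in the paper (hypothetical instantiation).
# Each round is polynomial for fixed t; minor-freeness is used only in
# the analysis, not in the code.

from itertools import combinations

def is_independent(adj, vertices):
    return all(v not in adj[u] for u, v in combinations(vertices, 2))

def local_search_mis(adj, t=1):
    current = set()
    improved = True
    while improved:
        improved = False
        outside = [v for v in adj if v not in current]
        for r in range(t + 1):                  # remove r, add r + 1
            for rem in combinations(list(current), r):
                base = current - set(rem)
                for add in combinations(outside, r + 1):
                    cand = base | set(add)
                    if is_independent(adj, cand):
                        current = cand
                        improved = True
                        break
                if improved:
                    break
            if improved:
                break
    return current
```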

  • Beyond the Euler characteristic: Approximating the genus of general graphs, Ken-ichi Kawarabayashi and Anastasios Sidiropoulos, arXiv:1412.1792. If a given graph can be embedded into a surface of small genus, then an embedding of it into a surface of larger genus can be found in polynomial time, with the exponent of the polynomial independent of the genus. The dependence of the achieved genus on the optimal genus is only polynomial, and this also leads to approximation algorithms with a sublinear approximation ratio. Previously such an approximation was known only for graphs of bounded degree.