So this past week I traveled to WADS, a symposium on algorithms and data structures held every two years in Canada. (The W stands for "workshop", as it was originally intended to be a small meeting at which people made informal presentations and collaborated on research, but it was always really a symposium and they changed the name but not the acronym a few years ago.) This year's Canadian location was Brooklyn, and John Iacono did a great job of organizing the conference with a welcoming reception at a beautiful nearby art gallery, a fun banquet excursion cruising in a boat on the Hudson and East Rivers, good food at the events and coffee breaks, good sightlines and acoustics in the conference rooms, and low registration fees — all things we tend to miss out on with other institutionally-organized conferences such as most offerings of SODA.
Of the three invited speakers (János Pach, Bob Tarjan, and Mihai Pătraşcu) the talk I would have expected to like best based on my own research interests was the one by János, on the problem of placing n points in a unit square in order to minimize the area of the largest empty axis-aligned rectangle, and some topological generalizations of the problem. A good set of points can be viewed as an epsilon-net, and the question of where the optimal rectangle area lies within an interval bounded between 1/n log n and 1/n is closely related to some recent research on the size of epsilon-nets that I mentioned in my report from SoCG 2011.
The invited talk that I actually ended up enjoying best, though, was the one by Mihai, on hashing data structures. (Tarjan's talk was also about data structures, for binary searching.) Mihai convinced me that I've been teaching hashing wrong — I tend to concentrate on hash chaining and double hashing before moving on to more complicated methods such as cuckoo hashing, but according to Mihai, linear probing is actually much better. This is the collision resolution strategy where, when you try to hash an item x to table cell h and you discover that it's already occupied, you simply move on to h + 1, h + 2, etc., until finding an unoccupied cell. If a 1 − ε fraction of the table is full, then the expected number of probes you have to make is O(1/ε2)) (originally proved by Knuth using sums of binomial coefficients, later proved in a more conceptual and teachable way by Pagh2 et al), which seems like it should be worse than the O(1/ε) of double hashing. But actually, you don't need to make ε be small, so the asymptotics are unimportant, and linear probing wins big because its consecutive memory access pattern works well with modern memory hierarchies and prefetching strategies. It works so well, in fact, that (in an implementation at AT&T) it takes only 5% more time than a single memory access per lookup. Mihai also spent some time speaking about how to implement hash functions for linear probing. The usual analysis of linear probing needs log-wise independence, which can be achieved in theory in constant time per hash but not in practice with a constant that makes any sense to use. Random linear functions modulo a large prime are frequently used, but Mihai observed that if a sequence of n consecutive integers are inserted with this hash function then the probability of k time per operation is proportional to 1/k — most of the time you will see only constant running time but you might get unlucky and be really slow — leading to an expected time of Θ(log n) per operation. Instead, simple tabulation hashing is about the same speed and actually works reliably. In the questions after the talk, Mihai mentioned that this analysis had actually convinced AT&T to change the hash functions in their internet routers, a nice example of theory making a difference in practice.
Among the contributed talks, more than half were on computational geometry (apparently this is typical for WADS) so one could go to the whole conference and not see a non-geometric talk, which is pretty much what I ended up doing.
Muriel Dulieu convinced me that covering point sets with V-shapes that are as thin as possible is actually useful, as a way of identifying the corners when you're trying to fit a curve to a noisy point set. Diane Souvaine asked for a robust statistical version of the same fitting problem, in which one is allowed to throw away outliers before finding a V that covers the remaining points; it seems like an interesting problem on which not much is likely to be known.
In the session on computations with imprecise points, Jeff Phillips nicely categorized these types of problem: there are problems in which the points are confined to uncertainty regions: you know that there is a point within each region, but not where in the region it is, and the task is to do some precomputation based on the regions that will later speed up the real computation you want to do once the point locations become known. In this type of problem, the focus is on the worst case time for the algorithm, or on the worst case of how badly the points could be arranged within the regions — for instance Phillips own paper showed that it is possible to determine efficiently the maximum possible circumradius for points with uncertainty regions, or more generally the maximum value of an LP-type minimization problem, but that non-LP-type problems such as diameter are much harder. A second type of problem concerns indecisive points — points that each have a probability distribution, independent from but not identical to the distributions of the other points, where the focus is instead on computing the expected behavior of the points or the whole distribution of some geometric functional of the points. This is in contrast to the usual assumption in statistics that a set of samples are all drawn from a common distribution. And in a third type of problem, which he called stochastic, one is given as input a set of precise point locations, but only some subset of those points will actually show up in the real input. Conveniently, each of these three types of problems was represented by a talk in the session.
Therese Biedl spoke on the "segment minimization" problem, which actually involves representing integer matrices as a sum of matrices (with as few summands as possible) in which each matrix in the sum has only two numbers, 0 and some positive integer k, as its coefficients, and in which the nonzero entries in each row of each summand are consecutive — this has applications in planning certain forms of radiation therapy for cancer treatment, as the shape of the summands represents the physical shape of an aperture in the radiation beam and the number k represents the time or intensity of the beam. She spoke about fixed-parameter tractability of a version of the problem in which the matrices have one row and many columns (parametrized by the number of summands), and improved solutions for matrices with more than one row. But that made me wonder: what can be said about versions of the problem in which the matrices have one column and many rows? In this case the input is a set S of integers (the order of the rows doesn't matter) and what one wants is some sort of small basis multiset B such that every number in S is a sum of a sub-multiset of B. Obviously, setting B to be powers of two will work, but is not always optimal. How hard is it to find the optimal solution?
The final talk that I saw (in a session I ended up chairing) was by Sarah Eisenstat, a student of Erik Demaine at MIT. She spoke about linkages where rotation around the axis of any edge is permitted but the two edges at any vertex must keep a fixed angle to each other, as is the case for instance for molecules. It turns out that, even for linkages in the form of a single path, with nice bond angles and nice edge lengths, it's NP-hard to tell whether there's a way of folding the linkage into a single plane so that it doesn't self-intersect. The proof is a nice reduction from a form of planar 3-SAT in which the variables are laid out on a line, each clause must have three literals with the same sign, and the positive and negative clauses are on opposite sides of the variable line.
I still have another post coming soon, with photos from near the conference location, but that's it from the conference itself.
For the segment minimization problem, the problem is NP-hard even if there is only one column (Collins et al., IPL 2006.) This case is quite trivially FPT in H (the largest entry), since there can be only H+1 different kinds of rows, so you can delete duplicates and then solve a 1xH matrix by brute force.
Thanks! I was intending to ask you about this after your talk but then I forgot.