# Many points in small boxes

My latest arXiv preprint is "Covering many points with a small-area box", arXiv:1612.02149, with de Berg, Cabello, Cheong, and Knauer. It is about finding an axis-parallel rectangle that covers \( k \) out of \( n \) points in the plane and has as small area as possible. Here, both \( k \) and \( n \) are variables, but \( k \) is expected to be significantly smaller than \( n \). So, in finding a fast algorithm, the goal is first to minimize the exponent of \( n \) in the time bound, and only after that to minimize the dependence on \( k \). We achieve a bound of \( O(nk^2\log n) \). There are also some related algorithms for approximating the dual problem where you are given a fixed area and want to maximize the number of points that are covered.

The result has a bit of a curious history. A group of us worked on it, and came up with the best algorithms we could, thinking it was a new and unsolved problem. Then, we did a more careful literature search and found that it was actually a known problem, and worse than that, that the time bounds we had achieved were all known or improved, so that we had no result. But then we looked again and discovered that the supposedly known bounds aren't actually true.

What happened was that there was a line of research (that I was part of) in the early 1990s on finding "small" subsets of a given number of points. A sequence of papers on this subtopic identified many different definitions of what it meant to be small (like having low circumradius, diameter, etc) that could all be solved by similar algorithms. These algorithms worked by covering the input by clusters of \( O(k) \) points, one of which could be guaranteed to include the optimal solution, and then worked separately within each cluster. A simple version of this is to choose \( O(k) \) nearest neighbors to each point and to use these \( n \) sets of near neighbors as the clusters; later refinements used fewer and easier-to-find clusters. The algorithms of this type varied by what they did inside the clusters, but this general method ensured that the dependence on \( n \) would be linear. And one of the problems solved using this approach was almost the same as the one in our new paper, but with axis-parallel rectangle perimeter in place of rectangle area.

Later, another paper studied the problem of finding smallest rectangles covering \( k \) points, but with \( k \) assumed to be very close to \( n \) rather than much smaller than \( n \), so that factors of \( k \) in the runtime are bad and factors of \( n-k \) are good. It solved both the minimum-perimeter and minimum-area problems for the large-\( k \) case. Of course, it quoted the past work on the minimum perimeter problem for small \( k \), but it didn't make clear that this past work solved only the case of perimeter minimization and not area minimization. And so, other papers, quoting that one, started also saying that the minimum-area \( k \)-point rectangle problem (for the small-\( k \) case) had also already been solved.

It hadn't, but now it has.