The UCI algorithms, combinatorics and optimization seminar this week featured a nice talk by local mathematician Nathan Kaplan on the no-three-in-line problem, which asks how many points you can choose from an \(n\times n\) grid so that no three of them lie on a single line. For small \(n\), such as the \(10\times 10\) grid below, the answer is \(2n\) (any more than that would lead to three points on a horizontal line) but it has long been conjectured that large enough grids have fewer points in their optimal solutions.

20 points in a \(10\times 10\) grid with no three in line

There was no recent progress to report, but instead Nathan’s talk had two main points: First, to express his skepticism about the heuristic reasoning that led to the conjecture that the right bound should be

\[\frac{\pi}{\sqrt{3}} n \bigl( 1+o(1) \bigr) \approx 1.814 n.\]

To get to this conjecture, Richard K. Guy and P. A. Kelly (not the same Kelly as the one associated with the Sylvester–Gallai theorem on lines through only two points) used a probabilistic approach, based on the idea that certain random events would be largely independent of each other when they’re really not. Nathan thinks that if we look more closely at how non-independent they actually are we might not reach the same conclusion.

Second, Nathan pointed out a phenomenon occurring broadly across discrete geometry, where point sets that are extremal in some way often have some kind of algebraic structure. A particularly clear instance of this occurs in the 2013 solution by Green and Tao of the Dirac–Motzkin conjecture on the minimum number of ordinary lines occurring in the Sylvester–Gallai theorem, and on the closely related orchard-planting problem on the maximum number of three-point lines possible for \(n\) points in the plane. There, the best known solutions involved placing points on a cubic curve, and Green and Tao solved the problems by proving that any good enough solution would have to do the same thing.

Other instances of the same algebraic structure phenomenon involve the Erdős distinct distances problem, where point sets with few distances tend to look either like grids or evenly spaced points on lines or circles, the Erdős–Anning theorem according to which infinite sets of points with integer distances must lie on a line, and a theorem of de Zeeuw and Solymosi that infinite sets of points with rational distances must either be Zariski-dense or lie almost entirely on a line or circle. For a variant of the no-three-in-line problem, in projective planes over finite fields of odd order, the optimal solution can be proven to be the set of points on a quadratic curve. And for the no-three-in-line problem itself, the same algebraic structure shows up in the known ways of placing many points with no three on a line: \(n-o(n)\) points by Erdős, using a quadratic curve modulo a prime near \(n\), and later \(3n/2 - o(n)\) by Hall et al. using multiple copies of a quadratic curve modulo a prime near \(n/2\). However, for this problem we don’t know how to prove that this structure is necessary.

In graph theory, we often have a different phenomenon, that it can be easier to find extremal examples with no structure via the probabilistic method than to find structured examples. A standard example of this is the Ramsey graphs, graphs with neither a large clique nor a large independent set. The best way we know how to construct these graphs is just to choose a graph uniformly at random from the set of all graphs on a given vertex set; this will, with high probability, limit the cliques and independent sets to logarithmic size. There is an algebraic construction that we think does as well (the Paley graphs), but the algebraic structure isn't necessary and we don't know how to prove that it works.

So this naturally raises the question: how good are the point sets we can find by applying the probabilistic method to the no-three-in-line problem, rigorously rather than heuristically? The answer is: not bad, but not as good as the algebraic constructions. This must have been long known (especially given that Erdős was a major contributor to what we know about the problem) but I don’t think it has been published, probably because it’s not an improvement over other methods. Still, I think it makes a useful exercise to work it out.

A key tool in the analysis is a bound proven by Guy and Kelly on the number of collinear triples in an \(n\times n\) grid. This number is:

\[t_n=\frac{3}{\pi^2} n^4\log n + O(n^4).\]
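
As a quick sanity check (mine, not part of the talk), one can count the collinear triples exactly for small grids and compare with the leading term of this formula. Each collinear triple has a unique middle point, and the number of grid points strictly between two points differing by \((\Delta x,\Delta y)\) is \(\gcd(|\Delta x|,|\Delta y|)-1\), so summing this quantity over pairs of grid points counts the triples. A minimal Python sketch (agreement is only rough at small \(n\), where the \(O(n^4)\) term still matters):

```python
# Exact count of collinear triples in an n-by-n grid, compared with the
# leading term (3/pi^2) n^4 log n of the Guy-Kelly estimate.
from itertools import combinations
from math import gcd, log, pi

def collinear_triples(n):
    """Count collinear triples exactly: each triple has a unique middle point,
    and the number of grid points strictly between two points differing by
    (dx, dy) is gcd(|dx|, |dy|) - 1."""
    points = [(x, y) for x in range(n) for y in range(n)]
    return sum(gcd(abs(ax - bx), abs(ay - by)) - 1
               for (ax, ay), (bx, by) in combinations(points, 2))

for n in (5, 10, 20, 40):
    exact = collinear_triples(n)
    leading = 3 / pi ** 2 * n ** 4 * log(n)
    print(f"n = {n:2d}: exact = {exact:9d}, leading term = {leading:11.0f}")
```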

If we choose \(k\) points at random from the grid, the probability that we hit all three points of some particular collinear triple is at most \(p_k = (k/n^2)^3\) (sampling without replacement only makes it smaller). And as long as this probability is small enough that \(p_k t_n<1\), we will have nonzero probability of randomly choosing points with no three in line. This is an instance of the union bound: when you have a collection of “bad” events, each with probability so small that the sum of probabilities is less than one, the probability of avoiding them all is nonzero. Here the bad events are the events that we choose all three points from one of the collinear triples. The same bound can also be seen by observing that the expected number of collinear triples in our sample is at most \(p_k t_n\) (the contribution to the expected number from any particular collinear triple is at most \(p_k\), and the total expected number follows by linearity of expectation). But an expected value of a non-negative integer can only be less than one when there is nonzero probability of the value being zero. So as long as

\[k < \frac{(\pi n)^{2/3}}{(3\log n)^{1/3}},\]

we will have nonzero probability of our sample having no three in line. But this is far below the linear numbers of points from the algebraic constructions.
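
To make the gap concrete, here is a quick back-of-the-envelope sketch (my own numbers, just plugging into the formulas above and ignoring lower-order terms) of how the union-bound sample size compares with \(2n\):

```python
# Union-bound threshold for a random k-point sample with no three in line,
# compared with the 2n points achievable by other constructions.
from math import log, pi

def union_bound_k(n):
    """Largest k with (k/n^2)^3 * (3/pi^2) * n^4 * log(n) < 1."""
    return int((pi * n) ** (2 / 3) / (3 * log(n)) ** (1 / 3))

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:9d}: union bound allows k ~ {union_bound_k(n):6d}, versus 2n = {2 * n}")
```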

If the bad events were independent from each other, we wouldn’t have to use the union bound. As long as each of them has probability less than one, their disjunction would also. But they’re not independent. The exact nature of their dependence varies according to the sampling scheme we use, but most obviously, when one bad event turns out to be true, it significantly increases the probability that another bad event that uses one of the same three points is also true. Less obviously, when we sample \(k\) out of \(n^2\) points, and one bad event is true, it slightly decreases the probability that another event disjoint from it is true, from \((k/n^2)^3\) to \(\bigl((k-3)/n^2\bigr){}^3\), because there are fewer remaining points in the sample that could hit the disjoint event. If we only look at the first type of dependence, each bad event only depends on \(O(n^2\log n)\) others, the ones that it shares a point with, significantly fewer than the total number of events.
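
This dependency count can be checked empirically for small grids (again a sketch of my own, not from the talk): the collinear triples through a fixed point \(p\) are found by grouping the other grid points into lines through \(p\), and a triple can share a point with at most three times the worst such count of other triples.

```python
# Empirical check that each collinear triple shares a point with only about
# n^2 log n other triples: count the collinear triples through each grid point
# by grouping the remaining points into lines through it.
from collections import Counter
from math import comb, gcd, log

def triples_through_point(n, p):
    """Number of collinear triples in the n x n grid that contain the point p."""
    px, py = p
    lines = Counter()   # canonical direction of the line through p -> point count
    for x in range(n):
        for y in range(n):
            if (x, y) == p:
                continue
            dx, dy = x - px, y - py
            g = gcd(abs(dx), abs(dy))
            dx, dy = dx // g, dy // g
            if dx < 0 or (dx == 0 and dy < 0):   # merge opposite directions
                dx, dy = -dx, -dy
            lines[(dx, dy)] += 1
    # any two other points on the same line through p make a triple with p
    return sum(comb(m, 2) for m in lines.values())

for n in (10, 20, 40):
    worst = max(triples_through_point(n, (x, y))
                for x in range(n) for y in range(n))
    # a triple shares a point with fewer than 3 * worst other triples
    print(f"n = {n:2d}: max triples through one point = {worst:7d},"
          f" n^2 log n = {n * n * log(n):8.0f}")
```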

In a situation like this where we want to avoid a collection of bad events that have limited dependence on each other, the right replacement for the union bound is a tool called the Lovász local lemma. Intuitively, it says that we can do the same sort of reasoning as the union bound (just checking that the sum of bad event probabilities is small), but replacing the total number \(t\) of bad events with the number \(d\) of other events that any one bad event can depend on. If each event has probability at most \(p\), and if \(p(d+1)\le 1/e\approx 0.368\), then there is still nonzero probability of avoiding all the bad events. Here, \(p_k\) is still \((k/n^2)^3\), and we have \(d_n=O(n^2\log n)\) instead of \(t_n=O(n^4\log n)\). So this looks like it should let us use much bigger samples, and get much bigger sets of points with no three in line. But there’s a problem: we can’t safely ignore the weaker dependences between disjoint events.

We can eliminate this second kind of dependence by choosing our sample differently: choose randomly for each point of the grid whether or not to include it in the sample, independently of the other points, with probability \(k/n^2\), so that the total expected sample size comes out right. For this kind of Poisson sample, disjoint collinear triples have independent bad events. But there’s another kind of bad event that we also have to worry about: the event that our sample size is much smaller than its expected value. For the union bound that’s ok (this event has tiny probability) but for the Lovász local lemma it’s a problem because it is dependent on all of the other bad events.
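
In code, this kind of sample is just an independent coin flip for each grid point (a sketch of the sampling scheme only, nothing from the talk); the realized sample size is binomially distributed around \(k\), so a much-too-small sample is possible but unlikely:

```python
# Poisson-style sample of the grid: keep each of the n^2 points independently
# with probability k / n^2, so the expected sample size is k but the actual
# size fluctuates.
import random

def poisson_sample(n, k, rng=random):
    """Include each grid point independently with probability k / n^2."""
    p = k / n ** 2
    return [(x, y) for x in range(n) for y in range(n) if rng.random() < p]

random.seed(0)
n, k = 100, 150
sizes = sorted(len(poisson_sample(n, k)) for _ in range(20))
print(f"expected sample size {k}, observed sizes: {sizes}")
```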

To navigate between the Scylla of too many weak dependences and the Charybdis of overly small samples, researchers on related problems (notably on finding large general-position subsets when a point set has no long lines of points; see sections 9.4 and 9.6 of my book) have turned to a different sampling method for a different problem. Instead of looking for a single large point set with no collinear triples, they look for a partition of the points into \(g\) different subsets, all of which have no collinear triples. And instead of choosing a single sample of a fixed size, or with a fixed inclusion probability, they choose independently for each of the points which of the \(g\) subsets it belongs to.

Now we can use the Lovász local lemma! The bad events are the same (some triple of collinear points from the grid all get assigned to the same subset) but now they have a different probability, \(1/g^2\). To calculate this, observe that the first of the three points in a collinear triple can be assigned to any of the \(g\) subsets. Once its subset is determined, each of the other two points of the collinear triple has an independent \(1/g\) probability of landing in the same subset. These bad events are only dependent when they come from triples that share at least one point, so each bad event depends on \(O(n^2\log n)\) others. Applying the Lovász local lemma, a solution exists whenever \((n^2\log n)/g^2\) is small enough, which we can arrange by choosing \(g=\Theta\bigl(n\sqrt{\log n}\bigr)\) with a large enough constant factor. And for this solution (a partition of the grid into \(g\) subsets, each of which has no three in line) the largest of the \(g\) subsets has size at least \(n^2/g=\Omega\bigl(n/\sqrt{\log n}\bigr)\).
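
The local lemma also has a constructive counterpart, the Moser–Tardos resampling algorithm, which wasn't part of the talk but suggests how to actually find such a partition for small grids: while some collinear triple has all three of its points in the same subset, re-randomize the assignments of those three points. A rough Python sketch, with the number of parts \(g\) chosen ad hoc in the \(\Theta\bigl(n\sqrt{\log n}\bigr)\) range rather than from the lemma's exact constants:

```python
# Moser-Tardos-style resampling: color the points of an n x n grid with g
# colors so that no color class contains three collinear points.
import random
from collections import Counter
from itertools import combinations
from math import ceil, log, sqrt

def collinear(a, b, c):
    """True if three grid points lie on a common line (cross-product test)."""
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])

def no_three_in_line_partition(n, g, rng=random):
    """Randomly color grid points with g colors, then repeatedly re-randomize
    the points of any monochromatic collinear triple until none remains."""
    points = [(x, y) for x in range(n) for y in range(n)]
    triples = [t for t in combinations(points, 3) if collinear(*t)]
    color = {p: rng.randrange(g) for p in points}
    while True:
        bad = next((t for t in triples
                    if color[t[0]] == color[t[1]] == color[t[2]]), None)
        if bad is None:
            return color
        for p in bad:                        # resample only the violated triple
            color[p] = rng.randrange(g)

random.seed(1)
n = 10
g = ceil(2 * n * sqrt(log(n)))               # ad hoc choice of order n * sqrt(log n)
coloring = no_three_in_line_partition(n, g)
largest = max(Counter(coloring.values()).values())
print(f"n = {n}, g = {g}, largest subset size = {largest}")
```

For a \(10\times 10\) grid this converges almost immediately, since with this many parts only a handful of triples start out monochromatic; the content of the local lemma is that a valid coloring exists for every \(n\) once \(g\) is of this order.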

So the probabilistic method can generate near-linear solutions, while the algebraic structure of the problem leads to solutions that are actually linear. By choosing randomly, we lose only a sublogarithmic factor, but this factor means that (without further refinements) this method doesn’t tell us anything useful about the still-open research question of what the right constant factor on the linear solutions is. On the other hand, the fact that the probabilistic method gets so close in this case suggests to me that it will be difficult to apply the method of showing that all good solutions must have an algebraic structure, which has worked so well for some other problems in this area.

(G+, \(\mathbb{M}\))