Bichromatic Euclidean minimum spanning trees

One of the talks at CCCG was by Ahmad Biniaz, on red-blue-purple spanning graphs: given a partition of a point set into red, blue, and purple points, find a minimum-weight graph such that the red-purple and blue-purple subsets induce connected subgraphs. It can be solved by matroid intersection, but this is slow and Biniaz talked about a faster special case. Anyway, in his talk, Ahmad mentioned a different colored spanning problem: find a minimum spanning tree of the complete bipartite geometric graph determined by a set of red and blue points in the Euclidean plane (with no purple this time). He noted that bichromatic closest pairs can be used to solve it in time \( O(n\log^8 n) \), and asked whether a faster algorithm is possible. The illustration below gives an example of this problem and its solution.

A set of red and blue points and their bichromatic minimum spanning tree

Although Ahmad didn't go into detail, the \( O(n\log^8 n) \) bound that he mentioned is not hard to achieve by a form of the Prim–Dijkstra–Jarník minimum spanning tree algorithm. This algorithm maintains a tree on a subset of the input graph vertices (red and blue points in our case) and at each step adds to it the shortest edge to a new vertex. We split the problem of finding the shortest edge into two subproblems, one of finding the shortest distance between a red point in the tree and a blue point outside the tree, and a second one with the colors reversed. For both problems, we need to maintain a dynamic red-blue point set and find the closest red-blue pair, which is exactly the bichromatic closest pair problem. My paper "Dynamic Euclidean minimum spanning trees and extrema of binary functions" (DCG 1995) shows how to solve dynamic bichromatic closest pairs using another data structure for dynamic (uncolored) nearest neighbor queries, in \( O(n\log^2 n) \) nearest-neighbor operations per closest-pair update, and Chan's paper "A dynamic data structure for 3-d convex hulls and 2-d nearest neighbor queries" (SODA 2006) gives a randomized method for dynamic nearest neighbor queries with \( O(n\log^6 n) \) time per update. Putting these two data structures together and using them to find each successive edge to add to the tree in the Prim–Dijkstra–Jarník minimum spanning tree algorithm gives the claimed \( O(n\log^8 n) \) (randomized) time bound.

But of course, Prim–Dijkstra–Jarník is not the only way to compute minimum spanning trees, and for this problem it turns out to work better to use Borůvka's algorithm. This algorithm maintains a forest (initially with each vertex in its own one-node tree) and at each stage of the algorithm adds a set of edges, the shortest edges connecting each tree to a vertex outside of it. The number of trees goes down by a factor of two or more in each stage, so there are \( O(\log n) \) stages. To solve this efficiently for red-blue point sets, the problem we need to solve is the following: given a partition of the points into subsets (the current trees), find for each point the nearest oppositely-colored point that's not in its own tree. Let's call this the "nearest unrelated point" problem, as a convenient shorthand.

To solve the nearest unrelated point problem, number the subsets of the partition, and represent each of these numbers in binary as a sequence of \( O(\log n) \) bits. Define a "canonical set" \( C(p,b,c,) \) to be the subset of points that belong to a set whose label's \( p \)th bit is \( b \), and whose color is \( c \). Then for a point \( q \), the subset of points that are unrelated to \( q \) can be expressed as a union of logarithmically many canonical sets \( C(p,b,c,) \), one for each possible value of \( p \), where \( b \) is the complement of the \( p \)th bit of the label for \( q \)'s subset, and \( c \) is the opposite color to \( q \). Therefore, we can solve the nearest unrelated point problem by computing a Voronoi diagram for each canonical set, and building a point location data structure for each Voronoi diagram. For each point \( q \) we query the canonical sets whose union is the set of unrelated points to \( q \), and combine the result of the queries to find \( q \)'s nearest unrelated point. There are \( O(\log n) \) Voronoi diagrams, each of which takes logarithmic time to build, after which we take \( O(\log^2 n) \) time per point to query these diagrams. So the total time for all nearest unrelated points is \( O(n\log^2 n) \).

Since Borůvka's algorithm takes logarithmically many stages, and each stage can be performed using a single computation of all nearest unrelated points, the total time to construct the bichromatic minimum spanning tree is \( O(n\log^3 n) \). I'm not entirely satisfied with this bound, though. Although much improved from the bichromatic closest pair solution, it still has too many logs. And what I'd really prefer would be a construction like the usual Euclidean minimum spanning tree one, where we find a single sparse graph guaranteed to contain the optimal spanning tree, and then just run a graph algorithm on that graph. So I think there's still more to do on this problem.

(G+)