Here's a puzzle in streaming algorithms: suppose you want to generate a stream of all the (infinitely many) integer points (\( x \),\( y \)) in the plane, in sorted order by their Euclidean distance from the origin. How little space do you need?

The answer is \( O(1) \), if you assume that a single variable can store any number whose size is comparable to the coordinates output so far and you're willing to do it slowly enough. First, observe that you can avoid non-integer arithmetic by using the squared distance \( x^2+y^2 \) in place of the distance itself. Maintain a current squared distance \( d \) and a number \( r \) equal to the square root of \( d \) rounded up to an integer (both initially zero). Search a square of radius \( r \), outputting all points at distance \( d \), then increment \( d \) and update \( r \) and repeat.

But now suppose you want to do it much more quickly, constant (amortized) time per point. In this case it's still possible to achieve sublinear space (as a function of the output length so far) with a little data structural help. The idea is to observe that if \( n \) points have been output so far, then they form a pixelated disk whose boundary length is proportional to \( \sqrt{n} \). By maintaining a priority queue of boundary points, each can be found quickly. A suitable priority queue is the bucket queue (new Wikipedia article): an array of buckets, indexed by squared distance, where bucket \( i \) stores all the boundary points at squared distance \( i \). Conveniently, the maximum squared distance seen at any time will be proportional to the number of points output, so the total time spent scanning the array for the next nonempty bucket can be amortized out to constant time per point.

Is less than \( \sqrt{n} \) space possible? I don't know, but I also don't know any streaming lower bound techniques powerful enough to prove that it isn't possible. Most of the lower bounds I know of are information theoretic, but the slow constant-space method shows that information theory alone won't prove anything nontrivial for this problem.

Along with the Wikipedia article, I added to my PADS library of Python implementations three new modules: (bucket queues), (the fast streaming integer point algorithm), and (another application of bucket queues for graph degeneracy and triangle finding).



I note that you can also do the \( O(1) \)-space algorithm in \( O(\sqrt{n}) \) time for the next point, by following a path around the edge of the disk (start at \( y=0 \), go up until you cross, go left until you cross, repeat) rather than searching the whole square in \( O(n) \) time.

A slight improvement on that is to also store the next higher value of \( d \) that you find as you make that pass, so that you're never searching for a \( d \) that doesn't exist in the set. Although it would appear that that's just a constant-factor improvement, by the same argument that your cost of retrieving the values from the bucket queue can be amortized to constant-per-point.

I notice that there are also ways to trade off space for time -- when you do the pass around the boundary, store the points for the next m distances (or points in some range, or whatever) in some suitable data structure so that you get \( O(m) \) points per pass.

And then I got stuck trying to find an appropriate data structure to show that with suitable values of \( m \) we can recover your \( O(1) \)-time algorithm....

Here's my confusion: How do you get sublinear space there? The bucket queue for up to \( n \) points will need to cover lengths up to \( O(\sqrt{n}) \), so it will need to cover squared lengths up to \( O(n) \). Thus it needs \( O(n) \) buckets, no?


Re the sublinear space part: use a hash table of buckets instead of an array of buckets (that's what my implementation does). Or, observe that all the frontier distances will be within a narrow range and use an array that only covers that range.


Oh! Okay, I see, you only need an \( O(\sqrt{n}) \) portion of the queue at any one time.

So that's then basically exactly the space-for-time tradeoff I was thinking of. You're making \( m \) proportional to \( O(\sqrt{n}) \), and so the per-point cost of doing the edge-traversal drops to \( O(1) \) since both the cost per traversal and the number of points per traversal are \( O(\sqrt{n}) \).

Given that, you could also look at a frontier of width \( 1/\sqrt{\sqrt{n}} \) on each edge traversal, and thus get something that is \( O(\sqrt{\sqrt{n}}) \) in both space and time.

In any case, given that this maps both of the known "best" solutions -- the \( O(1) \) space one and the \( O(1) \) time one -- into a single class of solutions, I wonder if that might make it easier to prove whether that's the limit on how good your solution is. Another way of looking at this class of solutions is that it appears that, given no other information but the last point found, we can find the next point in no better than \( O(\sqrt{n}) \) time -- but, in that \( O(\sqrt{n}) \) time, we can also find the next \( O(\sqrt{n}) \) points, and with clever sorting using a bucket queue, we can sort them in \( O(1) \) time.


Interesting, thank you! Can we adopt a Bresenham's algorithm to plot quarter circles of ever increasing radii? If the algorithm produces overlaps, then you can probably draw circles of radius \( r \), \( r + 1 \) and \( r+2 \) in sync and somehow deal with overlaps using a priority queue of \( O(1) \) size. This is a general idea, if a Bresenham's does produce overlapping circles, the gory details of ensuring that a point is output only once might be non-trivial. PS: does the bucket queue beat the standard binary-heap queue?


The binary heap takes log time per point, slower than the constant amortized time of the bucket queue for this problem. As for using Bresenham: the problem is that the quarter circles will include points of slightly different distances so outputting them in curve order will be different from outputting them in distance-sorted order.


Thanks, got it regarding bucket queue.

Regarding, Bresenham: this seems to be fine. Wouldn't all the interesting points be contained in the small size area around each point? (Bresenham guarantees that the plotted point is located as closely as possible to the true point. Doesn't it mean that the maximum deviation is just one?)


The distances from the origin of the points within the plotted curve differ by at most one, but that doesn't mean they are all equal. Their squared distances vary by up to the current radius.


I think with Bresenham's algorithm you're thinking of roughly the same class of algorithms I was. There are \( O(\sqrt{n}) \) squares in the rasterized quarter-circle, so at best it takes you \( O(\sqrt{n}) \) time to draw it, and it covers \( O(\sqrt{n}) \) points. Given that, you've now got choices of what to do. Either you can take all of those points and put them in a \( O(\sqrt{n}) \)-sized container -- and with some good choices of container and little cleverness involving keeping only the constant fraction of them [1] rather than actually all of them, you can then dump them out of that container in the right order in \( O(\sqrt{n}) \) time, or amortized \( O(1) \) time per point. Or, alternately, since you know the next point is within that band somewhere, you can keep the best point you find that's farther out than the last one you reported which only takes \( O(1) \) storage, but since you have to traverse the whole quarter-circle over again for every new point, you're taking \( O(\sqrt{n}) \) time per point.

[1] Basically, the trick there is that you ensure that your rasterized band entirely covers all points whose distance from the origin is between \( i \) and \( i+1 \). And then you only store the points in \( [i, i+1) \) when you go around the quarter-circle. The fact that your band covers that range means you know that you can report the points you found without missing any others that should go between them, and the fact that the ranges are non-overlapping means you don't have duplicates.


Yes, you are right. However, I somehow missed the fact that we need to carry out \( O(\sqrt{n}) \) traversals when the radius increases from \( r \) to \( r + 1 \). Doing \( O(\sqrt{n}) \) operations for each value of the squared radius is still better than search the square in \( O(n) \) time.


On a different note, your description of the bucket queue in the new article was reminding me of radix sorts, and while I was chasing that down I came across the Wikipedia articles for "bucket sort" and "pigeonhole sort", which both look like they might be worth linking to from the "bucket queue" article, and both could be significantly improved by linking back to it (and could likely use some other love as well).


Thanks for the suggestion. While researching the new article I also ran across references to a priority queue structure called a radix heap, but that seems to be something different (related to radix sort).

(More comments from G+)