# Cloning Voronoi diagrams

I have another new preprint on arXiv, Cloning Voronoi Diagrams via Retroactive Data Structures (with Matt Dickerson and Mike Goodrich), my contribution to the recent set of papers accepted to ESA 2010. (And where are the abstracts for this list of accepted papers, anyway?)

The basic scenario for our paper is something like this: you work for Starbucks, and would like to provide your customers with a web service in which they can look up the location of the Starbucks cafe nearest them. But, you don't want to make it easy for your competitors to use the service to build up a complete database of all your cafe locations. What can you do?

We formalize this problem by considering the number of queries needed to "clone" your web service: that is, to set up a nearest neighbor query service that either exactly matches your own service or does so to a high level of approximation (depending on the precise information provided in your service's answers to nearest neighbor queries). Our main results are that a linear number of queries to your service, and a near-linear amount of processing time, are all that is needed for this sort of cloning attack. That is, if you want to guard against a cloning attack, you need to severely limit the number of different queries that a customer can make against your service.

The problem of recovering nearest neighbor data from queries turns out to be closely related to some past work on determining the shape of a convex object by poking it (or aiming a laser beam at it) and discovering the point where the object intersects the probe line: by a standard lifting transformation, a nearest neighbor query in the plane can be interpreted as a probe of a 3d convex polyhedron (the lifted image of the Voronoi diagram) by a vertical line. However, although these past probing algorithms used a small number of probes, their running times were quadratic. To speed up these methods, we guide our probes by a version of Fortune's plane-sweep algorithm for Voronoi diagrams that backs up and redoes portions of the computation when it discovers a new site that it didn't previously know about.
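To make the lifting transformation concrete, here is a minimal sketch (my own illustrative names, not code from the paper): each site $(a,b)$ lifts to the plane $z = 2ax + 2by - (a^2+b^2)$, and since $\mathrm{dist}^2((x,y),(a,b)) = x^2+y^2 - (2ax+2by-a^2-b^2)$, the nearest site to a query point is the one whose plane is highest there. A nearest-neighbor query is thus exactly a vertical-line probe of the upper envelope of the lifted planes.

```python
def lift(site):
    """Map a site (a, b) to the coefficients (A, B, C) of its lifted
    plane z = A*x + B*y + C, with z = 2ax + 2by - (a^2 + b^2)."""
    a, b = site
    return (2 * a, 2 * b, -(a * a + b * b))

def vertical_probe(sites, x, y):
    """Probe the upper envelope of the lifted planes with the vertical
    line through (x, y); the maximizing plane is the nearest site,
    because maximizing the lifted height minimizes squared distance."""
    def height(site):
        A, B, C = lift(site)
        return A * x + B * y + C
    return max(sites, key=height)

sites = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
print(vertical_probe(sites, 3.9, 0.1))  # → (4.0, 0.0), the nearest site
```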

The retroactive data structures of the title enable us to do this backing up process efficiently. A retroactive data structure is a data structure that allows you to change the past: you can perform insertions and deletions in the sequence of data structure operations that have already been performed. Specifically, we need to answer retroactive successor queries among values that are not totally ordered but for which the subset that belongs to the data structure at any one time are ordered (the breakpoints of the beach line of Fortune's algorithm). In the non-retroactive case, this sort of comparison-based successor query problem is exactly what binary search trees are good for. We describe a relationship with point location that leads to an efficient retroactive comparison-based successor query data structure.
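As a toy model of what "retroactive" means here (this is only an expository replay-based sketch, nowhere near the efficient structure in the paper, and it assumes a totally ordered value set rather than the weaker comparison-based setting the paper handles):

```python
import bisect

class RetroactiveSet:
    """Toy retroactive ordered set: operations carry timestamps, can be
    inserted into or removed from the past, and successor queries can be
    asked 'as of' any time. Queries simply replay the timeline, so this
    is slow; it illustrates the interface, not the paper's structure."""

    def __init__(self):
        self.ops = []  # sorted list of (time, 'ins' or 'del', value)

    def add_op(self, t, kind, value):     # retroactively perform an operation
        bisect.insort(self.ops, (t, kind, value))

    def remove_op(self, t, kind, value):  # retroactively revoke an operation
        self.ops.remove((t, kind, value))

    def successor(self, t, x):
        """Smallest element greater than x present at time t."""
        live = set()
        for time, kind, value in self.ops:
            if time > t:
                break
            live.add(value) if kind == 'ins' else live.discard(value)
        candidates = [v for v in live if v > x]
        return min(candidates) if candidates else None

rs = RetroactiveSet()
rs.add_op(1, 'ins', 10)
rs.add_op(3, 'ins', 20)
print(rs.successor(2, 5))  # 10: the insertion of 20 hasn't happened by time 2
rs.add_op(0, 'ins', 7)     # retroactively change the past...
print(rs.successor(2, 5))  # 7: ...and earlier queries see the new history
```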

### Comments:

**dkorduban**:

**2010-06-11T21:46:41Z**

some 2D polygon probing implementations + a big chunk of test data http://www.topcoder.com/longcontest/stats/?module=ViewOverview&rd=10929

**11011110**:

**2010-06-11T21:52:41Z**

Thanks. Is there a more detailed description of the version of the problem that they solved that is accessible without a topcoder login?

**ext_68390**:

**2010-06-12T03:42:58Z**

(Not the original poster.) The problem they solved is something like this: There is a 2D polygon inside [0,10)x[0,10). You can make M (≤300) queries where each query, given a point (x,y), tells you whether (x,y) is inside the polygon or not. You have to estimate the area of the polygon. The polygon is convex and is generated according to some random process they describe. (For more details, you can get a (free) TopCoder login, and click on PolygonArea, the problem name. It's not very close to your problem, though... yours is a very nice problem, BTW!)

**11011110**:

**2010-06-12T03:51:10Z**

Thanks for the explanation and the compliment. It does sound fairly closely related, though as you say not the same. With a theoretician hat on, I would say that you can simulate a line probe (of a convex body for which you know a point inside and a point outside) with a sequence of point probes. But with at most 300 probes available, you probably don't want to pay the logarithmic factor in the number of point probes per simulated line probe.
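The simulation mentioned above can be sketched like this (a hypothetical helper, assuming a membership oracle for the body): binary search along the segment from the known interior point to the known exterior point, spending $O(\log(1/\varepsilon))$ point probes to locate the boundary crossing to within $\varepsilon$.

```python
def boundary_on_segment(inside_pt, outside_pt, member, eps=1e-6):
    """Approximate the boundary point of a convex body on the segment from
    a known interior point to a known exterior point, using only membership
    (point-probe) queries. Binary search costs O(log(1/eps)) probes per
    simulated line probe. `member` is any membership oracle."""
    lo, hi = 0.0, 1.0  # parameter along the segment: 0 = inside, 1 = outside
    def at(t):
        return (inside_pt[0] + t * (outside_pt[0] - inside_pt[0]),
                inside_pt[1] + t * (outside_pt[1] - inside_pt[1]))
    probes = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        probes += 1
        if member(at(mid)):
            lo = mid   # midpoint still inside: boundary lies further out
        else:
            hi = mid   # midpoint outside: boundary lies closer in
    return at(lo), probes

# Example: the unit disk as the convex body.
in_disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
point, n = boundary_on_segment((0.0, 0.0), (2.0, 0.0), in_disk)
# point is close to (1.0, 0.0), found in about 20 probes
```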

**kushfvua**: