Suppose you have a polynomial-time algorithm that operates on sets of weighted elements, and involves comparisons of the weights of different sets. (This describes many different algorithms for shortest paths, minimum spanning trees, minimum weight matchings, closures, etc.) But suppose also that your algorithm is only guaranteed to work correctly when different sets always have distinct total weights. When comparisons could come out equal, your algorithm could crash or produce incorrect results. But equal weights are likely to happen when the element weights are small integers, for instance. Is there some semi-automatic way of patching your algorithm to work in this case, without knowing any details about how it works?
An obvious thing to try is to add small distinct powers of two to the element weights. If these numbers are small enough they won't affect initially-unequal comparisons. And if they're distinct powers of two then their sums are also distinct, so each two sets get a different perturbation. But this method involves computing with numbers that have an additional n bits of precision (where n is the number of elements in the problem), and a realistic analysis of this method would give it a near-linear slowdown compared to the unperturbed algorithm. Can we do better?
Exactly this issue comes up in my latest preprint, "Rooted Cycle Bases" (with McCarthy and Parrish, arXiv:1504.04931, to appear at WADS). The paper is motivated by some problems concerning kinematic chains, and studies problems of finding a cycle basis of a given graph in which all basis cycles are constrained to contain a specific edge. When all cycles have distinct weights a simple greedy algorithm can be used to find a minimum-weight basis, but if there are ties then this algorithm can easily go astray. Its analysis is complicated enough that, rather than trying to add special case tie-breaking rules to the algorithm and proving that they still work correctly, I'd like a general-purpose method for converting algorithms that work for distinct path and cycle weights into algorithms that don't require distinctness.
If randomization is allowed, it's not difficult to perturb the weights efficiently, so that additions and comparisons of weights still take constant time. Just let ε be a sufficiently small number (or by symbolic computation treat it as an infinitesimal) and perturb each element weight by a randomly chosen integer multiple of ε where the random integers of this scheme have polynomial magnitude. These integers are small enough that (on a machine capable of addressing its own memory) they fit into a machine word, so adding them and comparing their sums takes constant time per operation. And by choosing the polynomial to be large enough, we can ensure that with high probability each two sets that we compare will have different perturbations. (We don't care about the many other pairs of sets that we don't compare.)
The deterministic case is trickier. To solve it (in an appendix of the preprint) I define a data structure that can build up a persistent collection of sets, by adding one element at a time to a previously-constructed set, and then can answer queries that seek the smallest index of an element that belongs to one set and not another. Essentially, it involves a binary tree structure imposed on the elements, and a recursive representation of each set that follows the tree structure but shares substructures with other sets, so that differing elements can be found by tracing down through the tree looking for non-shared substructures. The figure above (from the paper) illustrates in a schematic way what it looks like; see the appendix for details. This allows the power-of-two technique to work, by replacing numerical comparisons on high-precision numbers by these set queries. It would also be possible to add element-removal operations, although I didn't need these for the cycle basis problem. But it's a bit cumbersome and slow: comparing two sets with this method takes logarithmic time, and adding an element to a set is slightly slower than that. And the details involve deterministic integer dictionary data structures that are theoretically efficient but for practical problem sizes probably worse than binary search trees. So I think there's definitely scope for coming up with a cleaner and faster solution.