Suppose you have an array of \( n \) real numbers (or other totally ordered values that you can compare but not treat as small integers). You want to answer queries that specify a contiguous subarray and return the sorted sequence of values within that subarray. Without any preprocessing you could query a subarray of length \( k \) in time \( O(k\log k) \), by pulling out the subarray and then comparison sorting it. With preprocessing, how much better can we do?

The obvious answer would be to sort the input, and replace each real number by an integer, its position in the sorted sequence. Doing this would allow us to use integer sorting algorithms to handle each query. This works particularly well for long queries (longer than \( n^{\epsilon} \) for some \( \epsilon\gt 0 \)), because then you can use radix sort to answer the query in linear time. But it is still somewhat problematic for shorter queries, where we don't have time to set up and then take down the big arrays used by radix sort, and where other integer sorting algorithms are both less practical than radix sort and slower than linear time.

So here's a trick that works better. For each \( i \) from \( 1 \) to \( \log^* n \), divide the array into sub-arrays of size \( 2^{2^i} \). Sort these subarrays (by merging the smaller subarrays from the next smaller value of \( i \)) and, for each element, write down its position in the sorted subarray; this is a \( 2^i \)-bit binary number. Then, for each element of the whole big array, write down one number, the concatenation of these numbers for all the subarrays. (It works best if the bigger subarrays get more-significant bits of the concatenation.) That way, each of the original inputs gets an index number that is still only \( O(\log n) \) bits long. By doing the recursive array partition top-down instead of bottom-up you can make the number of bits at most \( 2\log n \), and by using a larger base than 2 you can make it at most \( (1+\epsilon)\log n \) for any \( \epsilon\gt 0 \).

Then, to handle any query subarray of length \( k \), decompose the query into at most two of the subarrays of the recursive decomposition above that are both of length \( O(k^2) \). Mask off the higher order bits of the index numbers within those subarrays so that the remaining number of bits is \( O(\log k) \), small enough that radix sort can be used to sort these numbers in linear time. Merge the two sorted subarrays to give the answer to the query.

The time to preprocess the input is \( O(n\log n) \), unsurprising since we have to comparison sort somewhere. But the space for the resulting data structure is only \( O(n) \), and it allows sorted-subarray queries in time linear in the length of the subarray.



Clever. From the practical perspective, however, you can use radix sort on floats. It will probably slower, though, compared to the radix sort on ordinal numbers. And in some sense we would treat floats as small numbers, b/c the cost of the radix sort depends on the size of the float (so there is a logarithmic constant anyways). But it might be faster in practice than quick sort.


In the actual application I had in mind for this (though it won't end up going in because it complicates things without actually making a difference to the final result) the numbers to be sorted are not floats, but rationals.

In some architectures floats have the same sort order as integers of the same size, so you can just use integer sorting algorithms on them directly. But even when they don't, as you say it's not hard to adapt the algorithms.


ohh, rationals is a different thing.