Earlier this week in my graduate data structures course we covered splay trees. I found a slightly different way of presenting the static optimality property (splay trees are, to within a constant factor, as good as any fixed tree for any sequence of data accesses) that I think makes it easier to understand than the usual entropy-based approach.

To begin with, I followed the analysis of Sleator and Tarjan's original STOC '83 paper (repeated in Tarjan's Data Structures and Network Algorithms). According to this analysis, if each data item is given a weight \( w_i \), with \( W \) denoting the sum of the weights, then the amortized cost to access item \( i \) is \( O(\log(W/w_i)) \). One of the beauties of splay trees is that they automatically adjust themselves to achieve this performance even when the weights are not known to the algorithm.
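To make the operation being analyzed concrete, here is a minimal splay sketch in Python (my own illustration, not code from the paper; keys are assumed comparable, and this is one common recursive formulation of the bottom-up algorithm). The `splay` function moves the node holding a given key, or the last node on its search path, to the root by zig, zig-zig, and zig-zag rotations:

```python
class Node:
    """Binary search tree node."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Move the node holding `key` (or the last node on its search
    path, if `key` is absent) to the root of the tree."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root                      # key absent; stop here
        if key < root.left.key:              # zig-zig: rotate grandparent, then parent
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:            # zig-zag: rotate parent, then grandparent
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    else:                                    # mirror image of the cases above
        if root.right is None:
            return root
        if key > root.right.key:             # zig-zig
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:           # zig-zag
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)
```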

Proving this amortized time bound is not difficult. As in the STOC paper, define the rank of an item to be the floor of the log (base two) of the sum of the weights in its subtree (the 1985 JACM version gets a tighter analysis by omitting the floor). If one uses the sum of the ranks as a potential function, then one can analyze the sequence of splay steps by which item \( i \) is accessed by partitioning the steps into those in which the rank of \( i \) increases and those in which it stays the same. The number of steps in which the rank increases is at most the difference between the smallest and largest ranks available to \( i \), which is \( O(\log(W/w_i)) \), and the potential function increases in these steps by a telescoping sum that ends up proportional to the same amount. In the remaining steps, the potential function decreases, canceling the amortized time for those steps.
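To spell out the counting step (a sketch, writing \( r(x) \) for the rank of \( x \)): the successive values \( r_0 \lt r_1 \lt \cdots \lt r_k \) taken by the rank of \( i \) during the access are integers, the final value is the rank of the whole tree, and the initial value is at least the rank determined by \( w_i \) alone, so the number \( k \) of rank-increasing steps obeys

\[ k \;\le\; r_k - r_0 \;\le\; \lfloor \log_2 W \rfloor - \lfloor \log_2 w_i \rfloor \;\le\; \log_2(W/w_i) + 1. \]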

The point, though, is not so much how one gets to this \( O(\log(W/w_i)) \) bound but how one sets the weights \( w_i \). If they are all equal to one, then \( W=n \) and the amortized cost per operation is \( O(\log n) \). If the access sequence is chosen randomly at each step from some distribution with probability \( w_i \) of accessing item \( i \), then \( W=1 \) and the average amortized cost per operation ends up as \( O(\sum_i w_i \log(1/w_i)) \), proportional to the Shannon entropy of the distribution.
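One can check this empirically (a sketch reusing the `Node` and `splay` functions from the code above; the Zipf-like weight vector and the constants `n` and `trials` are arbitrary choices for illustration). Since the bound only promises proportionality, the measured average cost should be within a constant factor of the entropy:

```python
import math
import random

def insert(root, key):
    """Plain unbalanced BST insert, just to build a starting tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def access_cost(root, key):
    """Number of nodes on the search path to `key`."""
    cost, node = 0, root
    while node is not None:
        cost += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return cost

n = 512
weights = [1.0 / (i + 1) for i in range(n)]            # Zipf-like weights
W = sum(weights)
probs = [w / W for w in weights]
entropy = -sum(p * math.log2(p) for p in probs)        # Shannon entropy

root = None
for k in random.sample(range(n), n):                   # random insertion order
    root = insert(root, k)

trials, total = 20000, 0
for _ in range(trials):
    k = random.choices(range(n), weights=probs)[0]
    total += access_cost(root, k)                      # cost before adjusting
    root = splay(root, k)                              # self-adjust

print(f"average access cost {total / trials:.2f}, entropy {entropy:.2f}")
```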

At this point it's typical to wave one's hands, assert that optimal binary search trees have an access cost proportional to the Shannon entropy, and conclude that splay trees are within a constant factor of optimal. Sleator and Tarjan, for instance, say that this follows by "a standard theorem of information theory" and cite Abramson's information theory book, without even a hint at a page number where such a theorem may be found.

Instead, it turns out to be easier to prove directly that splay trees are as good as any static tree \( T \) (and therefore, as good as the optimal binary search tree, whatever tree that might be). To do so, let \( \ell_i \) denote the number of nodes on the path to item \( i \) in tree \( T \), and let \( w_i = 3^{-\ell_i} \). In any finite binary tree, \( W\lt 1 \) (in a complete infinite binary tree, the sum is exactly one, as can be seen by summing each level separately and then summing the levels as a geometric series); this is related to Kraft's inequality. Therefore, using these weights for the splay tree analysis, and using the fact that \( W\lt 1 \), the amortized time per access is \( O(\log(W/w_i)) = O(\log(1/3^{-\ell_i})) = O(\ell_i \log 3) = O(\ell_i) \), within a constant factor of the access time in tree \( T \), as was to be shown.
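The level-by-level summation can be made explicit: a binary tree has at most \( 2^{d-1} \) nodes at depth \( d \) (counting the root as depth one), each receiving weight \( 3^{-d} \), so

\[ W \;=\; \sum_i 3^{-\ell_i} \;\le\; \sum_{d \ge 1} 2^{d-1}\,3^{-d} \;=\; \frac{1}{2} \sum_{d \ge 1} \left(\frac{2}{3}\right)^{d} \;=\; \frac{1}{2} \cdot \frac{2/3}{1 - 2/3} \;=\; 1, \]

with strict inequality for any finite tree, which omits all but finitely many terms of the infinite sum.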

Sadly, this approach doesn't seem to work to prove dynamic optimality of splay trees. Conceivably, one could define weights based on exponential functions of the access cost in some unknown dynamically optimal search tree, and use these weights to analyze a splay tree on the same access sequence, but it seems that a change to the optimal tree can cause a large increase in the potential of the splay tree, giving it a large amortized cost for that operation.