Mike Goodrich and I have both been teaching linear probing, he in his undergraduate algorithms class and I in my graduate data structures class. Comparing notes, we think we have a way of analyzing it that is considerably simpler than ways I've seen it analyzed before (e.g. Mihai Pătrașcu's nice invited talk at WADS 2011) and that avoids using Chernoff bounds (a very useful tool, and the way I did it in my lecture, but too advanced for this early in Mike's class).

We assume we have \( n \) items placed into a hash table of size \( N = 2n \), for a load factor of \( 1/2 \). The time to perform an operation, say a query or an insertion, can be charged to the run of consecutive nonempty table positions that the operation lands in, and with this charging scheme the total time charged to a run is proportional to the square its length. So, if \( p_k \) denotes the probability that a particular contiguous sequence of \( k \) table positions is a maximal run of nonempty positions, the total time is

\[ \frac{1}{2n}\sum_{i=1}^{2n}\sum_{k=1}^{n} p_k O(k^2). \]

The initial \( 1/2n \) (averaging over the possible places that the operation could begin its probe sequence) and the first sum from \( 1 \) to \( 2n \) (summing over the possible places that a run could start) cancel each other, leaving only the second sum. As long as we can show that \( p_k \) is exponentially small in \( k \), this sum will converge to \( O(1) \), even if we replace the upper bound \( n \) of the second summation by infinity.

The probability \( p_k \) is at most the probability that exactly \( k \) items land within the given interval, which is exactly

\[ \binom{n}{k}\left(\frac{k}{2n}\right)^k \left(1-\frac{k}{2n}\right)^{n-k}. \]

Using Stirling's approximation to replace the binomial coefficient, and ignoring the square root terms from Stirling's approximation (which anyway always lead to a factor less than one), this becomes

\[ \frac{n^n}{(n-k)^{n-k}k^k}\left(\frac{k}{2n}\right)^k\left(1-\frac{k}{2n}\right)^{n-k}=2^{-k}\left(1+\frac{k/2}{n-k}\right)^{n-k}\le\left(\frac{\sqrt e}{2}\right)^k. \]

And we're done! We have a probability that is exponentially shrinking as a function of \( k \), just as we need. The magic simplification in the middle of this last equation is just splitting the \( n^n \) term into two terms with exponents \( n-k \) and \( k \), and then grouping terms with the same exponent together with each other.


snoeyink: nitpick
If you are going to assume that \( k \) is maximal, then do you need to include gaps on each end in bounding "The probability \( p_k \) is at most...?" If you don't on the top end, it seems you'd have to worry about other elements probing in.
11011110: Re: nitpick
I think that can be swept under the "at most" in the sentence "The probability \( p_k \) is at most..."
Nice. That's really one from the book.
ext_924205: You don't need the Stirling approximation

Just use \[ \binom{n}{k}\le \frac{n^n}{k^k\cdot(n-k)^{n-k}} \] To see this, look at \[ \sum_{i=0}^n \binom{n}{i} k^i (n-k)^(n-i) = (k + (n-k))^n = n^n \] So the term with \( i = k \) must be\( {}\le n^n \)

I'm planning to show this analysis of linear probing my sophomore data structures course, and I didn't want to introduce the Stirling formula...

11011110: Re: You don't need the Stirling approximation
Thanks! So you did find a good argument for this bound. I saw your tweet about it yesterday...
md_tui: all constant load factors < 1 work

Neat argument.

It has an easy generalizion to all fixed load factors \( \alpha \lt 1 \), in place of 0.5.


If the table size is \( m \) and the set size is \( n \), and \( n=\alpha\cdot m \), one can do the same calculation and gets \( p_k \lt (\alpha\cdot\exp(1-\alpha))^k \).

Now \( \alpha\cdot\exp(1-\alpha)\lt 1 \) for \( \alpha\lt 1 \) (just plug in \( \alpha=1 \) to get \( 1 \), and differentiate to see the function is increasing).

Conclusion: Linear probing has constant expected time for each load factor \( \alpha \lt 1 \).

11011110: Re: all constant load factors < 1 work
Sure, but I thought it would help keep the argument simpler for pedagogical reasons to use a fixed \( \alpha \). Also, for \( \alpha \) close to one, linear probing degenerates more badly than double probing, but to me that's less of an argument to use a different algorithm and more of an argument for keeping \( \alpha \) well away from one.