Tallying preference ballots efficiently

The Schulze method for determining the results of multiway votes has three parts:

Use the ballots to determine the results (winner and margin of victory) of each possible head-to-head contest.
Perform an all-pairs widest path computation on a directed complete graph weighted by the margins of victory.
Find the candidate with wider outgoing than incoming paths to all other candidates.

The second part can be done in cubic time using the Floyd-Warshall algorithm (the choice in practice) or faster using fast matrix multiplication. And the third part is easy. But what about the first part? Here, some Wikipedia editor wrote in 2011 that the first part, "if implemented in the most straightforward way, takes time proportional to \( C^2 \) times the number of voters" (where \( C \) is the number of candidates). But then last year some other editor tagged this claim as being original research.

This raised some questions for me. Given how straightforward it is, can this really be considered to be original research? Is it possible to find a published source for the time analysis of this step that can be used to untag it? (If you know of one, please tell me or add it to the article.) Is the algorithm with this time bound really "the most straightforward way"? And if this is the time bound you get by doing things straightforwardly, can we get a better time bound by trying to be more clever?

To begin with, I think the most straightforward way of solving this is the following. I'll assume that each ballot is stored as a sorted array of the candidates, most-preferred first. For each pair of candidates, loop over all ballots, and search the ballot array sequentially to find the first position that has one of the two candidates in the pair; tally that ballot as a win for the candidate that was found. When you've looped through all the ballots, compare the tallies for the two candidates to determine the winner, and subtract the tallies to determine the margin of victory. But this takes time \( O(C^3), \) not \( O(C^2). \)

The \( O(C^2) \)-time method that was intended is presumably something like the following one. We will make a matrix \( M[i,j] \) that will eventually store the number of voters who prefer candidate \( i \) to candidate \( j, \) initially all zeros. We then loop through the ballots one at a time. For each ballot \( B, \) each \( i \) in the range from \( 1 \) to \( C, \) and each \( j \) in the range from \( i+1 \) to \( C, \) we add one to the count for \( M[B[i],B[j]]. \) Finally, after computing this matrix, we can compare \( M[i,j] \) to \( M[j,i] \) as before to determine each pairwise winner, or subtract these two numbers to determine the margin of victory.

But when the number of voters is big (larger than \( C! \)) there's a different way to tally the votes that's more efficient. First, sort the ballots, so that all people who voted the same way are collected into the same group. (This can be done by treating each vote as a number in the factorial number system and applying counting sort to these numbers). Then, apply the \( O(C^2) \)-time method to the grouped ballots, looping over all groups rather than all individual ballots and changing the part that adds one to \( M[B[i],B[j]] \) so that instead it adds the size of a group of ballots. The running time is \( O(Cn) \) to number and sort the ballots, plus \( O(C^2 C!) \) to tally them. So we've reduced the dependence on \( n \) down to linear in \( C, \) at the expense of adding another term that is a much larger function of \( C. \) For systems like the Oscars or Hugos that have only five candidates and thousands of voters, this could be a win.

It's not possible to achieve a time of just \( O(Cn), \) without the extra term, because even when \( n \) is tiny the output size is \( C^2. \) But it is possible to trade off between the grouped and ungrouped tallying methods, when n is intermediate in size. To do so, group the candidates (arbitrarily) into blocks of \( B \) candidates (preferably a power of two; we'll pick the right size for \( B \) later). We can partition a voter's preferences into blocks in time \( O(Cn) \) by using bucket sort to partition the candidates into blocks in their preference order, and we can determine the voter's preferences between the candidates in the union of two blocks in time \( O(B) \) by applying a merge algorithm, comparing candidates using a reverse index of the positions of each candidate in the voter's preference list. There are \( O((C/B)^2) \) pairs of blocks, so combining the times for splitting votes into blocks and for applying the factorial method to each pair of blocks gives a total runtime of \( O(Cn+C^2n/B+C^2(2B)!). \) The right choice for \( B \) is the one which makes the second and last terms of this runtime approximately equal (\( B \) proportional to \( \log n/\log\log n \)) and this logarithmic factor is the amount by which the middle term of the time bound is faster than the "straightforward" \( O(C^2n) \)-time method.