Fast Schulze voting using quickselect
I have a new preprint, “Fast Schulze voting using quickselect” (arXiv:2411.18790), with two UCI students, Randy Huynh and Arushi Arora. Randy was an undergraduate when he worked on this with me, and Arushi is about to finish her master’s program. (There’s apparently another Arushi Arora studying computer science at Purdue; that’s someone else.)
The Schulze voting system involves collecting preference orderings from voters, for multiple candidates in an election, and then combining them to determine who wins, similarly to many other systems such as the more familiar instant runoff method. The exact format of the preference ordering doesn’t matter as long as you can tell which of two candidates each voter prefers, or whether they have no preference between some pairs of candidates. In the Schulze method, we use the collected data to simulate the outcome of a head-to-head competition between each pair of candidates. We then construct a directed graph, with an edge from the winner to the loser of each head-to-head competition, weighted by the margin of victory for that competition. A directed path in this graph is called a beatpath, and its strength is measured by its weakest link: the edge with the smallest weight. Finding the strongest beatpath is an example of the widest path problem. One candidate X beats another candidate Y, in the Schulze method, when the strongest beatpath from X to Y is stronger than the strongest beatpath from Y to X. There is always an unbeaten candidate, and this candidate is chosen as the overall winner. There can sometimes be multiple unbeaten candidates, but only if some margins of victory are equal, which is unlikely when there are many voters.
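To make the setup concrete, here is a minimal sketch in Python of how one might tabulate the head-to-head margins from ballots. The input conventions are my own illustrative assumptions, not anything from the paper: each ballot is a mapping from candidates to numeric ranks, with smaller meaning more preferred, and a candidate missing from a ballot is unranked.

```python
# Sketch: tabulate head-to-head margins of victory from ranked ballots.
# Assumed conventions (for illustration only): each ballot is a dict
# mapping candidate -> rank, smaller rank = more preferred; pairs where
# either candidate is unranked, or ranks are equal, count as no preference.

from itertools import combinations

def margin_matrix(ballots, candidates):
    """margin[x][y] = (# voters preferring x to y) - (# preferring y to x)."""
    margin = {x: {y: 0 for y in candidates} for x in candidates}
    for ballot in ballots:
        for x, y in combinations(candidates, 2):
            rx, ry = ballot.get(x), ballot.get(y)
            if rx is None or ry is None or rx == ry:
                continue  # this voter has no preference between x and y
            winner, loser = (x, y) if rx < ry else (y, x)
            margin[winner][loser] += 1
            margin[loser][winner] -= 1
    return margin
```

The beatpath graph then has an edge from x to y, of weight `margin[x][y]`, exactly when that margin is positive.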
This method has some nice theoretical properties. In particular it is a Condorcet method: if one candidate wins all head-to-head competitions, that candidate wins the overall election. I like it for the fact that you get a winner from a simple mathematical property (the existence of a unique winner for any non-tied election) rather than from ad hoc rules, and I always enjoy teaching about it as an application of path-finding algorithms in my graph algorithms class (for which Arushi was a teaching assistant last year). But I have to admit that it is hard to explain to a non-technical audience how it works and why it chose a certain outcome. It is used for some public elections, but more commonly by computer-related technical societies.
So anyway, if you want to use the Schulze system, you need to know the strongest beatpaths between all pairs of candidates. Or do you? The reference implementation of the Schulze system uses the Floyd–Warshall all-pairs shortest paths algorithm, modified for widest paths. This takes \(O(n^3)\) time, which you can speed up to \(O(n^{(3+\omega)/2})\) using a different all-pairs widest paths algorithm, where \(\omega\) is the exponent for fast matrix multiplication. But in a paper at EC 2021, Krzysztof Sornat, Virginia Vassilevska Williams, and Yinzhan Xu showed that all-pairs widest paths are not necessary. Instead, they showed how to go from an \(n\times n\) matrix (or graph) of head-to-head margins of victory to the Schulze winner in randomized expected time \(O(n^2\log^4 n)\), by deleting edges from the beatpath graph in ascending order by weight and applying a (randomized) dynamic graph algorithm for decremental strong connectivity.
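For comparison, here is what Floyd–Warshall looks like when modified for widest paths. This is a sketch of the standard \(O(n^3)\) method described above, not code from the reference implementation, using the margin-matrix convention of the earlier sketch.

```python
# Sketch: all-pairs strongest beatpaths by Floyd-Warshall, O(n^3).
# A path's strength is its minimum edge weight; the beatpath graph has
# an edge from x to y, weighted margin[x][y], only when that margin is
# positive (x won the head-to-head competition).

def strongest_beatpaths(margin, candidates):
    strength = {x: {y: margin[x][y] if margin[x][y] > 0 else float('-inf')
                    for y in candidates} for x in candidates}
    for k in candidates:
        for x in candidates:
            for y in candidates:
                if x != y:
                    # widest-path relaxation: the strength of a path
                    # through k is the weaker of its two halves
                    strength[x][y] = max(strength[x][y],
                                         min(strength[x][k], strength[k][y]))
    return strength

def unbeaten(margin, candidates):
    """Candidates whose strongest beatpath to each rival is at least as
    strong as the strongest beatpath coming back."""
    strength = strongest_beatpaths(margin, candidates)
    return [x for x in candidates
            if all(strength[x][y] >= strength[y][x]
                   for y in candidates if y != x)]
```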
The main result of the new paper is to greatly simplify this calculation, and speed it up to \(O(n^2\log n)\). Here’s the algorithm: maintain a set of un-eliminated candidates, initially all of them. Repeatedly choose a random “pivot” candidate, and compute single-source widest paths to and from the pivot (in the full beatpath graph) to compare the beatpath strengths between the pivot and everyone else. If nobody beats the pivot, you have found the election winner; otherwise, eliminate everyone who does not beat the pivot. You can use Dijkstra’s algorithm to compute the single-source paths in each round; in dense graphs, with no priority queue, Dijkstra takes time \(O(n^2)\). It takes logarithmically many rounds to eliminate all but one candidate (in expectation or with high probability), with two calls to Dijkstra in each round, so the total time is \(O(n^2\log n)\).
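Here is a sketch of that algorithm in Python, again using the margin-matrix convention from above. It assumes no tied margins, so that the winner is unique and beats every other candidate, guaranteeing that it survives every round. The widest-path routine is the dense-graph form of Dijkstra, scanning all candidates instead of using a priority queue.

```python
# Sketch: the quickselect-style Schulze winner algorithm described above.
# Assumes all margins of victory are distinct, so the winner is unique.

import random

def widest_from(source, margin, candidates):
    """Strongest beatpath strength from `source` to every candidate, by
    Dijkstra with no priority queue: O(n^2) on a dense graph."""
    strength = {y: float('-inf') for y in candidates}
    strength[source] = float('inf')
    done = set()
    for _ in candidates:
        # settle the unfinished candidate with the strongest known path
        x = max((y for y in candidates if y not in done),
                key=lambda y: strength[y])
        done.add(x)
        for y in candidates:
            if y not in done and margin[x][y] > 0:  # edge x -> y exists
                strength[y] = max(strength[y], min(strength[x], margin[x][y]))
    return strength

def widest_to(target, margin, candidates):
    """Strongest beatpath strength from every candidate to `target`:
    the same search on the reversed beatpath graph."""
    reversed_margin = {x: {y: margin[y][x] for y in candidates}
                       for x in candidates}
    return widest_from(target, reversed_margin, candidates)

def schulze_winner(margin, candidates):
    remaining = list(candidates)
    while True:
        pivot = random.choice(remaining)
        from_pivot = widest_from(pivot, margin, candidates)
        to_pivot = widest_to(pivot, margin, candidates)
        # y beats the pivot when its beatpath to the pivot is stronger
        # than the pivot's beatpath back to y
        beaters = [y for y in remaining
                   if y != pivot and to_pivot[y] > from_pivot[y]]
        if not beaters:
            return pivot  # nobody beats the pivot: it wins the election
        remaining = beaters  # everyone who fails to beat the pivot is out
```

Note that the two widest-path searches always run over the full candidate set, as in the description above; only the comparisons are restricted to the surviving candidates.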
This algorithm is, essentially, the quickselect algorithm, applied to the special case of finding the maximum of a set of items. It turns out to work just as well for finding the maximum in a partial order with a unique maximum item, which describes what you get when you order candidates by beatpath strength in the Schulze method. Going into this research I had the incorrect idea that beatpath strength would give a linear ordering of the candidates, and that this supposed linear ordering might lead to even faster or more general algorithms, but (as we also show in the new paper) that is far from the case. When there are no equal margins of victory, the only thing you can say about the ordering is that there is one candidate who beats all others, and one candidate who is beaten by all others. Among the remaining candidates, all partial orders are possible.
If there are some equal margins of victory, there may be multiple unbeaten candidates according to the Schulze method. Schulze suggests a tie-breaking rule that I don’t like because it’s an ad hoc rule that doesn’t relate to the rest of the method. Instead, you could randomly perturb the margins of victory in order to make the winner unique before applying the Schulze method (keeping the perturbations less than one voter to avoid throwing an untied election to the wrong candidate). Or, a modification of the same quickselect algorithm can list all unbeaten candidates, in \(O(n^2\log n)\) time per candidate.
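Here is one way the perturbation idea might look in code; this is my own illustrative sketch, not a method from the paper. Each pair’s margin gets an independent random offset of magnitude below half a voter, small enough that distinct integer margins keep their relative order while exact ties are broken with probability one.

```python
# Sketch: break ties by random perturbation, keeping each perturbation
# smaller than one voter so an untied election can't change its winner.

import random
from itertools import combinations

def perturb(margin, candidates):
    perturbed = {x: dict(margin[x]) for x in candidates}
    for x, y in combinations(candidates, 2):
        e = random.uniform(-0.5, 0.5)  # |e| < 1/2, so two integer margins
        perturbed[x][y] += e           # differing by >= 1 stay in order
        perturbed[y][x] -= e           # keep the matrix antisymmetric
    return perturbed
```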
Is this new algorithm helpful in real elections? Not really. As Sornat, Vassilevska Williams, and Xu also show (assuming the strong exponential time hypothesis), you will always spend more time collating voter preferences into margins of victory than you will finding the winner from the matrix or graph of margins of victory. This lower bound applies even if you try to shortcut this process by somehow going directly from votes to winners. And anyway, in most real elections the number of candidates is so small that even Floyd–Warshall will run instantly, making faster algorithms unnecessary. Still, this new algorithm could be useful in running computational experiments on synthetic data, where you could plausibly generate many large matrices without collating them from votes.