The Santa Cruz Sorting Hat

This is the week when high school students across the US finalize their decisions on which colleges and universities to go to in the following year. My son ended up choosing the University of California, Santa Cruz, where he wants to study computer science. Unlike faculty at many private schools, we get no special discount for sending our kids to the same university, but this is at least significantly less expensive than the other options he was choosing among, mainly similar-caliber state universities in other states. It’s also his mother’s alma mater.

Anyway, Santa Cruz is in certain respects similar to Hogwarts. The students at Hogwarts are divided up into four houses, with different house characteristics, different accommodations, and different course schedules, and it’s the same at Santa Cruz, only there are ten of them and they’re called colleges. They’re located on different parts of campus (some nestled in the redwoods, some with ocean views); the older ones have names and themes, while the two newest ones are merely numbered (“college nine” and “college ten”; the former “college eight” became Rachel Carson College last year). The students live there, at least for their first year, they take a college-specific set of freshman general-education courses, and they even graduate together. One of the things my son had to do before he sent his acceptance in was to rank the colleges: which ones did he want to be in, and which not? (I don’t actually know his ranking, but I imagine I’ll find out which one he ends up placed into.)

So it occurred to me to wonder: Since (I assume) Santa Cruz has no magic Sorting Hat that can simultaneously assign each student to a house that would best fit them and balance the sizes of all the houses, how do they do this placement? I don’t know but I can speculate.

Let’s formalize and simplify the problem by assuming that all incoming students provide a ranking of all colleges, that all colleges have an equal number of slots for new students, and that the colleges themselves do not have any preferences for some students over others except in that they would prefer to get students who want to be in those colleges. (In practice, the rankings are only partial, and I assume that the college sizes are not all exactly equal.) For most inputs of this sort, not all students can get their first choice (it’s not magic), so we need an algorithm to translate this input into a good assignment. But first, what do we mean by “good”? There are several natural criteria we can consider:

In general, we should try to respect the students’ preferences to the extent possible. A minimal requirement of this type is that the ranking should be stable, in that no group of students could all get better assignments by trading amongst each other. So if some students don’t get what they want, it’s not gratuitous: if they did get a better assignment, it would have to make some other student’s assignment worse.
The students should be incentivized to be honest, rather than gaming the system: reporting their rankings accurately should always get them a placement (or distribution of placements) at least as good as any other ranking. In particular students who make an unusual choice of ranking should not get an unusually high chance of a good assignment (incentivizing them to falsely report such a ranking) nor an unusually low chance (incentivizing the ones who would choose such a ranking to report something else).
The ranking algorithm’s use of its input should be meaningful. The input rankings are not utilities, and it would be a mistake to turn them into utilities by assigning numerical values to each rank. Harry Potter and Ron Weasley may have had the same rankings and ended up in the same house, but Harry’s “not Slytherin” reflects a very different utility function than Ron’s preference for Gryffindor.
Each student should have a fair chance of getting one of their highly-ranked choices. For instance, even if we ignored the preferences and assigned the students completely at random, each student would have a 1/#colleges chance of getting their first choice. Surely we shouldn’t do any worse than that.

The lack of a ranking from the colleges rules out stable marriage algorithms. The meaningfulness constraint implies that formulating and solving this as an instance of the assignment problem (the problem of maximizing some weighted combination of rankings) would also be problematic. An assignment problem formulation would be problematic in another way, too. Suppose that, out of 100 students, 99 of them rank Ravenclaw first, Gryffindor second, Hufflepuff third, and Slytherin fourth, while the 100th student swaps the ordering of Hufflepuff and Slytherin. Then a weighted matching algorithm will always map that 100th student to Slytherin, violating the honesty and fair chance constraints.

But there is a system that satisfies all of these constraints. It’s the lottery system that I remember from my own college campus-housing-assignment days: we choose a random permutation of the students, and then assign each student (in this random order) the highest-ranked available position. It’s stable, because in any group of students the earliest one in the permutation has an assignment that can’t be improved. It incentivizes honesty, because your ranking doesn’t affect the permutation of the students and then once it’s your turn to be assigned you always want to be assigned according to your actual preferences. It’s meaningful, because it only uses the rankings to compare alternative assignments, and doesn’t try to do arithmetic on them. And it’s fair: if there are \(H\) colleges to be assigned to, then for every integer \(i\) in the range from \(1\) to \(H,\) you will have probability at least \(i/H\) of getting one of your top \(i\) choices, because you have that probability of being one of the first \(in/H\) students in the permutation (out of \(n\) students altogether) and if you are, it’s not possible for your top \(i\) choices to all fill up before you get your assignment.

I suspect this is not the system Santa Cruz actually uses, though, because they only ask for your top five choices rather than a full ranking and it’s reasonably likely that the unlucky last student in the random permutation would be stuck with a lower-down choice than that. Maybe there are other systems with similar good properties that can also avoid assigning anyone to a really low-rank choice, when such an assignment is possible? (It’s not always possible: consider what happens when everyone chooses the same ranking.) Or maybe they usually get enough students who just don’t care and don’t even return a ranking that they can put those at the bottom of the permutation and give everyone else a better choice? I’d be interested in hearing from anyone who has some inside information on this process. But regardless of whether Santa Cruz actually uses it, the lottery looks like a pretty attractive option for this problem.

(Google+ discussion thread for this post)