What acceptance rates should our conferences be aiming for, and what are the effects of those choices on the balance of papers that get into the conferences?

Mike Goodrich had an interesting recent post on Google+ arguing that, for reasons of scalability, academic conferences should aim for a stable year-to-year acceptance rate rather than (as seems more typical now) a stable year-to-year absolute number of accepted papers. In the comments on Mike's post, I raised a related issue: that (in my opinion) reduced acceptance rates lead to a greater emphasis on fashion and conventionality and a reduced fraction of speculative papers at the conference. I gave some reasons in that comment for why I think this happens, but I thought I'd try to quantify it a little more precisely here.

In order to model conference acceptance rates and their effects on the balance of topics in the conference program, I made some basic assumptions: that papers have some meaningful level of strength (else, why would we go to all this trouble to accept some and reject others?), that being on a fashionable or conventional topic is better for a paper's acceptance chances than being on a speculative topic (see my comment on Mike's post for why), and that there is some level of randomness, arbitrariness, or other factors in acceptance not quantified by these two numbers. I also assumed that a paper's strength is significantly more important than the other two factors in whether it gets in; I think most would agree that this should be true, and I also think it actually is true. I ignored the tendency of stronger conferences to attract stronger submissions.

To turn this into a mathematical model, I assigned points to papers (much as EasyChair actually does): 0 to 4 points for strength (very weak, weak, mediocre, strong, very strong), 0 to 2 points for topic (speculative, conventional, or fashionable), and 0 to 2 points for the other random factors that might also affect a paper's chances. I also assumed (in the absence of a reason for any other choice) that submissions are equally distributed among the 45 different triples of scores that this system defines. The fact that strength is measured on a larger scale means that it has a larger influence on the total score (the sum of these three numbers). I then used a simple thresholding scheme, in which papers with a score higher than a given threshold are accepted, papers with a score lower than the threshold are rejected, and some fraction of the papers at the threshold are kept in order to meet the target acceptance ratio. You can easily change some of the parameters of the model, but I don't think it will make a lot of difference to the conclusions that can be drawn from it. However, I should emphasize that this model is purely theoretical, constructed from some experience but no data.
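The actual source code is linked at the end of the post; as a rough reconstruction of the model just described (exact arithmetic via Python's fractions module; the function and variable names here are my own, not necessarily those of the linked code):

```python
from fractions import Fraction

# Score components as described above; strength dominates the total.
STRENGTH = range(5)  # 0 = very weak ... 4 = very strong
TOPIC = range(3)     # 0 = speculative, 1 = conventional, 2 = fashionable
OTHER = range(3)     # 0-2 points of random/arbitrary other factors

# All 45 equally likely (strength, topic, other) triples.
papers = [(s, t, o) for s in STRENGTH for t in TOPIC for o in OTHER]

def acceptance_probability(paper, rate):
    """Chance that this triple is accepted when the top `rate` fraction
    of submissions get in: every paper scoring above the threshold is
    accepted, every paper below it is rejected, and papers exactly at
    the threshold are accepted in whatever proportion meets the target."""
    target = rate * len(papers)          # expected number of acceptances
    score = sum(paper)
    above = sum(1 for p in papers if sum(p) > score)
    at = sum(1 for p in papers if sum(p) == score)
    if above >= target:
        return Fraction(0)
    return min(Fraction(1), (target - above) / at)
```

For instance, averaging over the three random-factor values, a very weak fashionable paper (strength 0, topic 2) comes out with acceptance probability 41/96, a little under 42%, at a 65% target rate.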

In this model, for an acceptance rate of 65% (such as is typical for IPEC, a workshop whose PC I'm on) all of the very strong submissions are accepted, and almost all of the strong submissions are — only 1/4 of the strong speculative papers are rejected. But the price for this is a high acceptance rate for very weak fashionable papers, nearly 42%. Even at this level, the penalty associated with being a speculative paper means that, although 33% of the submissions are speculative, only 23% of the acceptances are speculative.

For an acceptance rate of 50% (typical for Graph Drawing and I think other second-tier or specialized conferences) a small number of very strong but speculative papers are already starting to be rejected, and only 50% of the strong speculative papers are accepted. Additionally, the number of very weak papers that are accepted is acceptably low (17% of the fashionable ones and none of the others), but 50% of the weak fashionable papers still get in. 20% of the acceptances are speculative.

For an acceptance rate of 35% (as for SODA on a good year for authors), almost all of the strong or very strong fashionable papers are accepted, but only 57% of the very strong speculative papers and only 24% of the strong speculative papers get in. Still, this level keeps out most of the weak papers; only 24% of the fashionable weak ones get in, and all of the very weak submissions and weak submissions on unfashionable topics are rejected. 15% of the acceptances are speculative.

For an acceptance rate of 25% (for SODA on a year with a high submission rate and a constant absolute number of acceptances, such as this year is shaping up to be) all of the very strong fashionable papers are still in, as are most of the strong ones. But most of the very strong speculative papers are rejected — only 38% are accepted — and almost all of the strong speculative papers are out. Almost all of the weak submissions are rejected, but still 38% of the mediocre fashionable papers can get in. Only 11% of the acceptances are speculative.

Cutting even further back to 10% (as I have heard is the acceptance ratio for some systems conferences), 70% of the very strong fashionable papers still get in, but almost none of even the very strong speculative papers do. The fraction of the accepted papers that are speculative is around 2%.
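The declining speculative share across these acceptance rates can be tabulated directly. Here is a self-contained sketch (again my own reconstruction of the model described above, not the linked source code):

```python
from fractions import Fraction

# 45 equally likely (strength 0-4, topic 0-2, other 0-2) triples.
papers = [(s, t, o) for s in range(5) for t in range(3) for o in range(3)]

def accept_prob(paper, rate):
    """Threshold acceptance: the top `rate` fraction of scores get in,
    with proportional tie-breaking at the threshold score."""
    target = rate * len(papers)
    score = sum(paper)
    above = sum(1 for p in papers if sum(p) > score)
    at = sum(1 for p in papers if sum(p) == score)
    if above >= target:
        return Fraction(0)
    return min(Fraction(1), (target - above) / at)

def speculative_share(rate):
    """Fraction of the accepted papers whose topic score is 0 (speculative)."""
    accepted_spec = sum(accept_prob(p, rate) for p in papers if p[1] == 0)
    return accepted_spec / (rate * len(papers))

for pct in (65, 50, 35, 25, 10):
    # prints roughly 0.234, 0.200, 0.155, 0.117, 0.019
    print(pct, float(speculative_share(Fraction(pct, 100))))
```

These match the figures quoted above: about 23%, 20%, 15%, 11-12%, and 2% of acceptances are speculative, against a 33% speculative share of submissions.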

What does this all mean? Well, maybe not much. But if you believe that something resembling this model is accurate (a very big if), and you want a conference where most or all of the very strong submissions are accepted, then a 50% acceptance rate is a good target to aim for. If you want to prevent weak papers from being accepted by cutting the acceptance rate to smaller numbers such as 35% or 25%, then the price you pay for your tough stance is that you will also reject some very strong papers, and that you will tilt your conference significantly towards conventionality.

More importantly, I think, this all stands in sharp contradiction to a view I've seen expressed elsewhere: that it doesn't really matter where your acceptance rate falls in the range from 20% to 40%, because you will still accept all the good papers and reject all the really bad papers, so that all you're doing when you change the numbers is changing how many of the indistinguishable mediocre papers in the middle you accept. In this model, on the contrary, the difference between 25% and 35% makes a big difference in the acceptance rates of the strong and very strong speculative papers, and a similar difference in the acceptance rates of the weak and mediocre but fashionable papers. Whether we should aim for 35% or 25% depends on whether we are trying to include all the good research and encourage speculative research (35%), or trying to absolutely bar any weak research from sneaking in (25%); we can't have both.

The source code for these simulations is online here.



One practical aspect of Mike's proposal of aiming for a stable acceptance rate is that the absolute acceptance numbers are necessarily set (to at least a fair approximation) well before the abstract submission deadline. In particular, they're set by the hotel contract, which includes a fixed number of conference rooms for a fixed amount of time, and that has to be in place by the time the conference details are announced -- which customarily precedes the call for papers. Also, for large events, you need on the order of a year's notice for the contract anyway.

Beyond that, the hotel contract will be based on a guaranteed number of room nights, which is based on the number of expected attendees, which is pretty closely related to the number of presenters (at least at conferences I've attended).

(You could, I suppose, consider the length of a presentation to be a free variable, and adjust that to cover late variations in the number of accepted presentations -- but, unless you're happy with potentially going with APS-style 10-minute lightning presentations or 1-hour long lectures, you don't get a lot of flexibility from that. And there's still the room-nights issue.)

So, really, in practice about the best you can do is try to adjust next year's absolute paper count based on the number of papers submitted last year, and maybe do some small adjustments this year.


Well, that's all true. But I think PC chairs typically have around 5% leeway in what they can accept. So I guess what "aim for a stable acceptance rate" really means is both what you say, adjusting for growth in following years, and also using that leeway to bring the acceptance rate as close as possible to the target within what can practically be done.


I'd agree with that. (And I've also seen the occasional conference that shrank by a day between the call for papers and the acceptance notices, due to low paper submissions; not something you'd want to do often, but not impossible.)


Back to the mathematical model in this post....

Stripping out all of the mathematics, it seems to me that the critical point of the model is that you're ranking the acceptance priority of a paper in such a way that (without random factors) a fashionable mediocre paper is as high priority as a speculative very strong paper, or a fashionable weak paper as high as a speculative strong paper -- and then there are random factors enough to make up the entire rest of the scale in extreme cases. (For example, the top third of the fashionable very weak papers are as high as the bottom third of the speculative very strong ones.)
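Stripping this down to bare arithmetic (the labels below are my own, matching the point scales in the post), the coincidences described here are easy to spell out:

```python
# Point values from the post's model: strength 0-4, topic 0-2,
# plus 0-2 further random points.
very_weak, weak, mediocre, strong, very_strong = 0, 1, 2, 3, 4
speculative, conventional, fashionable = 0, 1, 2

# Without random factors, these pairs tie on total score:
assert very_strong + speculative == mediocre + fashionable  # both 4
assert strong + speculative == weak + fashionable           # both 3

# With the 0-2 random points, a maximally lucky very weak fashionable
# paper ties a maximally unlucky very strong speculative one:
assert very_weak + fashionable + 2 == very_strong + speculative + 0  # both 4
```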

The key result is that, when you use a scale with that much variation beyond the strength level, the "indistinguishable mediocre middle" of the priority scale is in fact not indistinguishable, but is a fairly heterogeneous mix of low-fashion high-strength and low-strength high-fashion papers.

And the conclusion is that if you're using a priority system fitting the model, you get to choose how much of that heterogeneous mix to include, but in any case there's a lot of it and it's pretty heterogeneous, including a wide range of strength at any priority level.

That seems pretty obviously true, but obvious to the extent that it seems quite directly tied to your starting assumptions, and to the level of variation you added between the "how we determine priority" and "how we determine if we got the good stuff" scales; so I think its validity is pretty tightly tied to the validity of those assumptions.

One further factor that increases the variation, IMO, is that there's an additional random factor between the objective strength of the paper (insofar as we claim that exists as an input to the priority calculation) and the strength that a conference attendee believes the paper has, and the thing you really want to optimize for is the distribution of views of strength of accepted papers amongst conference attendees. And there's also variation between the paper you accept and the presentation given about it. If you want everyone to agree that all the strong papers got accepted, that's a rather different level than getting everyone to agree that no non-strong papers got accepted, even if you are accepting only based on strength.


I think there's a difference between making something an assumption of the model and having it be a mathematical consequence of the assumptions. For instance, the equivalence in the model between "very strong speculative" and "mediocre fashionable" was not in my initial intentions, and would not be true if I made minor changes to the model such as using six gradations of strength instead of five. So the assumptions were that factors other than strength are important and can be modeled by a simplistic point system; the consequence of those assumptions is that the system is unable to distinguish between two types of paper that one would probably like it to be able to distinguish.

As for the fact that there is not really a single linear scale of strength that everyone agrees on: of course, but that only worsens the inability to simultaneously accept all good papers and reject all bad ones.


Right, there's sort of a fuzzy boundary between what's assumptions and what's immediate consequences, IMO. I'd count the particular scales of the simplistic point systems also as assumptions, and then the immediate consequence is that these two things have the same point value. Of course, the fact that I'm saying it's obvious to the point of triviality as a conclusion may be one of these things where it's only obvious once pointed out.

And, yes -- I meant to say that the additional variation only worsened the situation. More generally, I strongly agree with your overall conclusion, even if I'm quibbling on the details of the argument for it.

11011110: Comment from Paul Beame

Paul asked me to post this, since he doesn't have an LJ account and I've restricted anonymous comments:

To turn this into a mathematical model, I assigned points to papers (much as EasyChair actually does): 0 to 4 points for strength (very weak, weak, mediocre, strong, very strong), 0 to 2 points for topic (speculative, conventional, or fashionable), and 0 to 2 points for the other random factors that might also affect a paper's chances. I also assumed (in the absence of a reason for any other choice) that submissions are equally distributed among the 45 different triples of scores that this system defines.

Of all the things you chose to model, the part that seems the most counter to experience is the equidistribution of papers among the different categories. There are likely to be many more submissions on conventional and fashionable topics, and many more papers in the middle on strength (especially mediocre and strong). The fraction of submissions at the low end of strength will vary based on conference reputation. The fraction at the very high end is usually much less than 20%, more like 10%.