I've been a bit suspicious of the impact factor and H-index for some time now. They're both easily measured numbers based on citation counts that serve as convenient substitutes for thought and knowledge in the evaluation of academic quality — but I'd been kind of half-hearted in my suspicion, thinking that they were maybe ok for comparing people and journals within a single well-defined research area. Sure, some citations are different from others, I thought: the cite count might be low for a good paper that closes off an important research topic, and high for a bad paper that opens one, but it all evens out, right? And the impact factor's two-year citation window is too short, measuring buzz more than long-term impact, but buzz is still good, right? The only problem with these numbers is that they haven't been normalized for differing patterns of citations in different subjects, right? We all deplore beancounting but we also need to gather as many different kinds of information as we can for important decisions like hiring and promotions, so what can one more number hurt? And when we're making some less-important decisions, like whether certain theoretical computer scientists get to keep their Wikipedia articles, and we don't have access to the expert evaluations one would use for a real decision, we have to use what we can get, and we can get these numbers, so why shouldn't we use them?
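For concreteness, here is what these two numbers actually compute, as a minimal Python sketch. The function names and the toy citation counts are my own illustration, not anything from a standard library or bibliometric database.

```python
def h_index(citation_counts):
    """H-index: the largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def impact_factor(citations_this_year, items_prev_two_years):
    """Journal impact factor for a given year: citations received that year
    to items published in the previous two years, divided by the number of
    citable items published in those two years."""
    return citations_this_year / items_prev_two_years


# A researcher whose papers have these citation counts has h = 4:
# four papers with at least four citations each.
print(h_index([10, 8, 5, 4, 3, 0]))  # -> 4
```

Both numbers reduce to simple arithmetic over raw citation counts, which is exactly what makes them so easy to report — and, as below, so easy to inflate.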

Now Ben Webster, in regard to the sad case of M. El Naschie, points a finger at exactly what is wrong with these sorts of numeric scores. They may not be so bad (though they're surely not good) when everyone involved acts in good faith, but there's an observer effect: once one starts using them, some people of less-than-good faith are motivated to game the system. And that's what El Naschie seems to have done, racking up huge numbers of worthless publications in his own self-edited journal, each of them contributing self-citations that inflated both his own H-index and his journal's impact factor to quite respectable levels.

A quick fix, you say? Just institute tighter editorial rules for the journals and don't count self-citations in the H-index? Both are probably good ideas for other reasons, but both are also easily worked around. El Naschie doesn't seem to be shy about using sockpuppets online, so why not for his papers as well? And in any case, one can find other cases of small, tightly-knit research communities repeatedly citing each other's papers — in many cases benign, but in others just logrolling. The same phenomenon happens even when the stakes are minimal to nonexistent (say, "favorite" counts for photos on Flickr), so why wouldn't it happen in academia?
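To make the "quick fix" concrete, discounting self-citations would look roughly like the sketch below, reusing the h_index function above. The data layout (each paper mapped to the author sets of the papers citing it) is hypothetical, and the paragraph's point survives it: citations routed through a sockpuppet name or a cooperative circle of colleagues sail right past this filter.

```python
def h_index_without_self_citations(papers, author):
    """H-index variant that ignores citations coming from the candidate's
    own papers.  `papers` maps each of the candidate's papers to a list of
    author-name sets, one set per citing paper (hypothetical data layout)."""
    adjusted_counts = [
        sum(1 for citing_authors in citing_list if author not in citing_authors)
        for citing_list in papers.values()
    ]
    return h_index(adjusted_counts)  # reuses the sketch above

# A citation made under a sockpuppet's name, or by a friendly colleague
# returning a favor, is indistinguishable here from a genuine one.
```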

No, the lesson is: numbers are no substitute for thought. In situations where it matters, such as tenure review, just say no to the H-index. By all means, look at which of the candidate's papers are highly cited, but do so with the care and expertise needed to understand why they're being cited, and what that implies about the impact of the candidate's work. And if the stakes are low (Wikipedia), but a decision still has to be made, still say no. If you lack the expertise to make the decision based on actual knowledge (and most Wikipedians do lack it — I do, on many of the deletion discussions I participate in), don't try to fake it with numbers; look instead at more reliable indicators. I dislike credentialism almost as much as beancounting, but in this case it makes sense: if peers in an area (broadly enough construed to avoid the logrolling problem) consider something worthy of honor, it probably is. And if they don't, it may still be worthwhile, but in the absence of evidence we can't judge. The H-index and the impact factor are not evidence.