I read with interest Noam Nisan's post on bibliometrics. And I use bibliometrics all the time myself, to try to get a feel for which of my own papers are having an impact, to evaluate my colleagues, and in my frequent participation in Wikipedia's deletion discussions on articles about academics.[*]

But a few weeks ago I was discussing with a co-worker how easy it would be to game h-indexes calculated with Google Scholar, by setting up a large collection of randomly-generated papers seeded with citations to one's own papers. It's especially easy because Google Scholar doesn't filter out self-citations, but it would also be easy enough to make the citing papers have randomly-generated fake authors if that were an issue.
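To make the arithmetic concrete, here is a minimal sketch of the effect, not a description of how Google Scholar actually computes anything. The paper ids, the author names "A", "B", and "X", and the helper names are all made up for illustration; the only standard ingredient is the definition of the h-index as the largest h such that h of one's papers have at least h citations each.

```python
def h_index(counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def citation_counts(author, authors_of, citations, drop_self_citations=False):
    """Per-paper citation counts for every paper that lists `author`.

    authors_of: dict mapping paper id -> set of author names (hypothetical data)
    citations:  list of (citing paper id, cited paper id) pairs
    """
    counts = {p: 0 for p, names in authors_of.items() if author in names}
    for citing, cited in citations:
        if cited not in counts:
            continue
        # A self-citation here means the citing paper also lists `author`.
        if drop_self_citations and author in authors_of[citing]:
            continue
        counts[cited] += 1
    return list(counts.values())


def with_generated_papers(authors_of, citations, targets, n, fake_author):
    """Return copies of the data with n generated papers citing every target."""
    authors_of = dict(authors_of)
    citations = list(citations)
    for i in range(n):
        fake_id = f"generated-{i}"
        authors_of[fake_id] = {fake_author}
        citations.extend((fake_id, t) for t in targets)
    return authors_of, citations


# Hypothetical starting point: three papers by "A", each cited once by "B".
real = ["p1", "p2", "p3"]
authors_of = {p: {"A"} for p in real}
authors_of["q"] = {"B"}
citations = [("q", p) for p in real]
honest = h_index(citation_counts("A", authors_of, citations))

# Five generated papers under A's own name, each citing all three real papers.
a2, c2 = with_generated_papers(authors_of, citations, real, 5, "A")
gamed = h_index(citation_counts("A", a2, c2))
gamed_filtered = h_index(citation_counts("A", a2, c2, drop_self_citations=True))

# The same five papers under an invented author name "X".
a3, c3 = with_generated_papers(authors_of, citations, real, 5, "X")
disguised_filtered = h_index(citation_counts("A", a3, c3, drop_self_citations=True))

print(honest, gamed, gamed_filtered, disguised_filtered)
```

In this toy data, a self-citation filter undoes the inflation only when the generated papers carry the real author's name; with an invented author the filter sees nothing to remove.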

Apparently Ike Antkare has already done exactly that. It appears to be a proof-of-concept (and it appears that Ike Antkare is a pseudonym) rather than an actual dishonest researcher, but who knows how many dishonest researchers are out there trying to do the same thing a little more subtly, the same way we've heard stories about dishonest researchers already using high self-citations to influence the impact factors of the journals they edit?

So when we use bibliometrics, we also need to exercise some diligence to ensure that the high numbers we're seeing are there for a good reason.

[*] Every time I mention this subject someone asks me: why delete any of those articles? Or, why would some famous person's article be included in these discussions? Because too many grad students and starting assistant professors who haven't yet made a name for themselves think that they can improve their fame by putting an article about themselves into Wikipedia, spinning themselves as the world's leading expert on some subject or other, and we don't want to mislead the readers. And because the people who go about finding such articles and proposing them for deletion sometimes find it hard to distinguish between an unsourced article claiming falsely that someone is a leading expert and an unsourced article on someone who really is a leading expert.





Comments:

ext_318517:
2011-02-22T18:29:12Z
Awesome. Once I stopped trying to pronounce the name as if it were Marathi I got the joke ("I can't care").
11011110:
2011-02-22T19:01:42Z
I must be slow today. I didn't get it until I saw your comment.
ayudug:
2011-02-22T18:50:52Z
Well, there are more down-to-earth methods. See http://arxiv.org/abs/1010.0278 Once bureaucrats start considering various citation indices they instantly become meaningless.
11011110:
2011-02-22T19:01:14Z
Thanks — that seems to be the same report on impact factor manipulation that I already linked to the phrase "influence the impact factors". I don't know about "instantly" and "meaningless" but I would agree that they quickly become much less reliable. A commenter on Nisan's post already pointed to Goodhart's law about this effect.
ayudug:
2011-02-22T19:24:26Z
I'm sorry, I didn't click all of your links.
ext_87671:
2011-02-22T20:14:49Z
Here's a more traditional example of impact factor manipulation, in this case impact ranking of a whole university was affected -- http://improbable.com/2010/11/15/yet-another-triumph-for-prof-el-naschie/
ext_439202: deletion discussions, "web of trust"
2011-02-23T01:16:39Z
Just because you mention deletion discussions: I think I noticed at some point that Ed Gilbert's Wikipedia article was deleted; he is certainly a quite notable information theorist, but is it possible that he fails the Wikipedia notability guidelines? How can I tell, besides creating another article for him? I imagine there is no standard that includes "their name is on an information-theoretic bound, and channel model, and they introduced the G(n,p) random graph model, and I'm sure did many other important things I know nothing about". Of slightly more general applicability: he's the Gilbert of the "Gilbert-Varshamov Bound", and there is a Wikipedia article about that bound. Does that make him notable? Re bibliometric factors, is anything short of some sort of analog of a "web of trust" adequate to determine that high numbers are there for a good reason?
11011110: Re: deletion discussions, "web of trust"
2011-02-23T01:42:40Z

There was an article on "Ed Gilbert", deleted in June 2007, that said only "Edgar Gilbert (born near 1950) is an engineer. He is currently living in Morristown, NJ. He is the author of the checkers engine KingsRow." Is that the one you mean? That's an example of what I mentioned earlier, someone truly noteworthy with an article written so badly that it's indistinguishable from one that should be deleted.

The usual standards for academics are at http://en.wikipedia.org/wiki/Wikipedia_talk:Notability_(academics) and the most commonly used criterion there involves somehow having a big impact in one's academic research. Doing two things as important as the Gilbert–Varshamov bound and the G(n,p) form of the Erdős–Rényi model would almost certainly count. He also seems to have done important early work on Steiner trees. And even someone who knows nothing about information theory but who participates in the deletion discussions would be likely to see his big Google Scholar citation counts and conclude that he's had a big impact.

The question, though, is what can be documented about Gilbert beyond the existence of those publications. If all we can say is that he existed and he wrote papers x, y, and z, it doesn't make for much of an article.

As for your more meta question about how we can filter the meaningful high numbers from the fakes: Google seems to be doing something reasonable in filtering spammer linkfarms from legitimate highly linked web sites, although one can always wish that they'd do even better. Maybe similar technology can be applied here? Unfortunately part of the trick appears to be not telling the spammers exactly what Google is doing, so that it's harder to game the system.

ext_439202: Re: deletion discussions, "web of trust"
2011-02-23T02:11:04Z

Ah, that can't be the Ed Gilbert I have in mind; if that's what I saw, then I was just assuming from his location that he was the information theorist (at the nearby Bell Labs). But "my" Ed Gilbert published research starting as early as the late fifties.

Doesn't your question, about what a Gilbert article could say, apply to anyone who is known only for their intellectual output? I'm not sure that Edgar Allan Poe, to pick another Edgar, did anything much of interest except write some reviews, poems, and stories. His life is interesting to the world at large only because of the interest and influence of his literary work. I don't see how his life story, as such, was *notable*.

11011110: Re: deletion discussions, "web of trust"
2011-02-23T02:36:21Z
Sure, the reason we have an article on Poe is his writing. But the content of the article talks about his life and career, and the influences in his works, as well as the works themselves and the influences they've had on others. I don't know that we need to have quite as much detail on Gilbert but it would be helpful to know some of the broad strokes of his professional career like where he was educated (is this him?) and where he worked after he graduated.
11011110: Re: deletion discussions, "web of trust"
2011-02-23T05:35:29Z
Hah! I finally found something useful. I think this (together with his obvious accomplishments) is enough for me to write an article.
11011110: Re: deletion discussions, "web of trust"
2011-02-23T06:54:43Z
Ok, here is a new article on Gilbert.