Google scholar citations

Google's new scholar citation system has gone public, and I've signed up and made a public profile for myself.

As far as I can tell the main use for this is similar to DBLP's author profiles: a way of finding out a quick summary of what someone else has been doing lately, or of finding a paper whose author you know. Unlike DBLP, it's augmented with citation counts and h-indexes, but perhaps more importantly it includes all publications that Google knows about, not just those in DBLP-indexed venues. It also allows authors to edit the publication data for their own publications, and if some of that cleanup can trickle back to the main part of Google scholar that would be a very good thing.

Like DBLP, it includes a way of linking authors to their frequent co-authors, but that feature is sort of useless right now, at least for me, because most or all of my co-authors don't have profiles.

Another piece of control Google gives to authors is whether to keep journal and conference versions of papers separate or to merge them into a single combined entry. On the face of it this would seem like an easy way to manipulate the h-index it shows, but in my experience (at least, trying to choose mergers honestly to clean up the profile rather than merging unrelated papers) splitting vs merging makes almost no difference. If one really wanted to manipulate these scores I think a more effective way would be to find the papers whose citation counts are close to but below the threshold and drum up more citations for them. And if Google were serious about making the h-index harder to manipulate and more meaningful, they'd eliminate self-citations from their counts, something that should be easier to do now that authors are treated as first-class entities in their data.
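To make the three effects above concrete, here is a minimal Python sketch of the h-index arithmetic. Everything in it is illustrative: the profile data is made up, the function names h_index and near_threshold are hypothetical, and it assumes that a merged entry simply sums the citations of its parts, which may not match how Google actually counts.

```python
from typing import Iterable, List


def h_index(citation_counts: Iterable[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def near_threshold(citation_counts: List[int], slack: int = 2) -> List[int]:
    """Counts that fall short of h + 1 by at most `slack` citations."""
    target = h_index(citation_counts) + 1
    return [c for c in citation_counts if target - slack <= c < target]


# Hypothetical profile (invented data): (total citations, self-citations).
papers = [(30, 4), (12, 3), (9, 1), (6, 2), (5, 3), (5, 0), (4, 1), (1, 1)]

totals = [total for total, _ in papers]
external = [total - own for total, own in papers]

# Assumption: merging a conference version (6 citations) with its journal
# version (5 citations) yields one entry with the summed count (11).
merged = [30, 12, 9, 11, 5, 4, 1]

h_all = h_index(totals)         # h-index over all citations
h_external = h_index(external)  # h-index with self-citations removed
h_merged = h_index(merged)      # h-index after the merge

print(h_all, h_external, h_merged, near_threshold(totals))
```

With this invented data the merge leaves the h-index unchanged, removing self-citations lowers it by one, and the papers returned by near_threshold are the ones where a few extra citations would raise it. Real profiles will of course behave differently.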

ETA: Did you know that Alan Turing's h-index is only 23? That should tell you something about the value of such measures. Also, a group of people named Peter Taylor have found a clever way of scamming the system: claim all publications by similarly-named people as theirs.

Comments:

ext_886308: API?
2011-11-17T20:13:14Z
Cool! Do they have some API so you can use the data in the context of other applications, e.g. your homepage?

11011110: Re: API?
2011-11-17T21:04:18Z
Good question, but I don't have any idea what the answer is. Their FAQ doesn't seem to say anything about an API.

bibliometrics: Self-citations
2011-11-18T05:26:15Z

Removing self-citations is tantamount to saying that someone else doing followup work and citing your paper has value, while you doing the exact same work and citing your own paper has none.

While external citations surely have a bit more value than quoting your own work, removing them altogether is not the answer either.

11011110: Re: Self-citations
2011-11-18T06:06:18Z
Citations are not the same thing as value. They are closer to (though still not the same as) impact — is your work having any effect on others? But self-citations don't measure that, and instead mostly measure how much your present work resembles your past work. Also they are even easier to game than all the rest of these numbers.

bibliometrics: Re: Self-citations
2011-11-18T06:32:51Z

"But self-citations don't measure that, and instead mostly measure how much your present work resembles your past work."

I disagree with the "mostly" part. The measure is sensitive to self-similarity of research, but do you have any evidence to indicate that this is the main signal coming out of the aggregate citation count?

I would claim that, to the contrary, so long as you restrict the count to serious refereed venues, aggregate citation count mostly reflects perceived relevance by the community, as otherwise your self-citing work would not have been accepted for publication in the first place.

"Also they are even easier to game than all the rest of these numbers."

Surely the place to fix that is during the editorial process, not the Google counting personal page.

ext_887241: Re: Self-citations
2011-11-18T13:09:31Z

"Surely the place to fix that is during the editorial process, not the Google counting personal page."

I agree completely. There is absolutely no excuse for editors not to insist that authors retroactively remove self-citations from all their published papers, even if they have to return from the grave to do so.

bibliometrics: Re: Self-citations
2011-11-18T21:03:59Z

Either the self-citations were warranted during the editorial process or they were not. The error occurred then and there, no need to do voodoo revisions.

Furthermore your proposed "solution" is to punish anyone who works on a continuous line of research because of editorial errors regarding people who tried to game the system.

If someone had suggested removing self-citations today for the first time, people would be all over it as a dumb idea. Problem is it has been around for ages and people stop questioning even the dumbest parts of the status quo.

Just look at how people criticize the admittedly flawed h-index, without realizing that many of those criticisms apply to final exams, course grades and even the PhD thesis process. That is why people don't bitch bitterly about the fact that the PhD process can be gamed, but give blog space to the ridiculous Peter Taylor attack on the h-index, which as far as attacks go is one of the weakest I've seen. Any person would see through it in half a second.

11011110: Re: Self-citations
2011-11-18T21:51:29Z

I don't think that creating self-citations is an error or an abuse; there are lots of valid reasons to self-cite. I merely think that they represent different information than the other kind of citation and that it would be more helpful to count them separately.

And yes, course grades are flawed. The global distribution of wealth is flawed. The baggage retrieval system they've got at Heathrow is flawed. Nevertheless, that shouldn't prevent us from recognizing or pointing out flaws in bibliometric measures. If you think that blog space shouldn't be wasted on such things, get your own blog and avoid wasting space on it.

bibliometrics: Re: Self-citations
2011-11-18T22:08:24Z

You misunderstood my point. The h-index clearly has flaws and pointing them out is worthy of discussion. I never intended to suggest otherwise. However this is not to say that all criticisms are of equal value.

For example, your note that Alan Turing has an h-index of 23 deftly highlights one of the weakest points of the h-index, and I took no issue with that.

I stand by my point that the Peter Taylor gaming of the h-index is unsophisticated and proves nothing.

ext_887173: Merge conference and journal versions?
2011-11-18T12:25:29Z

I am wondering whether one should merge conference and journal versions of papers. Disregarding for a moment how this could alter citation metrics, what is the right thing to do here? Is there any accepted view on this?

On one hand the journal version of a conference paper represents the same work, or a superset. On the other hand, it is a different publication. Do you merge technical reports with a conference paper? with a journal paper?

I assume that there's probably more than one acceptable view on this, as long as one is consistent.

Serge

11011110: Re: Merge conference and journal versions?
2011-11-18T15:37:57Z
I can't speak to any consensus of what people should be doing, but I can at least describe what I have been doing myself. I've been merging them in cases where the conference and journal papers correspond, rather than splitting or joining papers from one version to the other, and where the versions have roughly similar content rather than having new co-authors etc. Sometimes this causes the profile to mark the merger with an asterisk, sometimes it doesn't. And sometimes I can't find separate conference and journal versions to merge in the profile, even when they exist as papers, because Google scholar only has a single entry for both versions.

bibliometrics: Re: Merge conference and journal versions?
2011-11-18T21:19:52Z

I couldn't make up my mind either, again without even looking at citation count impacts. This is part of a bigger issue. People in the past noticed that

Paper on topic A + Paper on topic B

is more work than

Conference paper A + Journal version of A

Yet on straight paper counts both would add up to two. This is a real problem, which was inaptly named "double counting", as if producing a journal version didn't imply more work than just a conference version.

Some people proposed counting both versions of the paper as one, which only results in the equally absurd situation of equating

Conference paper A + Journal version of A

with

Conference paper A

To this date, I find both situations unsatisfactory, but I prefer the first one as at least it has some incentives for researchers to produce journal versions of conference papers, something we are all guilty of not always doing promptly enough.