Formatting a couple hundred references for a proposal led me to wonder: If you find yourself wanting to look up the BibTeX data for a paper, where do you go? And how much do you have to edit it yourself afterwards?

The three most obvious choices for me are DBLP, the ACM Digital Library, and MathSciNet.

There used to be a project to maintain a collective file "geom.bib" with all the references that any computational geometer would ever use. I still have about 18 copies of it on my computer (presumably not all in sync with each other) from various papers that used it, but it became unwieldy (too big to use as one file) and seems to have fallen by the wayside. Additionally, many publishers supply citation files for their own publications, so you could use those, or even take the time to write your own. But my experience is that most publishers are not good at generating clean data (e.g., they use hyphens instead of en-dashes for page ranges, or permute the words of conference titles into a different order than you'd want to use in a citation), although at least they're better at it than Google Scholar.
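The page-range problem, at least, is mechanical enough to clean up automatically. Here is a minimal Python sketch, assuming the "pages" field has already been extracted as a plain string; the function name normalize_pages is my own invention for illustration and not part of any BibTeX tool.

```python
import re

def normalize_pages(pages: str) -> str:
    """Rewrite a dash between page numbers as BibTeX's '--' (an en-dash).

    Hypothetical helper: handles single or repeated hyphens and the
    Unicode en/em dashes, with optional surrounding whitespace.
    """
    return re.sub(r"(?<=\w)\s*(?:-+|[\u2013\u2014])\s*(?=\w)", "--", pages.strip())

print(normalize_pages("123-145"))
print(normalize_pages("12:1 - 12:20"))
```

Single page numbers and ranges that already use "--" pass through unchanged. Page fields that legitimately contain a hyphen inside one page label would be mangled, so the output is still worth a glance.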

The big three above all have their quirks, but they generate pretty clean data (especially if you tell DBLP not to use crossref). Copying from them can be a lot easier and less error-prone than typing it all in yourself, and picking one source and sticking to it could also help achieve greater consistency. DBLP has the best coverage for computer science, I think. I recently looked at a five-year window of my papers (for the prior work section of that proposal) and it missed only three (two in non-computer-science journals about topology and mathematical psychology, and the third in an edited volume about cellular automata).

My own idiosyncratic preference is for MathSciNet, though. Their coverage is almost as good for my purposes (sometimes better), but what ends up making the difference for me is their care about the capitalization of title words and the formatting of math in titles. DBLP and ACM leave lots of words capitalized and let the BibTeX style lowercase them later, which mostly works, but fails when some words are proper nouns that should stay capitalized. MathSciNet takes care to lowercase everything to how it should appear in a citation (my preference) and to protect the letters that should remain uppercase. And for titles that contain formulas, MathSciNet gets it right and the other two don't.

Example:
ACM: "The h-Index of a Graph and Its Application to Dynamic Subgraph Statistics".
DBLP: "The h-Index of a Graph and Its Application to Dynamic Subgraph Statistics" (journal version); "The \emph{h}-Index of a Graph and Its Application to Dynamic Subgraph Statistics" (conference version).
MathSciNet: "The {$h$}-index of a graph and its application to dynamic subgraph statistics".
One of these is correct and the others aren't.
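To make the MathSciNet convention concrete, here is a rough Python sketch of the same transformation: lowercase a title-cased title, but wrap math symbols in {$...$} and proper nouns in braces so a BibTeX style can't change their case. The function to_sentence_case and its two parameters are hypothetical names of mine, and you have to supply the lists of proper nouns and math symbols yourself; nothing here guesses them. It also braces whole words, whereas a hand-curated entry might protect only the capital letter.

```python
import re

def to_sentence_case(title, proper_nouns=(), math_symbols=()):
    """Lowercase a title, brace-protecting the listed words and math symbols.

    Hypothetical helper for illustration only. Matching is case-sensitive
    and word-by-word, so a math symbol that is also an ordinary word
    (such as "a") would need special handling.
    """
    protected = set(proper_nouns)
    math = set(math_symbols)

    def convert(match):
        word = match.group(0)
        if word in math:
            return "{$" + word + "$}"   # typeset as math, case protected
        if word in protected:
            return "{" + word + "}"     # keep capitalization as written
        return word.lower()

    lowered = re.sub(r"[A-Za-z]+", convert, title)
    # Restore the capital on the first word of the title.
    return lowered[:1].upper() + lowered[1:]

print(to_sentence_case(
    "The h-Index of a Graph and Its Application to Dynamic Subgraph Statistics",
    math_symbols=["h"],
))
```

On the title above, with "h" listed as a math symbol, this gives the MathSciNet form. It does not re-capitalize the first word after a colon, which some citation styles expect.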

But maybe there's some new tool or database that beats all of these that I haven't yet found out about. One of my co-authors uses Zotero, but I haven't tried that myself. Are systems like it, based on shared libraries rather than comprehensive databases, still useful?

(See also discussion on G+ from the same post.)





Comments:

ext_3000355:
2015-02-06T15:45:08Z
The crypto group at ENS maintains an incredibly useful and popular set of bib files for cryptography publications here: http://cryptobib.di.ens.fr/ They have kept it up to date for several years. Perhaps their template and management approach could be adapted to resurrect geom.bib.
None:
2015-02-06T18:59:27Z
References are like sausages; it's best not to see how they're made.
ext_3000749: Google Scholar
2015-02-06T20:43:08Z
As another alternative, there's also BibTeX in Google Scholar: from the settings page enable "Show links to import citations into: [BibTeX]" and the search results will show a link to a plaintext page that you can easily copy and paste into papers, reference managers, etc. In the macOS app BibDesk, you can simply hit "Command-V" at the application level and it'll process the contents of the clipboard into the bib file. (Similarly, hitting "Command-C" with an entry selected places " \cite{key}" in the clipboard.) I've wondered how Google Scholar chooses the BibTeX entry it shows, though. My sense is that it uses some sort of majority-vote algorithm for each entry across the different variations in which it has seen the paper cited. I know it has a way of updating itself when a previously-arXiv-only paper starts getting cited as a published paper. I do tend to do some light editing of the reference formatting before submission, but usually that's more about optimizing against conference page length constraints than anything else.
11011110: RE: Google Scholar
2015-02-06T21:24:36Z
If you curate your Google Scholar profile carefully it can produce accurate BibTeX entries, but otherwise it can be quite patchy.