Via CACM: UT Southwestern's Deja Vu project has applied text pattern matching algorithms to all the titles and abstracts in Medline and found up to 200,000 duplicates. Many of them have benign explanations (e.g. the first one I saw was a Chinese-language paper and its English-language translation, by the same authors), but many others are likely plagiarism: the database currently lists 200 duplicate pairs of abstracts with no author in common. And this is just from the abstracts; I imagine a full-text search would turn up additional similarities.
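To make "text pattern matching" a little more concrete, here is a minimal sketch of one common way to flag near-duplicate abstracts: compare sets of overlapping word triples (shingles) by Jaccard similarity, then check whether the two records share an author. This is not a description of the Deja Vu project's actual method. The function names, the record layout (`abstract`, `authors`), and the 0.5 threshold are all assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word tuples in text (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(text_a, k), shingles(text_b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_pair(rec_a, rec_b, threshold=0.5):
    """Return (abstracts_similar, authors_shared) for two hypothetical records.

    Each record is a dict with an 'abstract' string and an 'authors' list.
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    similar = jaccard(rec_a["abstract"], rec_b["abstract"]) >= threshold
    shared = bool(set(rec_a["authors"]) & set(rec_b["authors"]))
    return similar, shared
```

A pair that comes back as `(True, False)` is only a candidate for closer reading; as with the translation example above, a human still has to decide whether the overlap is benign.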


Some care is still required. For example, the first of these "no author in common" duplicates is in fact an article plus a commentary on the article (which, I guess by editorial policy, is introduced with the same abstract).

Good point. Their quick link to a more heavily filtered subset of the "no author in common" duplicates works better.