Permanent similarity

While poking around some stuff about permanents for this Wikipedia deletion debate, I found:

Jerrum and Sinclair, SICOMP 1989.
Rasmussen, Random Structures and Algorithms 1992 (I don't have subscription access, so I used this 1992 tech rep. version.)
Frieze and Jerrum, Combinatorica 1995 (substantially similar to a 1991 tech report single-authored by Jerrum).
Jerrum and Vazirani, Algorithmica 1996.

All four start with one or two similarly-phrased sentences defining a matrix \( A \) and its permanent using very similar formulas: (in three of the papers) an inline one of the form \( A = \{ a_{i,j}: 0 \le i,j \le n-1 \} \) and a displayed formula \[ \mathrm{per}(A) = \sum_{\pi} \prod_{i=0}^{n-1} a_{i,\pi(i)}, \] with Rasmussen using 1-based indexing and the three Jerrum papers using 0-based indexing; the SICOMP paper substitutes \( \sigma \) for \( \pi \). In the two later Jerrum papers these two sentences are identical, and the Rasmussen paper has similar phrasing: (first Jerrum paper) “The permanent of an \( n \times n \) matrix \( A \) with 0-1 entries \( a_{i,j} \) is defined by (displayed formula) where the sum is over all permutations \( \sigma \) of \( [n] = \{0,\dots,n-1\} \).” (two later Jerrum papers) “The permanent of an \( n \times n \) matrix (formula) is defined by (displayed formula) where the sum is over all permutations \( \pi \) of \( [n] = \{0,\dots,n-1\} \).” vs (Rasmussen) “Let (formula) be an \( n \times n \) (0-1) matrix. The permanent of \( A \), \( \mathrm{per}(A) \), is defined by (displayed formula) where the sum is over all permutations \( \pi \) of \( [n] = \{1,\dots,n\} \).”
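For readers who want the definition in concrete form, here is a minimal Python sketch that evaluates the displayed formula literally, using the 0-based indexing of the Jerrum papers. The function name `permanent` and the list-of-rows representation are my own choices for illustration, not taken from any of the four papers, and the sum over all \( n! \) permutations makes it usable only for very small matrices.

```python
from itertools import permutations
from math import prod


def permanent(a):
    """Permanent of a square matrix (list of rows), straight from the definition.

    Illustrative brute-force sketch only: it visits all n! permutations.
    """
    n = len(a)
    # Sum over all permutations pi of {0, ..., n-1} of prod_i a[i][pi(i)].
    return sum(
        prod(a[i][pi[i]] for i in range(n))
        for pi in permutations(range(n))
    )
```

For a 0-1 matrix, each permutation contributes 1 exactly when all the selected entries are 1, which is why the same sum counts perfect matchings in the corresponding bipartite graph.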

All four in the next sentences describe the application to counting matchings in bipartite graphs. Again, the second two Jerrum papers use the identical sentences, while the other two diverge a little more here. All four then go on to mention the existence of other applications and past study; Jerrum/Sinclair, Rasmussen, and Frieze/Jerrum all use the exact phrase “has been an object of study by mathematicians since first appearing in 1812 in the work of Cauchy and Binet,” while the wording is similar but not identical in Jerrum/Vazirani. All four then cite a survey by Minc.

Next, all four mention past failures to find efficient algorithms for the permanent. The three Jerrum papers write “Despite considerable effort, and in contrast with the syntactically very similar determinant, no efficient procedure for computing the permanent is known.” Rasmussen uses very similar wording: “Despite this effort and its syntactical similarity with the determinant, no efficient procedure for computing this function is known.” Rasmussen then

After this, all four cite Valiant's proof of hardness of computing the permanent (the result discussed in the AfD). Rasmussen: “In the late 1970's, Valiant provided convincing evidence for the inherent intractability of the permanent, by demonstrating the problem to be complete for the class #P of enumeration problems. It is thus as hard to compute the permanent as it is, say, to count the number of satisfying assignments to a CNF formula.” Frieze/Jerrum: “Convincing evidence for the inherent intractability of the permanent was provided in the late 1970s by Valiant, who demonstrated that it is complete for the class #P of enumeration problems, and thus as hard as counting the number of satisfying assignments to a CNF formula.” Jerrum/Sinclair is the same up to the final noun phrase, which is replaced by “any NP-structures.” Jerrum/Vazirani: “Valiant provided convincing evidence for the inherent intractability of the permanent, by demonstrating that it is complete for the class #P of enumeration problems, and thus as hard as counting the number of satisfying assignments to a CNF formula.” Jerrum and Vazirani then go on to cite the algorithm of Ryser.

At this point the Jerrum/Vazirani paper diverges from the other three. However, the other three then have a substantially similar paragraph defining randomized approximation schemes and fully polynomial time approximation schemes and describing the idea of using medians of multiple runs to boost the success probability of these schemes. Rasmussen then interpolates a paragraph about unbiased estimators, and then Rasmussen and Frieze/Jerrum again have substantially similar paragraphs stating that it's unknown whether there is an fpras for the permanent and describing a weakening of the problem to one in which the algorithm need only work for most inputs.
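The median idea mentioned above is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: `boosted_estimate` and `noisy_estimator` are hypothetical names, and the toy estimator (close to the true value three times out of four, wildly off otherwise) stands in for a real approximation scheme rather than reproducing anything in the papers.

```python
import random
from statistics import median


def boosted_estimate(estimator, runs):
    """Median of `runs` independent calls to `estimator` (use an odd `runs`)."""
    return median(estimator() for _ in range(runs))


def noisy_estimator(true_value=100.0, rng=random):
    """Hypothetical stand-in for one run of a randomized approximation scheme."""
    if rng.random() < 0.75:
        # A "good" run: within 10% of the true value.
        return true_value * rng.uniform(0.9, 1.1)
    # A "bad" run: an arbitrarily poor answer.
    return true_value * rng.choice([0.0, 10.0])


if __name__ == "__main__":
    print(boosted_estimate(noisy_estimator, runs=31))
```

The median is far from the true value only if at least half of the runs are bad, and when each run is good with probability greater than 1/2 that event becomes exponentially unlikely as the number of runs grows.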

I don't think this is a clear-cut case of plagiarism, in the classic sense of taking someone else's words without their knowledge or permission: three of the papers share an author, and the fourth is by someone at the same institution. But is this level of textual similarity in the introductory material of distinct published journal papers considered acceptable? What about (as in one of these cases) when the sets of authors do not overlap?

Comments:

brooksmoses:
2008-12-19T00:53:54Z
I find the case where the sets of authors do not overlap to be somewhat problematic, personally, yes. The problem is not something I would consider plagiarism in the full sense; as you say, it is presumably something done with a reasonable degree of assent by the original authors of the material -- and, being introductory material, it doesn't create a situation where someone is claiming results that they were not responsible for. However, I would feel that some degree of acknowledgement would be useful -- perhaps a footnote or endnote to the effect of, "The author acknowledges the contributions of Other Author to the introductory material," or something along those lines. This is especially true if text is copied verbatim; less so if what is copied is merely structure and content -- but, even in the latter case, there should certainly at least be a clear citation of that content to the previous paper. In cases where the authors do overlap, it generally isn't something I'd find problematic -- the acknowledgement is redundant (unless there's a specific reason to acknowledge a coauthor of the previous paper who does not appear on the current paper). And in things such as the introductory material where there is generally no scholarly benefit to having them differ, it's not something that causes detriments for the reader. However, there is a separate matter of copyright assignment; if one has published the first article in a journal that requires an assignment of copyright (as most do), then it may be a violation of that copyright to reprint the material in a new article. Personally, as neither the author nor either of the copyright holders, I don't consider that my business to worry about, but it does seem part of a trend among scholarly authors to treat copyright assignments rather more frivolously than I think is appropriate.

None: Hmmm
2008-12-19T01:12:40Z
The problem is that it is not acceptable to have an introduction that says - for background see [XXX]. Instead, you are supposed to repeat what to people working in this subsubsubsubsub field might be by now standard and obvious. It is natural to "copy" then from previous articles, since they already figured out how to present it in the most natural way. Introductions are a bit of a space filler and I would not worry too much about similarity with other papers in the field, as long as the results are original. The main negative here is that the authors did not bother to optimize their introduction to their results. They probably described things they should not have bothered with, or missed relevant things that should be included. I guess the acceptable way to do that would be to have a section of background, and start it with a sentence like: "the following is by now standard \cite{relevant papers}, and is included here only for the sake of completeness."

11011110: Re: Hmmm
2008-12-19T01:41:26Z
It might have been acceptable to preface the introductions of the later papers with a sentence like “We follow Jerrum and Sinclair [] in outlining the history of permanents and their approximation.” That's the approach my co-authors and I took in incorporating some material from past papers into our Media Theory book. But I think of that as a different type of publication, just like I think of a journal revision as being different from a conference extended abstract or technical report. It's accepted and even encouraged to re-use material from one of these types of publications in a different type of publication (with some timing constraints: no conference publication of material that has already appeared in a journal). But I think of journal articles (and, separately, peer-reviewed conference proceedings papers) as obeying something more like the Pauli exclusion principle: each should be distinct from each other.

None: Re: Hmmm
2008-12-19T02:23:47Z
This is something I've been wondering about recently as well. And I have to admit that I'm much more of a stickler about these things than I imagine is average. But it seems like no matter the copyright issues, it's self-plagiarism at the least (except in the case of turning a conference paper into a journal paper). On the other hand, I don't see a good way around this, since there are only so many ways you can rephrase introductory material before it gets ridiculous. - Sorelle

None: Re: Hmmm
2008-12-19T16:33:00Z
We use this approach as well. In certain situations one wants to lift the entire definitions and notation that have been very nicely set up elsewhere (and there is absolutely no value in duplicating) so we say "we follow the set up of XYZ [] which we include here for completeness". Lifting the introduction or motivation from somebody else is a different thing. I think part of the problem is that we are using a single word (plagiarism) to describe a variety of situations.
- In its worst case is verbatim, unacknowledged, malicious copying of other people's work.
- In non-fiction writing there have been examples of careless quoting, where a sentence made it into personal notes of the author and later on the author forgot that this was source material rather than his/her own penned thoughts. Still not allowed, but different from above.
- Quoting of definitions and notation, which IMHO should be perfectly legal provided that it is fully credited. Something like "Definition[Fellows et al, pp.34] An FPTA is ..."
- Self-quoting of results (unacknowledged): not allowed.
- Self-quoting of results (acknowledged): make it as brief as possible, as one would quote another author's result.
- Self-quoting of introduction and motivation: generally ok, provided that in each instance changes are made to reflect the peculiarities of the paper (interestingly the ACM survey reported people being evenly split for and against this one).
- Quoting of other people's introduction and motivation: not ok. If a paragraph is particularly inspired, blockquote it with full citation.
- Compilations and compendia. This is an established, very useful form that is in danger of being swept over in the plagiarism debate. It consists of reading, digesting and distilling the best ideas from a variety of sources. Verbatim unattributed quoting is still not allowed, but substantial similarities may peek through. Sources should be acknowledged in footnotes. The author attribution should be understood as a "summarized and compiled by" rather than "written by" and the title should acknowledge this fact with "Cookie cutting theory: a compendium" or "A survey of cookie cutting theory".
- Textbooks. Most results in a textbook appear without attribution. This is ok because there is no implicit claim of research authorship. Presentation might follow that of a paper on the area, but again verbatim quoting is not allowed unless permission is sought and fully attributed in the notes at the end of the chapter: the proof of theorem 4.5 is that of XYZ (save minor changes), we thank the authors for allowing us to include it in this text.
- Industry Research/Executive reports (not for circulation/publication). These are extreme cases. Pretty much everything in them comes from an external source, timeliness is an important consideration and no claim of originality is being made. Unattributed, direct lifting is allowed. Bibliography might be scant or altogether non-existent. The point is to get information across, not to track academic lineage. Authorship is understood to mean "every relevant and important fact that I could find on subject X".
- Speeches. Lately we have seen people trying to apply academic citation standards to speeches. This is silly. A speech is collected wisdom, not unlike the internal report above. In the case of a politician it signifies "this is what I believe in" not "this is what I personally discovered". Standing up and reading an entire text from someone is not allowed, but a sentence from here and a sentence from there is ok. E.g. Obama's "A more perfect union", no one would expect "A more perfect union [Jefferson et al., 1787 Philadelphia Free Press]".

t_mobile_whore:
2008-12-19T02:55:10Z
"But is this level of textual similarity in the introductory material of distinct published journal papers considered acceptable?" Why in the world not? Is anyone harmed by this similarity? If so, who is it? Other disciplines have their own plagiarism codes which do not apply here. In history, for example, it may be difficult to separate routine introductory material from original research, so it makes sense to say that similarities such as the one you've pointed out are unacceptable. In the context of math or computer science, I see no reason why we should care about "plagiarism" not involving original research results.

None:
2008-12-20T05:03:40Z
There would be no problem if people would just admit it in the paper, as suggested above. However, my guess is that most people who do this would be a little embarrassed to admit it: they would rather share in the (small) credit for the nice introduction than be publicly identified as being too lazy to write their own. Their dishonesty doesn't hurt anyone very much, but I do think it causes a little harm to the community.

None: similarity?
2008-12-19T05:39:40Z
This is silly. Plagiarism is about essence. As opposed to literature, where the exact use of language is the essence, in science a standard recounting of the background is not the essence. Re-using parts of your introductory material is really not very different than re-using your LaTeX macros.

11011110: Re: similarity?
2008-12-19T05:47:54Z
It's more like reusing the entire introduction. And in one case it wasn't the author's own introductory material that was re-used.

None: Re: similarity?
2008-12-19T06:04:36Z
Plus, the original author did actually have to write it. It's not right for future authors to get away with just copying it. Especially since the paper probably wouldn't be accepted if it had no introduction... - Sorelle

None: Re: similarity?
2008-12-19T10:27:28Z
Surely the important thing in the introduction to a paper containing a mathematical result is to explain things as clearly as possible and the place where originality is important is in the actual result itself. The original paper (Jerrum and Sinclair 1989) is a paragon of mathematical writing, so it is understandable that their introduction to the problem is re-used in future publications.

None: Self-plagiarism?
2008-12-20T14:23:38Z
Just so we're clear, Lars Rasmussen was Alistair Sinclair's student, and Alistair would have seen (and advised and commented on) the specific drafts you're seeing. In this sense, it's not clear that Rasmussen is a "distinct author" from Jerrum/Sinclair in the sense you wish to say. That is, Rasmussen may have had permission to use the same essential wording. I, personally, think it's fine for authors to loosely self-plagiarize in the introductory material -- although I do admit it's somewhat better if they give a reference to their first description in some way. Let's say Jerrum and Sinclair spent several hours optimizing the introduction/definitions for their first paper. Why should they then spend several additional hours setting up a different version with the same essential information just to satisfy some arbitrary notion of "differentness" that has nothing to do with the actual content of the paper? So I find nothing wrong with this, personally, if the authors themselves don't.

None: Re: Self-plagiarism?
2008-12-20T14:51:12Z
So maybe Alistair at some point contributed to the paper and later on decided to decline authorship, but the "borrowed" unattributed parts were already there. I can easily see this happening. I've withdrawn my name from papers due to too small a contribution and not once did I check if what was left had borrowed paragraphs from other papers of mine in the introduction.

11011110: Re: Self-plagiarism?
2008-12-20T22:01:08Z
I had guessed that Rasmussen was one of the other authors' students, but couldn't tell from a quick Google whose — thanks for the clarification. It seems likely from this that both Sinclair and Jerrum approved the copying in the later papers, then, not just Jerrum alone.