deja boo?

June 2, 2009 at 9:53 am | | everyday science, literature, open thread, science community, scientific integrity, wild web

I’d like to know everyone’s opinion about Deja Vu, the database of “duplicate” scientific articles. Most of the articles in the database are “unverified,” meaning that they could be entirely legitimate (e.g. a reprint). Some are instances of self-plagiarism: an author recycling his or her own abstract or intro for a new paper or review. A few instances are true plagiarism: one group of authors stealing the words (entire paragraphs or papers) of other authors. You can read more in Science.

I can imagine several possible responses (see the poll below):

  1. Great! Now there’s a way for authors, journals, and institutions to better root out plagiarism and unauthorized copying.
  2. Granted, this is information in the public domain, so authors should expect their work to be scrutinized. However, it’s worrisome to have a computer algorithm put red flags on articles that may be legitimate. Deja Vu is probably a good idea, but needs to be reworked.
  3. Careers will be unfairly destroyed by this approach. Labeling a paper as a “duplicate” sounds negative, even when listed as “sanctioned” or “unverified.” This database takes a guilty-until-proven-innocent approach that has the potential to sully the reputation of good scientists.
  4. Um, haven’t these people seen Terminator 2? What if Deja Vu becomes self-aware and starts killing plagiarists?

[poll id=”2″]

Fortunately, an author can check his or her work in the eTBLAST database before submission, to see if a coauthor copied a section, or if the text will unfairly put up red flags. But I found that the results were confusing (e.g. I can’t find the meaning of the “score” or the “z-score”) and unhelpful (of course papers in the same field will have the same keywords). And the results page was really buggy (maybe just in Firefox?).

Personally, I vote #2: Deja Vu is a good idea, but needs to be more careful about the papers it lists as “duplicates,” even “unverified” or “sanctioned.” When a junior faculty member gets a hit in the database, his or her name will be associated with plagiarism. Some people will not bother to check if it was a legitimate copy, or even who copied whom. I think that the current approach that Deja Vu takes is reckless and unfair. Even lazy.

Moreover, self-plagiarism is not necessarily bad. Copying your own abstract is different from copying your entire paper. Obviously, at some point, self-plagiarism becomes unacceptable (e.g. submitting the same paper or review to two journals).

I think this topic deserves more nuance than Deja Vu offers.

(Deja Vu has its own survey here.)


  1. Thanks for the link to eTBLAST. I was discussing this with someone the other day and I thought something like that must exist.

    It’s probably pretty good for finding papers related to your work, but I don’t think it will be that useful for plagiarism. I just copied the first two paragraphs of a paper into the search and the copied paper was only the number two hit.

    Comment by Andre — June 2, 2009 #

  2. Another side of this you didn’t mention is the responsibility of the people who use deja vu. I agree it should be clearer on their site how much is duplicated, whether it’s “sanctioned” or not, etc. However, it’s also the responsibility of the reader to consider that this was flagged by an algorithm as similar, but that it may or may not be acceptable, and that the reader should check that for him/herself.

    Comment by mary — June 2, 2009 #

  3. I totally agree. But I worry that not everyone is always careful about reading all the fine print on a website, especially when that website is confusing!

    Comment by sam — June 2, 2009 #

  4. I wonder whether any of these services is capable of spotting the grey areas of self-plagiarism. Two examples that I came upon while writing a review:
    – a well-known chemist publishing 8 papers on one research topic over the course of 6 years; of the eight, 4 are original research papers (excellent research, too), and 4 are reviews. FOUR reviews, completely redundant (in my opinion), serving the only purpose of inflating the publications list.
    – another good chemist publishing 4 papers in which he actually presents research worthy of, say, 2. How does he do it? By reporting the same exact results again (new graphics though) and adding some more meat (40%) on a weakly related topic. For example: “We synthesized A [citing paper 1 in which the synthesis of A was already reported, and showing a new graphic for its synthesis] and also studied the stereochemistry of its azo-derivative B.”

    I’m not even sure it’s absolutely wrong… But the smell isn’t good. Can Deja Vu notice such prolific publishers and warn them/us?

    Comment by David E. — June 5, 2009 #

  5. It depends on whether they use the same words and paragraphs over again. If the authors describe the same results with entirely different words, only a human can find that (so far)!

    Comment by sam — June 5, 2009 #

  6. What about the scientist who is asked to write something for journal A and agrees? Then higher-impact journal B asks for an invited review as well. Finally, the editor of journal C wants to cash in a favor by asking the scientist (who is already writing two reviews) to write one more for his less-well-read journal.

    Did I mention the scientist travels incessantly, edits his/her OWN journals, reviews manuscripts constantly, manages a large research lab, and teaches a course or two?

    Sure, there’s gotta be some quality (and integrity) control, but should we expect this scientist to re-invent the wheel every time someone comes asking for a review? Or deny students the chance to write one? Granted, students should NOT EVER recycle entire paragraphs written by another person.

    Comment by PI — June 8, 2009 #

  7. I think that some self-plagiarism is acceptable, and expected. Copying an entire review is not OK. And no one should copy the words of someone else.

    Comment by sam — June 8, 2009 #


