eLife’s new publishing policy

October 21, 2022 at 10:25 am | | literature, science community, scientific integrity

Let me preface this post with the admission that I’m likely wrong about my concerns about eLife’s new model. I was actually opposed to preprints when I first heard about them in 2006!

The journal eLife and Mike Eisen announced its new model for publishing papers:

  • Authors post preprint.
  • Authors submit preprint to eLife.
  • eLife editorial board decides whether to reject the manuscript or send out for review.
  • After reviews, the paper will be published no matter what the reviews say. The reviews and an eLife Assessment will be published alongside the paper. At this point, the paper has a DOI and is citable.
  • Authors then have a choice of whether to revise their manuscript or just publish as-is.
  • When the authors decide the paper is finalized, that will become the “Version of Record” and the paper will be indexed on PubMed.

Very interesting and bold move. The goal is to make eLife and its peer review not a gatekeeper of truth, but instead a system of evaluating and summarizing papers. Mike Eisen hopes that readers will see “eLife” and no longer think “that’s probably good science” and instead think “oh, I should read the reviews to see if that’s a good paper.”

Potential problems

But here are my primary concerns:

  1. This puts even more power in the hands of the editors to effectively accept/reject papers. And this process is currently opaque and bias-laden, and authors have no recourse when editors make bad decisions.
  2. The idea that the eLife label will no longer have prestige is naive. The journal has built a strong reputation as a great alternative to the glam journals (Science, Nature, Cell), and that’s not going away. For example, people brag when their papers are reviewed in F1000, and I think the same will apply to eLife Assessments: readers will automatically assume that a paper published in eLife is high-impact, regardless of what the Assessment says.
  3. The value that eLife is adding to the process is diminishing, and the price tag is steep ($2000).
  4. The primary problem I have with peer review is that it is simultaneously overly burdensome and not sufficiently rigorous. This model doesn’t substantially reduce the burden on authors to jump through hoops held by the reviewers (or risk a bad eLife Assessment). It also is less rigorous by lowering the bar to “publication.”

Solutions

Concern #1: I think it’s a step in the wrong direction to grant editors even more power. Over the years, editors haven’t exactly proven themselves to be ideal gatekeepers. How can we ensure that the editors will act fairly and don’t get attracted by shiny objects? That said, this policy might actually put more of a spotlight on the desk-rejection step and yield change. eLife could address this concern in various ways:

  • The selection process could be a lottery (granted, this isn’t ideal because finding editors and reviewers for a crappy preprint will be hard).
  • Editors could be required to apply a checklist or algorithmic selection process.
  • The editorial process could be made transparent by publishing desk rejections/acceptances along with the reasons.

Concern #2 might resolve itself with time. Dunno. Hard to predict how sentiment will change. But I do worry that eLife is trying to change the entire system, while failing to modify any of the perverse incentives that drive the problems in the first place. But maybe it’s better to try something than to do nothing.

Concern #3 is real, but I’m sure that Mike Eisen would love it if a bunch of other journals adopted this model as well and introduced competition. And honestly, collating and publishing all the reviews and writing a summary assessment of the paper is more than what most journals do now.

Journals should be better gatekeepers

But #4 is pretty serious. The peer review process has always had to balance being sufficiently rigorous to avoid publishing junk science with the need to disseminate new information on a reasonable timescale. Now that preprinting is widely accepted and distributing results immediately is super easy, I am less concerned with the latter. I believe that the new role of journals should be as more exacting gatekeepers. But it feels like eLife’s policy was crafted exclusively by editors and authors to give themselves more control, reduce the burden for authors, and shirk the responsibility of producing and vetting good science.

There are simply too many low-quality papers. The general public, naive to the vagaries of scientific publishing, often takes “peer-reviewed” papers as being true, which is partially why we have a booming supplement industry. Most published research findings are false. Most papers cannot be replicated. Far too many papers rely on pseudoreplication to get low p-values or fail to show multiple biological replicates. And when was the last time you read a paper where the authors blinded their data acquisition or analysis?

For these reasons, I think that the role of a journal in the age of preprints is to better weed out low-quality science. At minimum, editors and peer reviewers should ensure that authors followed the 3Rs (randomize, reduce bias, repeat) before publishing. And there should be a rigorous checklist to ensure that the basics of the scientific process were followed.

Personally, I think the greatest “value-add” that journals could offer would be to arrange a convincing replication of the findings before publishing (peer replication), then just do away with the annoying peer review dog-and-pony show altogether.

Conclusion

We’ll have to wait and see how this new model plays out, and how eLife corrects stumbling blocks along the way. I have hope that, with a good editorial team and good practices/rules around the selection process, eLife might be able to pull this off. Not sure if it’s a model that will scale to other, less trustworthy journals.

But just because this isn’t my personal favorite solution to the problem of scientific publishing, that doesn’t mean that eLife’s efforts won’t help make a better world. I changed my mind about the value of preprints, and I’ll be happy to change my mind about eLife’s new publishing model if it turns out to be a net good!

Checklist for Experimental Design

October 25, 2021 at 9:42 am | | everyday science, scientific integrity

One of the worst feelings as a scientist is to realize that you performed n-1 of the controls you needed to really answer the question you’re trying to solve.

My #1 top recommendation when starting a new set of experiments is to write down a step-by-step plan. Include what controls you plan to do and how many replicates. Exact details like concentrations are less important than the overarching experimental plan. Ask for feedback from your PI, but also from a collaborator or labmate.

Here are some things I consider when starting an experiment. Feel free to leave comments about things I’ve missed.

Randomization and Independence

  • Consider what “independent” means in the context of your experiment.
  • Calculate p-value using independent samples (biological replicates), not the total number of cells, unless you use advanced hierarchical analyses (see: previous blog post and Lord et al.: https://doi.org/10.1083/jcb.202001064).
  • Pairing or “blocking” samples (e.g. splitting one aliquot into treatment and control) helps reduce confounding parameters, such as passage number, location in incubator, cell confluence, etc.

Power and Statistics

  • Statistical power is the ability to distinguish small but real differences between conditions/populations.
  • Increasing the number of independent experimental rounds (AKA “biological replicates”) typically has a much larger influence on power than the number of cells or measurements per sample (see: Blainey et al.: https://doi.org/10.1038/nmeth.3091).
  • If the power of an assay is known, you can calculate the number of samples required to be confident you will be able to observe an effect (a minimal sketch follows this list).
  • Consider preregistering
    • Planning the experimental and analysis design before starting the experiment substantially reduces the chances of false positives.
    • Although formal preregistration is not typically required for cell biology studies, simply writing the plan down for yourself in your notebook is far better than winging it as you go.
    • Plan the number of biological replicates before running statistical analysis. If you instead check if a signal is “significant” between rounds of experiment and stop when p < 0.05, you’re all but guaranteed to find a false result.
    • Similarly, don’t try several tests until you stumble upon one that gives a “significant” p-value.
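
Here’s a minimal sketch of that kind of power calculation in Python, assuming a simple unpaired two-condition design and the statsmodels library; the effect size, alpha, and target power are placeholder numbers you’d swap for your own estimates.

```python
# Minimal sketch (assumed design: unpaired two-sample t-test on biological replicates).
from statsmodels.stats.power import TTestIndPower

effect_size = 1.2  # standardized effect size (Cohen's d), e.g. estimated from pilot data
alpha = 0.05       # acceptable false-positive rate
power = 0.8        # desired probability of detecting a real effect

n_per_group = TTestIndPower().solve_power(effect_size=effect_size, alpha=alpha, power=power)
print(f"biological replicates needed per condition: {n_per_group:.1f}")
```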

Blocking

  • “Blocking” means subdividing samples into similar units and running those sets together. For example, splitting a single flask of cells into two, one treatment and one control.
  • Blocking can help reveal effects even when experimental error or sample-to-sample variability is large.
  • Blocked analyses include paired t-test or normalizing the treatment signal to the control within each block.
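
As an illustration, here’s a minimal Python sketch of both blocked analyses mentioned above, using scipy and made-up numbers (one summary value per half-flask in each block).

```python
# Minimal sketch of a blocked analysis (made-up numbers).
# Each block is one flask split into a control half and a treated half.
import numpy as np
from scipy import stats

control = np.array([102.0, 131.0, 88.0, 115.0])  # one summary value per block
treated = np.array([118.0, 149.0, 101.0, 130.0])

# Option 1: paired t-test across blocks
t_paired, p_paired = stats.ttest_rel(treated, control)

# Option 2: normalize within each block, then test whether the mean ratio differs from 1
ratios = treated / control
t_ratio, p_ratio = stats.ttest_1samp(ratios, popmean=1.0)

print(f"paired t-test: p = {p_paired:.3f}")
print(f"within-block ratios: mean = {ratios.mean():.2f}, p = {p_ratio:.3f}")
```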

Controls

  • Sledgehammer controls
    • These controls are virtually guaranteed to give you zero or full signal, and are a nice simple test of the system.
    • Examples include wild type cells, treating with DMSO, imaging autofluorescence of cells not expressing GFP, etc.
  • Subtle controls
    • These controls are more subtle than the sledgehammer controls, and might reveal some unexpected failures.
    • Examples include: using secondary antibody only, checking for bleed-through and crosstalk between fluorescence channels, and using scrambled siRNA.
  • Positive & negative controls
    • The “assay window” is a measure of the range between the maximum expected signal (positive control) and the baseline (negative control).
    • A quantitative measure of the assay window could be a standardized effect size, like Cohen’s d, calculated with multiple positive and negative controls (see the sketch after this list).
    • In practice, few cell biologists perform multiple control runs before an experiment, so estimate the assay window qualitatively from the expected signal and the expected sample-to-sample variability. In other words, consider carefully whether an experiment can possibly work.
  • Concurrent vs historical controls
    • Running positive and/or negative controls in the same day’s experimental run as the samples that receive the real treatment helps eliminate additional variability.
  • Internal controls
    • “Internal” controls are cells within the same sample that randomly receive treatment or control. For example, during a transient transfection, only a portion of the cells may actually end up expressing, while those that don’t can act as a negative control.
    • Because cells within the same sample experience the same conditions (such as position in incubator, passage number, media age) except for the treatment of interest, internal controls can remove many spurious variables and make analysis more straightforward.
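
Here’s the quantitative version of the assay-window estimate mentioned above: a minimal Python sketch of Cohen’s d from a handful of positive and negative control runs (the numbers are made up).

```python
# Minimal sketch: estimate the assay window as Cohen's d from control runs (made-up numbers).
import numpy as np

positive = np.array([950.0, 1010.0, 880.0])  # e.g. full-signal control, one value per run
negative = np.array([120.0, 90.0, 150.0])    # e.g. background-only control

pooled_sd = np.sqrt((positive.var(ddof=1) + negative.var(ddof=1)) / 2)  # pooled SD (equal n)
cohens_d = (positive.mean() - negative.mean()) / pooled_sd
print(f"assay window (Cohen's d): {cohens_d:.1f}")
```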

Bias

  • Blinding acquisition
    • Often as simple as having a labmate put tape over labels on your samples and label them with a dummy index. Confirm that your coworker actually writes down the key, so later you can decode the dummy index back to the true sample information.
    • In cases where true blinding is impractical, the selection of cells to image/collect should be randomized (e.g. set random coordinates for the microscope stage; a minimal sketch follows this list) or otherwise designed to avoid bias (e.g. selecting cells using transmitted light or DAPI).
  • Blinding analysis
    • Ideally, image analysis would be done entirely by algorithms and computers, but often the most practical and effective approach is the old-fashioned human eye.
    • Ensuring your manual analysis isn’t biased is usually as simple as scrambling filenames. For microscopy data, Steve Royle’s macro works well: https://github.com/quantixed/imagej-macros#blind-analysis
    • I would highly recommend copying all the data to a new folder before you perform any filename changes. Then test the program forward and backwards to confirm everything works as expected. Maybe perform analysis in batches, so in case something goes awry, you don’t lose all that work.
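
And here’s the randomized-stage-coordinates idea from the acquisition section as a minimal Python sketch; the well dimensions and number of fields are placeholders.

```python
# Minimal sketch: choose imaging positions at random before looking at any fluorescence.
# Well dimensions (in micrometers) and number of fields are placeholders.
import numpy as np

rng = np.random.default_rng(seed=20211025)  # record the seed in your notebook
well_x, well_y = 9000.0, 9000.0             # usable well area
n_fields = 20

positions = rng.uniform(low=[0.0, 0.0], high=[well_x, well_y], size=(n_fields, 2))
for i, (x, y) in enumerate(positions, start=1):
    print(f"field {i:02d}: x = {x:7.1f} um, y = {y:7.1f} um")
```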

Resources

Great primer on experimental design and analysis, especially for the cell biologist or microscopist: Stephen Royle, “The Digital Cell: Cell Biology as a Data Science” https://cshlpress.com/default.tpl?action=full&–eqskudatarq=1282

Advanced, detailed (but easily digestible) book on experimental design and statistics: Stanley Lazic, “Experimental Design for Laboratory Biologists” https://stanlazic.github.io/EDLB.html

I like this very useful and easy-to-follow stats book: Whitlock & Schluter, “The Analysis of Biological Data” https://whitlockschluter.zoology.ubc.ca

Alex Reinhart, “Statistics Done Wrong”

Lord, S. J.; Velle, K. B.; Mullins, R. D.; Fritz-Laylin, L. K. SuperPlots: Communicating reproducibility and variability in cell biology. J. Cell Biol. 2020, 219(6), e202001064. https://doi.org/10.1083/jcb.202001064

Replace Peer Review with “Peer Replication”

October 13, 2021 at 1:35 pm | | literature, science and the public, science community, scientific integrity

As I’ve posted before and many others have noted, there is a serious problem with lack of adequate replication in many fields of science. The current peer review process is a dreadful combination of being both very fallible and also a huge hurdle to communicating important science.

Instead of waiting for a few experts in the field to read and apply their stamp of approval to a manuscript, the real test of a paper should be the ability to reproduce its findings in the real world. (As Andy York has pointed out, the best test of a new method is not a peer reviewing your paper, but a peer actually using your technique.) But almost no published papers are subsequently replicated by independent labs, because there is very little incentive for anyone to spend time and resources testing an already published finding. That is precisely the opposite of how science should ideally operate.

Let’s Replace Traditional Peer Review with “Peer Replication”

Instead of sending out a manuscript to anonymous referees to read and review, preprints should be sent to other labs to actually replicate the findings. Once the key findings are replicated, the manuscript would be accepted and published.

(Of course, as many of us do with preprints, authors can solicit comments from colleagues and revise a manuscript based on that feedback. The difference is that editors would neither seek such feedback nor require revisions.)

Along with the original data, the results of the attempted replication would be presented, for example as a table that includes which reagents/techniques were identical. The more parameters that are different between the original experiment and the replication, the more robust the ultimate finding if the referees get similar results.

A purely hypothetical example of the findings after referees attempt to replicate. Of course in reality, my results would always have green checkmarks.

Incentives

What incentive would any professor have to volunteer their time (or their trainees’ time) to try to reproduce someone else’s experiment? Simple: credit. Traditional peer review requires a lot of time and effort to do well, but with zero reward except a warm fuzzy feeling (if that). For papers published after peer replication, the names of researchers who undertook the replication work will be included in the published paper (on a separate line). Unlike peer review, the referees will actually receive compensation for their work in the form of citations and another paper to include on their CV.

Why would authors be willing to have their precious findings put through the wringer of real-world replication? First and foremost, because most scientists value finding truth, and would love to show that their findings hold up even after rigorous testing. Secondly, the process should actually be more rewarding than traditional peer review, which puts a huge burden on the authors to perform additional experiments and defend their work against armchair reviewers. Peer replication turns the process on its head: the referees would do the work of defending the manuscript’s findings.

Feasible Experiments

There are serious impediments to actually reproducing a lot of findings that use seriously advanced scientific techniques or require long times or a lot of resources (e.g. mouse work). It will be the job of editors—in collaboration with the authors and referees—to determine the set of experiments that will be undertaken, balancing rigor and feasibility. Of course, this might leave some of the most complex experiments unreplicated, but then it would be up to the readers to decide for themselves how to judge the paper as a whole.

What if all the experiments in the paper are too complicated to replicate? Then you can submit to JOOT.

Ancillary Benefits

Peer replication transforms the adversarial process of peer review into a cooperation among colleagues to seek the truth. Another set of eyes and brains on an experiment could introduce additional controls or alternative experimental approaches that would bolster the original finding.

This approach also encourages sharing experimental procedures among labs in a manner that can foster future collaborations, inspire novel approaches, and train students and postdocs in a wider range of techniques. Too often, valuable hands-on knowledge is sequestered in individual labs; peer replication would offer an avenue to disseminate those skills.

Peer replication would reduce fraud. Often, the other authors on an ultimately retracted paper only later discover that their coworker fabricated data. It would be nearly impossible for a researcher to pass off fabricated data or manipulated images as real if other researchers actually attempt to reproduce the experimental results. 

Potential Problems

One serious problem with peer replication is the additional time it may take between submission and ultimate publication. On the other hand, it often takes many months to go through the traditional peer review process, and replicating experiments may not actually add any time in many cases. Still, this could be mitigated by authors submitting segments of stories as they go. Instead of waiting until the entire manuscript is polished, authors or editors could start arranging replications while the manuscript is still in preparation. Ideally, there would even be a journal-blind mechanism (like Review Commons) to arrange reproducing these piecewise findings.

Another problem is what to do when the replications fail. There would still need to be a judgement call as to whether the failed replication is essential to the manuscript and/or if the attempt at replication was adequately undertaken. Going a second round at attempting a replication may be warranted, but editors would have to be wary of just repeating until something works and then stopping. Pre-registering the replication plan could help with that. Also, including details of the failed replications in the published paper would be a must.

Finally, there would still be the problem of authors “shopping” their manuscript. If the replications fail and the manuscript is rejected, the authors could simply submit to another journal. I think the rejected papers would need to be archived in some fashion to maintain transparency and accountability. This would also allow some mechanism for the peer replicators to get credit for their efforts.

Summary of Roles:

  • Editor:
    • Screen submissions and reject manuscripts with obviously flawed science, experiments not worth replicating, essential controls missing, or seriously boring results.
    • Find appropriate referees.
    • With authors and referees, collaboratively decide which experiments the referees should attempt to replicate and how.
    • Ultimately conclude, in consultation with referees, whether the findings in the papers are sufficiently reproducible to warrant full publication.
  • Authors:
    • Write the manuscript, seek feedback (e.g. via bioRxiv), and make revisions before submitting to the journal.
    • Assist referees with experimental design, reagents, and even access to personnel or specialized equipment if necessary.
  • Referees:
    • Faithfully attempt to reproduce the experimental results core to the manuscript.
    • Optional: Perform any necessary additional experiments or controls to close any substantial flaws in the work.
    • Collate results.
  • Readers:
    • Read the published paper and decide for themselves if the evidence supports the claims, with the confidence that the key experiments have been independently replicated by another lab.
    • Cite reproducible science.

How to Get Started

While it would be great if a journal like eLife simply piloted a peer replication pathway, I don’t think we can wait for Big Publication to initiate the shift away from traditional peer review. Maybe the quickest route would be for an organization like Review Commons to organize a trial of this new approach. They could identify some good candidates from bioRxiv and, with the authors, recruit referees to undertake the replications. Then the entire package could be shopped to journals.

I suspect that once scientists see peer replication in print, it will be hard to take seriously papers vetted only by peer review. Better science will outcompete unreproduced findings.

(Thanks Arthur Charles-Orszag for the fruitful discussions!)

avoiding bias by blinding

July 3, 2020 at 11:15 am | | literature, scientific integrity

Key components of the scientific process are: controls, avoiding bias, and replication. Most scientists are great at controls, but without the other two, we’re simply not doing science.

The lack of independent samples (and the resulting improper inflation of n) and the failure to blind experiments are too common. The implications of these mistakes, especially when combined in one study, mean that many published cell biology results are likely artifacts. Generating large datasets, even with a slight bias, can quickly yield “significant” results out of noise. For example, see the “NHST is unsuitable for large datasets” section of:

Szucs D, Ioannidis JPA. When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment. Front Hum Neurosci. 2017;11:390. https://pubmed.ncbi.nlm.nih.gov/28824397/

Now combine these common false positives with the inclination to publish flashy results, and we’ve made a recipe for unreliable scientific literature.

I do not condemn authors for these problems. Most of us have made one or all of these mistakes. I have. And I probably will again in the future. Science is hard. There is no shame in making honest mistakes. But we can all strive to be better (see my last section).

Failing to perform the data collection blinded

Blinding is just basic scientific rigor, and skipping this should be considered almost as bad as skipping controls.

Blinding samples during data collection and analysis is ideal. For data collection, it is usually as simple as having a labmate put tape over labels on your samples and label them with a dummy index. Insist that your coworker writes down the key, so later you can decode the dummy index back to the true sample information.

In cases where true blinding is impractical, the selection of cells to image/collect should be randomized (e.g. set random coordinates for the microscope stage) or otherwise designed to avoid bias (e.g. selecting cells using transmitted light if fluorescence is the readout).

Failing to perform the data analysis blinded

Blinding during data analysis is generally very practical, even when the original data was not collected in a bias-free fashion. Ideally, image analysis would be done entirely by algorithms and computers, but often the most practical and effective approach is the old-fashioned human eye. Ensuring your manual analysis isn’t biased is usually as simple as scrambling the image filenames.

I stumbled upon these ImageJ macros for randomizing and derandomizing image filenames, written by Martin Höhne: http://imagej.1557.x6.nabble.com/Macro-for-Blind-Analyses-td3687632.html

More recently, Christophe Leterrier directed me to Steve Royle‘s macro, which works very well: https://github.com/quantixed/imagej-macros#blind-analysis

There are probably some excellent solutions using Python. Regardless of the approach you take, I would highly recommend copying all the data to a new folder before you perform any filename changes. Then test the program forward and backwards to confirm everything works as expected. Maybe perform analysis in batches, so in case something goes awry, you don’t lose all that work.
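
If you do want a Python version, here’s a minimal sketch of the copy-then-rename approach described above; the folder names and key filename are placeholders, and a labmate (not you) should hold onto the key until the analysis is done.

```python
# Minimal sketch of blinding filenames in Python (folder names and key file are placeholders).
import csv
import random
import shutil
from pathlib import Path

src = Path("raw_images")      # original data, left untouched
dst = Path("blinded_images")  # copies get the dummy names
dst.mkdir(exist_ok=True)

files = sorted(src.glob("*.tif"))
random.shuffle(files)

with open("blinding_key.csv", "w", newline="") as key_file:  # the key to decode later
    writer = csv.writer(key_file)
    writer.writerow(["dummy_name", "original_name"])
    for i, path in enumerate(files, start=1):
        dummy = f"blind_{i:04d}.tif"
        shutil.copy2(path, dst / dummy)  # copy, don't rename the originals in place
        writer.writerow([dummy, path.name])
```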

My “blind and dumb” experiment

There are many stories about unintended bias leading to false conclusions. Here’s mine: I was testing to see whether a drug treatment inhibited cells from crawling through a porous barrier by counting the number of cells that made it through the barrier to an adjacent well.

My partner in crime had labeled the samples with dummy indices, so I didn’t know which wells were treated and which were control. But I immediately could tell that there were more cells in one set of wells, so I presumed those were the control set. Fortunately, I had taken the extra precaution of randomizing the stage positions, so I didn’t let my bias alter the data collection. We then blinded the analysis by relabeling the microscopy images. I manually counted all the cells in each image.

We then unblinded the samples. At first, we were disappointed that the wells I had assumed were control turned out to be treated. Then we looked at the results. SURPRISE! My snap judgement at the beginning of the experiment had been precisely backwards: the wells I thought looked like they had sparser cells actually had significantly more on average. So it turned out that the drug treatment had indeed worked. Thankfully, I didn’t rely on my snap judgement nor allow that bias to influence the results.

Treating each cell as an n in the statistical analysis

This error plagues the majority of cell biology papers. Go scan a recent issue of your favorite journal and count the number of papers that have minuscule P values; invariably, the authors aggregated all the cell measurements from multiple experiments and calculated the t-test or ANOVA based on those dozens or hundreds of measurements. This is a fatal error.

It is patently absurd to consider neighboring cells in the same dish all treated simultaneously with the same drug as independent tests of a hypothesis.

If your neighbor told you that he ate a banana peel and it reversed his balding, you might be a little skeptical. If he further explained that he measured 1000 of his hair follicles before and after eating a banana peel and measured a P < 0.05 difference in growth rate, would you be convinced? Maybe it was just a fluke or noise that his hairs started growing faster. You would want him to repeat the experiment a few times (maybe even with different people) before you started believing.

Similarly, there are many reasons two dishes of cells might be different. To start believing that a treatment is truly effective, we all understand that we should repeat the experiment a few times and get similar results. Counting each cell measurement as the sample size n all but guarantees a small—but meaningless—P value.

Observe how dramatically different scenarios (on the right) yield the same plot and P value when you assume each cell is a separate n (on the left):

Elegant solutions include “hierarchical” or “nested” or “mixed effect” statistics. A simple approach is to separately pool the cell-level data from each experiment, then compare experiment-level means (in the case above, the n for each condition would be 3, not 300). For more details, please read my previous blog post or our paper:

Lord SJ, Velle KB, Mullins RD, Fritz-Laylin LK. SuperPlots: Communicating reproducibility and variability in cell biology. J Cell Biol. 2020;219(6):e202001064.
https://pubmed.ncbi.nlm.nih.gov/32346721
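
Here’s a minimal Python sketch of that simple approach, with made-up numbers: collapse the cell-level measurements to one mean per condition per biological replicate, then run the test on those replicate-level means.

```python
# Minimal sketch (made-up numbers): one row per cell, labeled by condition and replicate.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "condition": ["control"] * 6 + ["drug"] * 6,
    "replicate": [1, 1, 2, 2, 3, 3] * 2,
    "value":     [10.1, 11.3, 9.2, 8.8, 12.0, 11.5,
                  13.2, 14.1, 12.5, 13.0, 15.2, 14.8],
})

# collapse to one mean per condition per biological replicate (n = 3, not the number of cells)
means = df.groupby(["condition", "replicate"])["value"].mean().unstack(level=0)

# paired test, since control and drug were run side by side in each replicate
t, p = stats.ttest_rel(means["drug"], means["control"])
print(means)
print(f"replicate-level paired t-test: p = {p:.3f}")
```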

How do we fix this?

Professors need to teach their trainees the very basics about how to design experiments (see Stan Lazic’s book: Experimental Design for Laboratory Biologists) and perform analysis (see Mike Whitlock and Dolph Schluter’s book: The Analysis of Biological Data). PIs need to provide researchers with the tools to blind their experiments or otherwise remove bias. They need to ask for multiple biological replicates and correctly calculated P values. This does not require advanced understanding of statistics, just the basic understanding of the importance of repeating an experiment multiple times to ensure an observation is real.

Editors and referees need to demand correct data analysis. While asking researchers to redo an experiment isn’t really acceptable, requiring a reanalysis of the data after blinding or recalculating P values based on biological replicates seems fair. Editors should not even send manuscripts to referees if the above errors are not corrected or at least addressed in some fashion. Editors can offer the simple solutions listed above.

UPDATE: Also read my proposal to replace peer review with peer “replication.”

unbelievably small P values?

November 18, 2019 at 9:56 am | | literature, scientific integrity

Check out our newest preprint at arXiv:

If your P value looks too good to be true, it probably is: Communicating reproducibility and variability in cell biology

Lord, S. J.; Velle, K. B.; Mullins, R. D.; Fritz-Laylin, L. K. arXiv 2019, 1911.03509. https://arxiv.org/abs/1911.03509

UPDATE: Now published in JCB: https://doi.org/10.1083/jcb.202001064

I’ve noticed a promising trend away from bar graphs in the cell biology literature. That’s great, because reporting simply the average and SD or SEM of an entire dataset conceals a lot of information. So it’s nice to see column scatter, beeswarm, violin, and other plots that show the distribution of the data.

But a concerning outcome of this trend is that, when authors decide to plot every measurement or every cell as a separate datapoint, it seems to trick people into thinking that each cell is an independent sample. Clearly, two cells in the same flask treated with a drug are not independent tests of whether the drug works: there are many reasons the cells in that particular flask might be different from those in other flasks. To really test a hypothesis that the drug influences the cells, one must repeat the drug treatment multiple times and check if the observed effect happens repeatably.

I scanned the latest issues of popular cell biology journals and found that over half the papers counted each cell as a separate N and calculated P values and SEM using that inflated count.

Notice that bar graphs—and even beeswarm plots—fail to capture the sample-to-sample variability in the data. This can have huge consequences: in C, the data is really random, but counting each cell as its own independent sample results in minuscule error bars and a laughably small P value.

But that’s not to say that the cell-to-cell variability is unimportant! The fact that some cells in a flask react dramatically to a treatment and others carry on just fine might have very important implications in an actual body.

So we proposed “SuperPlots,” which superimpose sample-to-sample summary data on top of the cell-level distribution. This is a simple way to convey both variability of the underlying data and the repeatability of the experiment. It doesn’t really require any complicated plotting or programming skills. On the simplest level, you can simply paste two (or more!) plots in Illustrator and overlay them. Play around with colors and transparency to make it visually appealing, and you’re done! (We also give a tutorial on how we made the plots above in Graphpad Prism.)
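
If you’d rather script it than use Illustrator, here’s a minimal matplotlib sketch of the same idea with simulated data: cell-level points are plotted with partial transparency, and the per-replicate means are overlaid in matching colors (this is just an illustration, not the code from our paper).

```python
# Minimal SuperPlot sketch with simulated data (not the code from the paper).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
rows = []
for rep in range(3):  # three biological replicates
    run_offset = rng.normal(0, 0.5)  # run-to-run variability
    for cond, drug_effect in [("control", 0.0), ("drug", 1.0)]:
        for v in rng.normal(10 + run_offset + drug_effect, 1.0, size=50):  # 50 cells
            rows.append({"replicate": rep, "condition": cond, "value": v})
df = pd.DataFrame(rows)

x_pos = {"control": 0, "drug": 1}
colors = ["gold", "gray", "steelblue"]  # one color per replicate
fig, ax = plt.subplots()

for rep, rep_df in df.groupby("replicate"):
    jitter = rng.uniform(-0.15, 0.15, size=len(rep_df))
    x = rep_df["condition"].map(x_pos) + jitter
    ax.scatter(x, rep_df["value"], s=10, alpha=0.4, color=colors[rep])  # cell-level points
    rep_means = rep_df.groupby("condition")["value"].mean()
    ax.scatter([x_pos[c] for c in rep_means.index], rep_means.values,
               s=120, color=colors[rep], edgecolor="black", zorder=3)   # replicate means

ax.set_xticks([0, 1])
ax.set_xticklabels(["control", "drug"])
ax.set_ylabel("measurement (a.u.)")
plt.show()
```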

Let me know what you think!

UPDATE: We simplified the figure:

Figure 1. Drastically different experimental outcomes can result in the same plots and statistics unless experiment-to-experiment variability is considered. (A) Problematic plots treat N as the number of cells, resulting in tiny error bars and P values. These plots also conceal any systematic run-to-run error, mixing it with cell-to-cell variability. To illustrate this, we simulated three different scenarios that all have identical underlying cell-level values but are clustered differently by experiment: (B) shows highly repeatable, unclustered data, (C) shows day-to-day variability, but a consistent trend in each experiment, and (D) is dominated by one random run. Note that the plots in (A) that treat each cell as its own N fail to distinguish the three scenarios, claiming a significant difference after drug treatment, even when the experiments are not actually repeatable. To correct that, “SuperPlots” superimpose summary statistics from biological replicates consisting of independent experiments on top of data from all cells, and P values were calculated using an N of three, not 300. In this case, the cell-level values were separately pooled for each biological replicate and the mean calculated for each pool; those three means were then used to calculate the average (horizontal bar), standard error of the mean (error bars), and P value. While the dot plots in the “OK” column ensure that the P values are calculated correctly, they still fail to convey the experiment-to-experiment differences. In the SuperPlots, each biological replicate is color-coded: the averages from one experimental run are yellow dots, another independent experiment is represented by gray triangles, and a third experiment is shown as blue squares. This helps convey whether the trend is observed within each experimental run, as well as for the dataset as a whole. The beeswarm SuperPlots in the rightmost column represent each cell with a dot that is color-coded according to the biological replicate it came from. The P values represent an unpaired two-tailed t-test (A) and a paired two-tailed t-test for (B-D). For tutorials on making SuperPlots in Prism, R, Python, and Excel, see the supporting information.

self-plagiarism and JACS

April 25, 2012 at 7:52 am | | literature, science community, scientific integrity

Hi all! I’m back! Well, not exactly: I won’t be posting nearly as much as I did a few years ago, but I do hope to start posting more than once a year. Sorry for my absence. There’s no real excuse except my laziness, a new postdoc position, commuting, and a new baby. I suppose those are good excuses, really. Also, I’m sorry to say that I’ve been cheating on you, posting on another blog. We love each other, and I won’t stop, but I want to keep you Everyday Scientist readers in my life, too. I’m just not going to pay as much attention to you as I used to. You’re cool with that, right?


Anyway, I thought I’d comment on the recent blogstorm regarding Ronald Breslow’s apparently self-plagiarized JACS paper. Read the full stories here (1, 2, 3, etc.).

I feel bad for Breslow, because I like him and I respect his work and I think his paper in JACS is valuable. However, I think he should retract his paper. Sorry, but if some no-name had been caught completely copying and pasting his or her previously published paper(s) and submitting that to JACS as an ostensibly novel manuscript, that paper would be retracted when found out. If he had just copied the intro paragraph, I’d be more forgiving, but the entire document is copied (except, that is, the name of the journal)!

That said, it might be possible to save the JACS paper, but the editors would have to label the article as an Editorial or Perspective or something, and explicitly state that the article is reprinted from previous sources. I know that might not be fair, to give Breslow special treatment, but life isn’t fair. Famous scientists might get away with more than peons. And, honestly, Breslow’s paper remaining in JACS might be good for future humanity, because the JACS archive will probably be more accessible than other sources. That way, we’ll be able to look up what to do when space dinosaurs visit us!

extraordinarily repeatable data

May 3, 2011 at 5:31 pm | | crazy figure contest, literature, scientific integrity

UPDATE: My friend on Facebook pointed out that Figure S5c in the supporting info is even more fishy (click on the image below to see a zoomed-in version). Clearly, some portions of the image were pasted on top of other parts. On the right, it is obvious that the top part of the image is from a different frame than the bottom part. On the left, it looks like there’s another image hidden behind (see the strip showing through on the left top part of the image). I’ve added red arrows to aid the eye.

This could possibly be mistakes by someone who doesn’t know how to use Photoshop layers, but I’m thinking there might have been some intentional manipulation of the data. Either way, this type of slicing and stitching and Photoshopping of scientific data is totally unacceptable. I think Nature editors and referees should be more than ashamed to have let this slide.

Nature editors announced that they are investigating.

(Original post below.)

This paper in Nature contains some serious errors: some of the images that are purportedly from different samples (different mice, even) appear to be identical! Note the triangle of spots in the two images below:

Many commenters have noticed the weirdnesses in the figures. This is my favorite comment so far:

2011-04-22 09:31 AM aston panda said:
This is an excellent article shows extraordinary .. skills and amazingly repeatable data. for example
Fig.1a, 2 middle vs 3 bottom left
Fig.1c, 2 right side vs 3 left side
Fig.S4, 1 left side vs 2 right side
Fig.S5, c4 middle right vs e4 middle left
GOOD JOB

I suspect that some sloppy organizing by the authors led to them mixing up some files on their computer. That’s my optimistic view. If they were trying to fabricate data, they wouldn’t use the same region of the same image of the same sample! It must have been sloppy bookkeeping. I hope their results stand up after they correct these errors.

It just goes to show that real science can’t get accepted into Nature and Science. ;)

UPDATE 2: RetractionWatch is surprised that this paper eventually was published with only a correction!

CV of failures

December 10, 2010 at 12:34 pm | | postdoc life, science community, scientific integrity

Nature Jobs has encouraged folks to spread the word of their failures, as well as their successes. So I made a CV of my failures. I’m probably forgetting a lot of failures, but I’m not really motivated to spend a lot of time trying to remember. Anyone else want to air their dirty laundry?

man bites dog

August 14, 2009 at 9:47 am | | grad life, news, science community, scientific integrity

Former Stanford graduate student Christopher Sclimenti is suing his former PI, Prof Michele Calos, for patent infringement and plagiarism. See stories here, here, and here. The complaint can be read here.

The summary is as follows:

  • Student was originally on a patent application.
  • At some point, Stanford and/or prof removed student’s name from application, which becomes this patent. Prof Calos is the only inventor listed on the patent.
  • Prof filed a second patent, which is a continuation of the first. Prof Calos is still the only inventor listed.
  • Prof filed two other applications (here and here), still the sole inventor listed, with significant portions copied from the student’s dissertation. (Stanford Daily found about 20 paragraphs in one application that were essentially identical to paragraphs in his dissertation!)

All parties agreed to an Alternative Dispute Resolution (ADR) with a neutral party. The ADR panel concluded that the student was a co-inventor and should have been included in the patent. As a result, Stanford agreed to add him to the issued patent (but I see no evidence that that has occurred yet).

According to Stanford’s OTL page, inventorship is different than what most scientists would consider authorship. For instance, “A person who contributed only labor and/or the supervision of routine techniques, but who did not contribute to the idea—the concept of one of the embodiments of the claimed invention—is not considered an inventor.” For the prof and the University to claim that the student was not an inventor, they implied that he was only a technician and did not contribute to conceiving any of the claims in the patent. That’s possible, but the ADR panel disagreed. It seems pretty straightforward that the student should have been on at least the first patent!

Why would Prof Calos and Stanford University fight so hard against their former student, who clearly contributed enough to the invention to appear on the original patent application? Is splitting the royalties from one patent with one extra person, a student who contributed to the work, so terribly painful? Stanford’s policy is to divide the royalties (after paying for the patenting and lawyer fees) 1/3 to the inventors, 1/3 to the department, and 1/3 to the school. So the prof loses half of her royalties by re-including her former student as an inventor, but the University loses nothing.

Is the recent patent application plagiarized from the student’s dissertation? Only if the dissertation was not first self-plagiarized from an earlier paper. Who knows. Regarding the plagiarism complaint, Stanford had this to say:

“I think we’ve really done our part at this point,” Dunkley said. “The inventorship has been corrected. He has been made whole for any amount that he would have received if he had been an inventor from the beginning. So from the University’s perspective, all necessary action has been taken to rectify any differences on the inventorship issue.” (source: Stanford Daily)


That’s not really very satisfying. What if the roles were reversed and a student copied significant portions of his PI’s earlier grant proposal into his dissertation without the PI’s permission? Or submitted a paper without the PI’s knowledge? That student would probably be kicked out of Stanford at a minimum. The least the University could do is investigate this case and determine whether Prof Calos has a history of taking credit for other people’s work. Maybe Prof Calos is innocent and the student is trying to steal credit, but it would be nice if the University would check into it.

All in all, the entire situation is not clear-cut. I suspect that the whole incident is the result of large egos, hurt feelings, and greed—from all parties! This is why it is very important to not burn bridges and to try to empathize with your PI or your student. I suspect this conflict could have been resolved early on if all parties had been more understanding and willing to listen and compromise.

Bottom line though, I find it unfortunate that the University would fight one of its own students.

P.S. Guess what Sclimenti is doing now?

you don’t understand how karma works

July 30, 2009 at 8:34 am | | news, scientific integrity

A former employee at SLAC intentionally destroyed thousands of protein crystals. Fortunately, only a couple hundred had not already been measured. You can read about it at C&E News or SF Chronicle.

Apparently, she claims that she wanted to reset her bad karma by harming her boss who had fired her (for not showing up to work for weeks). That’s not karma, that’s vengeance.

We don’t know the entire story; maybe her boss was a real jackass. Regardless, she achieved vengeance upon her boss by harming many other people, who had spent their research time crystallizing those proteins. That’s not very considerate.

They let this crazy woman in and Charles can’t even get past the gate at SLAC when he’s invited there!

UPDATE: She’s missing!

missing link?

June 16, 2009 at 7:20 am | | science and the public, science community, scientific integrity

I read this interesting editorial in Science about the media hyping of a recent paleontological find. Just look at Jørn Hurum, the team leader:


The discovery of the intact skeleton of a monkey-thing was reported in PLOS One, but not before the media hype started. There is a TV documentary of the find and the research on the fossil, and the group is touting the find as the next Rosetta Stone and the “missing link” between apes and humans. For instance, here are a few quotes from the team that reported the find:

“This specimen is like finding the lost ark for archaeologists. It is the scientific equivalent of the Holy Grail.” –Jørn Hurum

“[It is] like the eighth wonder of the world.” –Jens Franzen

What douchebags. This kind of bullshit is seriously likely to undermine the credibility of science in the public eye. Going around claiming that you’ve found the missing link—not to fellow scientists but to the public at large—is very dangerous: when it turns out that your monkey-thing is not a little human, the incident will only add gasoline to the anti-evolution fire. If it really is the missing link, let your fellow paleontologists make those statements.

I find this type of grandstanding by the authors scary, and very reminiscent of the Fleischmann and Pons cold fusion debacle. In fact, I recently watched a 60 Minutes episode about cold fusion, in which Fleischmann stated that his only regrets were naming the effect he saw “fusion” and holding a press conference. In other words, if he and Pons had not overhyped their results directly to the media, then maybe they wouldn’t have been run out of science when their science turned out to have fatal flaws.

Hurum claims that he’s only hyping in this fashion in order to help get children interested in science. But clearly, his base motivation is to make himself rich and famous. Yes, we should get children excited about real science, but not at the expense of scientific integrity.

Or maybe this little monkey-thing will end up being seen as a great scientific discovery for generations. But I doubt it.

deja boo?

June 2, 2009 at 9:53 am | | everyday science, literature, open thread, science community, scientific integrity, wild web

I’d like to know everyone’s opinion about Deja Vu, the database of “duplicate” scientific articles. Most of the articles in the database are “unverified,” meaning that they could be entirely legitimate (e.g. a reprint). Some are instances of self-plagiarism: an author recycling his or her own abstract or intro for a new paper or review. A few instances are true plagiarism: one group of authors stealing the words (entire paragraphs or papers) of other authors. You can read more in Science.

I can imagine several possible responses (see the poll below):

  1. Great! Now there’s a way for authors, journals, and institutions to better root out plagiarism and unauthorized copying.
  2. Granted, this is information in the public domain, so authors should expect their work to be scrutinized. However, it’s worrisome to have a computer algorithm put red flags on articles that may be legitimate. Deja Vu is probably a good idea, but needs to be reworked.
  3. Careers will be unfairly destroyed by this approach. Labeling a paper as a “duplicate” sounds negative, even when listed as “sanctioned” or “unverified.” This database takes a guilty-until-proven-innocent approach that has the potential to sully the reputation of good scientists.
  4. Um, haven’t these people seen Terminator 2? What if Deja Vu becomes self-aware and starts killing plagiarists?


Fortunately, an author can check his or her work in the eTBLAST database before submission, to see if a coauthor copied a section, or if the text will unfairly put up red flags. But I found that the results were confusing (e.g. I can’t find the meaning of the “score” or the “z-score”) and unhelpful (of course papers in the same field will have the same keywords). And the results page was really buggy (maybe just in Firefox?).

Personally, I vote #2: Deja Vu is a good idea, but needs to be more careful about the papers it lists as “duplicates,” even “unverified” or “sanctioned.” When a junior faculty member  gets a hit in the database, his or her name will be associated with plagiarism. Some people will not bother to check if it was a legitimate copy, or even who copied whom. I think that the current approach that Deja Vu takes is reckless and unfair. Even lazy.

Moreover, self-plagiarism is not necessarily bad. Copying your own abstract is different than copying your entire paper. Obviously, at some point, self-plagiarism is unacceptable (e.g. submitting the same paper or review to two journals).

I think this topic deserves more nuance than Deja Vu offers.

(Deja Vu has its own survey here.)

So that’s how you make a laser work

March 19, 2009 at 5:25 pm | | scientific integrity, wild web

Photoshop!


This image is from a Wired article on a 100 kW mobile laser defense system that was recently tested. I guess they’re planning on just airbrushing out the warheads faster than the Iranians can photoshop them in.

someone told the editors

November 11, 2008 at 9:57 am | | literature, news, scientific integrity

A while back, David posted two paragraphs from two different papers with different authors; they were nearly identical. Since then, someone must have alerted the editors, because they have retracted the paper:

Whoa. (Thanks for the heads-up, Roberto.)

i guess this is how you comment on a JACS comm

September 29, 2008 at 11:15 am | | crazy figure contest, EDSELs, literature, science community, scientific integrity

So I guess there’s not really an official way to get a comment published in JACS. So I’ll be a jerk and complain to everyone on the AEthernet. I saw a paper in JACS that really caught my eye, with a title interesting to someone like me who designs fluorophores:

Yamaguchi, Y.; Matsubara, Y.; Ochi, T.; Wakamiya, T.;  Yoshida, Z.-I. How the pi Conjugation Length Affects the Fluorescence Emission Efficiency. J. Am. Chem. Soc. 2008 (ASAP).

And, of course, this amazing fit to the data in the TOC image (scroll down to see what this plot should look like):

My first thought was, Whoa! Then I immediately thought, Wait, why do all the points fall exactly on the theory line? That’s unusual. Still, I read the paper with much interest. By the time I got to the end, I earnestly thought it might be an April Fools edition of JACS.

I followed the basic theory (Marcus-Hush theory) and the mathematical manipulations. Their result was fascinating: the ratio of the nonradiative to radiative deexcitation rates should fall off exponentially with the length of the π conjugation, as exp(−βAπ) times a leading factor containing c, ν, and h, where Aπ is the length of the conjugation, β is a constant (approximately 1 Å⁻¹), c is the speed of light, ν is the emission frequency, h is Planck’s constant, and kr and knr are the radiative and nonradiative deexcitation rates, respectively. This is interesting, because fluorescence quantum yield (Φf) is defined by those same rates: Φf = kr/(kr + knr). So inserting an equation that depends on conjugation length should be a simple and interesting result.

But, for some reason, the authors normalize out the leading factor. I didn’t really understand why. Anyway, the final result is a little different than I would have figured: Φf = 1/(1 + exp(−βAπ)). Now, somehow Aπ can be negative, and the authors justify that with the fact that it had become a logarithm in their mathematical gymnastics. I won’t really argue that that’s wrong, because I don’t understand why they did it in the first place.

And here comes the central problem with the paper. In order to confirm this theoretical relationship between quantum yield and pi length, they plot the theoretical equation along with data they have measured (plot above). But they never measure Aπ; they calculate it from the measured rates listed in the table, and those same rates were calculated from the measured quantum yield. This is circular logic. So there’s no “correlation between absolute fluorescence quantum yield (Φf) and magnitude (Aπ) of π conjugation length,” as they claim. Instead, they simply plot the ratio of rates versus a different ratio of those same rates. The real axes of the plot are kr/(kr + knr) on the ordinate and ln(kr/knr)/β on the abscissa. That’s totally unfair and misleading!
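
To see why the fit is guaranteed, here’s a minimal Python sketch of the circularity, assuming the relationships as I’ve written them above (Φf = kr/(kr + knr) and Aπ = ln(kr/knr)/β): any randomly chosen pair of rates lands exactly on the “theoretical” curve.

```python
# Minimal sketch of the circularity: Aπ is *calculated* from the same rates that define Φf,
# so every point lands exactly on the "theory" curve by construction.
# (Assumes the relations written above; rates are made-up numbers, β in 1/Å.)
import numpy as np

rng = np.random.default_rng(1)
beta = 1.0
kr = rng.uniform(0.01, 10.0, size=20)   # pretend "measured" radiative rates
knr = rng.uniform(0.01, 10.0, size=20)  # pretend "measured" nonradiative rates

phi_measured = kr / (kr + knr)                    # quantum yield from the rates
A_pi = np.log(kr / knr) / beta                    # "conjugation length" from the same rates
phi_theory = 1.0 / (1.0 + np.exp(-beta * A_pi))   # the plotted "theoretical" curve

print(np.allclose(phi_measured, phi_theory))      # True: a perfect fit, guaranteed
```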

They claim that other independent measures of pi length also work, and that is shown in (of course) the Supporting Information. There, they do give some analysis using Δν^(1/2)·a^(3/2) as a value for Aπ, where Δν is the Stokes shift in a given solvent, and a is the Onsager radius of the molecule in a continuous dielectric medium (taking the relevant factors of the Lippert-Mataga equation). The authors chose not to plot this analysis—they offer only a table—so I’ll plot the real results for you:

That’s sad. Note also that the calculated quantum yields cannot be less than 0.5, because size is always positive, and even a value of zero for this Lippert-Mataga estimate of Aπ means that the exponential goes to 1 and the denominator of the new theoretical quantum-yield equation goes to 2.

How does Aπ scale with the Onsager radius or the Lippert-Mataga measure of size?

Well, there is a trend. Not a great trend, but a trend nonetheless. This paper would have been a lot better if they had explored these relationships more, finding a better measure or estimator of size or Aπ. Instead, the authors decided to deceive us with their beautiful plot.

Assumptions in this paper:

  1. That all the nonradiative pathways come from intramolecular charge transfer.
  2. That the emission wavelength does not change with increasing pi conjugation.
  3. For the independent test, that the charge transfer in all cases is unity, so that the change in dipole moment from ground to excited state equals the distance over which the charge transfer occurs.

Assumption 1 is fair, but not entirely applicable in the real world. Assumption 2 is patently false, which they even demonstrate in one of their figures; however, that may not be this paper’s fatal flaw. Assumption 3 is, well, fine … whatever. The real problem is that the authors do not independently test the theoretical prediction, and use circular logic to make a dazzling plot (dazzling to the reviewers, at least).

The biggest disappointment is that the approach and the concept is really interesting, but the authors fail to follow through. I think this could have been a great paper (or at least an acceptable one) if they had been able to demonstrate that the deexcitation rates (and thus the quantum yield) did depend on the size of the pi conjugation. For instance, if the authors had been able to accurately predict pi-conjugation length using the experimental deexcitation rates, then they could have flipped that and predicted quantum yield from the size. Instead, there’s just a stupid plot that doesn’t make any sense.

So this paper wins an EDSEL Award for the worst paper I’ve read in JACS. I have no idea how that even got past the editors, to say nothing of the reviewers! That said, I am willing to admit that I could be totally wrong. If so, I apologize to everyone. Please let me know if I made any mistakes.
