Everyday Scientist

Checklist for Experimental Design

October 25, 2021 at 9:42 am | sam | everyday science, scientific integrity

One of the worst feelings as a scientist is realizing that you performed n-1 of the controls you needed to actually answer the question you’re asking.

My #1 recommendation when starting a new set of experiments is to write down a step-by-step plan. Include which controls you plan to do and how many replicates. Exact details like concentrations are less important than the overarching experimental plan. Ask for feedback from your PI, but also from a collaborator or labmate.

Here are some things I consider when starting an experiment. Feel free to leave comments about things I’ve missed.

Randomization and Independence

Consider what “independent” means in the context of your experiment.
Calculate p-values using independent samples (biological replicates), not the total number of cells, unless you use advanced hierarchical analyses (see: previous blog post and Lord et al.: https://doi.org/10.1083/jcb.202001064).
Pairing or “blocking” samples (e.g. splitting one aliquot into treatment and control) helps reduce the influence of confounding variables, such as passage number, location in the incubator, cell confluence, etc.
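To make the first point concrete, here is a minimal Python sketch (standard library only) that collapses per-cell measurements into one mean per biological replicate before computing a two-sample t statistic. The data, the function names, and the choice of a pooled-variance Student’s t-test are illustrative assumptions, not a prescribed pipeline.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def replicate_means(measurements):
    """Collapse (condition, replicate, value) per-cell rows into one mean per replicate."""
    groups = defaultdict(list)
    for condition, replicate, value in measurements:
        groups[(condition, replicate)].append(value)
    means = defaultdict(list)
    for (condition, _replicate), values in sorted(groups.items()):
        means[condition].append(mean(values))
    return dict(means)

def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical data: 3 biological replicates per condition, 3 cells measured in each.
data = [
    ("control", 1, 1.0), ("control", 1, 1.2), ("control", 1, 0.8),
    ("control", 2, 1.1), ("control", 2, 1.3), ("control", 2, 0.9),
    ("control", 3, 0.9), ("control", 3, 1.0), ("control", 3, 0.8),
    ("treated", 1, 1.5), ("treated", 1, 1.7), ("treated", 1, 1.3),
    ("treated", 2, 1.4), ("treated", 2, 1.6), ("treated", 2, 1.5),
    ("treated", 3, 1.2), ("treated", 3, 1.4), ("treated", 3, 1.3),
]

means = replicate_means(data)
t, df = students_t(means["treated"], means["control"])
print(f"t = {t:.2f} with {df} degrees of freedom")
```

With three replicates per condition the test has 4 degrees of freedom, not the 16 you would get by (incorrectly) treating all 18 cells as independent. If SciPy is available, `scipy.stats.ttest_ind` on the replicate means gives the corresponding p-value.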

Power and Statistics

Statistical power is the ability (formally, the probability) to detect small but real differences between conditions/populations.
Increasing the number of independent experimental rounds (AKA “biological replicates”) typically has a much larger influence on power than increasing the number of cells or measurements per sample (see: Blainey et al.: https://doi.org/10.1038/nmeth.3091).
If you know (or can estimate) the expected effect size and the sample-to-sample variability of an assay, you can calculate the number of samples required to be confident you will be able to observe an effect.
Consider preregistering
- Planning the experimental and analysis design before starting the experiment substantially reduces the chances of false positives.
- Although formal preregistration is not typically required for cell biology studies, simply writing the plan down for yourself in your notebook is far better than winging it as you go.
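- Plan the number of biological replicates before running any statistical analysis. If you instead check whether a signal is “significant” after each round of experiments and stop as soon as p < 0.05, you greatly inflate the false-positive rate; with enough rounds, you are all but guaranteed to find a “significant” result even when there is no real effect.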
- Similarly, don’t try several tests until you stumble upon one that gives a “significant” p-value.
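The sample-size calculation mentioned in this section can be sketched with the normal approximation for a two-sided, two-sample comparison with equal group sizes: n per group ≈ 2 × ((z(1 − α/2) + z(power)) / d)², where d is the expected difference divided by the replicate-to-replicate standard deviation. This is a rough planning sketch; the function name and the defaults (α = 0.05, 80% power) are my assumptions.

```python
from math import ceil
from statistics import NormalDist

def replicates_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Approximate biological replicates needed per group (normal approximation).

    effect_size_d: expected difference in means divided by the SD between replicates.
    Assumes a two-sided test and equal numbers of replicates in both groups.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

for d in (2.0, 1.0, 0.5):
    print(f"d = {d}: {replicates_per_group(d)} replicates per group")
```

For d = 1 this gives 16 replicates per group; an exact t-test calculation (for example, statsmodels’ `TTestIndPower().solve_power`) gives a slightly larger number, because the normal approximation ignores the extra uncertainty of estimating the SD from few samples. Note how quickly the requirement grows for smaller effects: halving d roughly quadruples n.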

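The warning about checking significance after every round can be tested with a quick simulation. This sketch (Python, standard library only) simulates experiments in which the treatment truly does nothing, then compares a pre-planned single test after 20 replicates with “peeking” after every replicate from the third onward and stopping at the first p < 0.05. To stay within the standard library it uses a z-test with a known standard deviation rather than a t-test; that simplification, the replicate counts, and the function names are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def p_value(treated, control, sigma=1.0):
    """Two-sided z-test p-value for a difference in means, assuming a known SD (sigma)."""
    n = len(treated)
    diff = sum(treated) / n - sum(control) / n
    z = diff / (sigma * sqrt(2 / n))
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

def false_positive_rate(peek, n_sims=2000, min_n=3, max_n=20, alpha=0.05, seed=1):
    """Fraction of simulated null experiments (no real effect) declared 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        treated, control = [], []
        for n in range(1, max_n + 1):
            # Both groups come from the same distribution, so any "effect" is pure noise.
            treated.append(rng.gauss(0.0, 1.0))
            control.append(rng.gauss(0.0, 1.0))
            if peek and n >= min_n and p_value(treated, control) < alpha:
                hits += 1  # stop early and declare a "significant" result
                break
        else:
            # Reached max_n without stopping early; the pre-planned analysis tests only here.
            if not peek and p_value(treated, control) < alpha:
                hits += 1
    return hits / n_sims

planned = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"pre-planned: {planned:.3f}, peeking: {peeking:.3f}")
```

The pre-planned analysis stays near the nominal 5% false-positive rate, while peeking pushes it well above 5%; allowing more rounds of peeking inflates it further.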
Blocking

“Blocking” means subdividing samples into similar units and running each set together. For example, split a single flask of cells into two: one for treatment and one for control.
Blocking can help reveal effects even when experimental error or sample-to-sample variability is large.
Blocked analyses include a paired t-test or normalizing the treatment signal to the control within each block.
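To illustrate why blocking helps, here is a minimal Python sketch with made-up numbers: four blocks (say, four flasks or days) whose baseline varies a lot, with the treatment adding roughly 0.5 in each. The paired analysis works on within-block differences, so the block-to-block variation cancels out; the unpaired analysis lumps it into the noise. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(treated, control):
    """Paired t statistic on within-block differences (df = number of blocks - 1)."""
    diffs = [t - c for t, c in zip(treated, control)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(treated, control):
    """Student's t statistic that ignores blocking (pooled variance, equal group sizes)."""
    n = len(treated)
    pooled_var = (stdev(treated) ** 2 + stdev(control) ** 2) / 2
    return (mean(treated) - mean(control)) / sqrt(pooled_var * 2 / n)

# Hypothetical signal from four blocks; index i of each list comes from the same block.
control = [1.0, 2.0, 3.0, 4.0]
treated = [1.5, 2.6, 3.4, 4.5]

paired = paired_t(treated, control)
unpaired = unpaired_t(treated, control)
print(f"paired t = {paired:.2f}, unpaired t = {unpaired:.2f}")
```

Here the paired t statistic is about 12 (with 3 degrees of freedom), while ignoring the blocking gives a t statistic of about 0.55 for the same numbers. Normalizing each treated value to its own block’s control (e.g. as a ratio) relies on the same within-block comparison.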

Controls

Sledgehammer controls
- These controls are virtually guaranteed to give you zero or full signal, and are a nice simple test of the system.
- Examples include wild-type cells, treating with DMSO, imaging autofluorescence of cells not expressing GFP, etc.
Subtle controls
- These controls are more subtle than the sledgehammer controls, and might reveal some unexpected failures.
- Examples include: using secondary antibody only, checking for bleed-through and crosstalk between fluorescence channels, and using scrambled siRNA.
Positive & negative controls
- The “assay window” is a measure of the range between the maximum expected signal (positive control) and the baseline (negative control).
- A quantitative measure of the assay window could be a standardized effect size, like Cohen’s d, calculated with multiple positive and negative controls.
- In practice, few cell biologists perform multiple control runs before an experiment, so at minimum make a qualitative estimate of the assay window from the expected signal and the expected sample-to-sample variability. In other words, consider carefully whether the experiment can possibly work.
Concurrent vs historical controls
- Running positive and/or negative controls in the same day’s experimental run as the treated samples helps eliminate the day-to-day variability that historical controls (from earlier runs) would add.
Internal controls
- “Internal” controls are cells within the same sample that randomly receive either treatment or control. For example, during a transient transfection, only a portion of the cells may end up expressing the construct; the non-expressing cells can act as a negative control.
- Because cells within the same sample experience the same conditions (such as position in the incubator, passage number, and media age) except for the treatment of interest, internal controls can remove many spurious variables and make analysis more straightforward.
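As a sketch of the quantitative assay window described above, Cohen’s d can be computed from replicate measurements of positive and negative controls: the difference in means divided by the pooled standard deviation. The control values below are invented for illustration, and the function name is my own.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(positive, negative):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(positive), len(negative)
    pooled_var = ((n1 - 1) * stdev(positive) ** 2 + (n2 - 1) * stdev(negative) ** 2) / (n1 + n2 - 2)
    return (mean(positive) - mean(negative)) / sqrt(pooled_var)

# Hypothetical readouts from four positive-control and four negative-control runs.
positive_controls = [10.0, 12.0, 11.0, 13.0]
negative_controls = [2.0, 3.0, 1.0, 2.0]

d = cohens_d(positive_controls, negative_controls)
print(f"assay window (Cohen's d) = {d:.2f}")
```

A d of roughly 9, as here, means the controls are separated by about nine pooled standard deviations, a comfortable window; a d near 1 or below would suggest the assay struggles to distinguish even its own controls.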

Bias

Blinding acquisition
- Often as simple as having a labmate put tape over labels on your samples and label them with a dummy index. Confirm that your coworker actually writes down the key, so later you can decode the dummy index back to the true sample information.
- In cases where true blinding is impractical, the selection of cells to image/collect should be randomized (e.g. set random coordinates for the microscope stage) or otherwise designed to avoid bias (e.g. selecting cells using transmitted light or DAPI).
Blinding analysis
- Ideally, image analysis would be done entirely by algorithms and computers, but often the most practical and effective approach is the old-fashioned human eye.
- Ensuring your manual analysis isn’t biased is usually as simple as scrambling filenames. For microscopy data, Steve Royle’s ImageJ macro works well: https://github.com/quantixed/imagej-macros#blind-analysis
- I highly recommend copying all the data to a new folder before you change any filenames. Then test the program in both directions (blinding and unblinding) to confirm everything works as expected. Consider performing the analysis in batches so that if something goes awry, you don’t lose all that work.
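For the filename-scrambling approach, the safeguards above (work on a copy, keep a key, check that decoding works) can be sketched in a short Python script. This is not Steve Royle’s macro; it is a hypothetical standard-library alternative, and the `*.tif` pattern, the `blind_XXXX` naming scheme, and the CSV key format are all assumptions.

```python
import csv
import random
import shutil
from pathlib import Path

def blind_copy(src_dir, dst_dir, key_file, pattern="*.tif", seed=None):
    """Copy files matching pattern into dst_dir under dummy names and write a CSV key.

    The originals in src_dir are never renamed or modified.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing folder
    files = sorted(src.glob(pattern))
    codes = list(range(1, len(files) + 1))
    random.Random(seed).shuffle(codes)  # shuffled so the dummy index reveals nothing
    with open(key_file, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["blind_name", "original_name"])
        for path, code in zip(files, codes):
            blind_name = f"blind_{code:04d}{path.suffix}"
            shutil.copy2(path, dst / blind_name)  # copy, keeping file metadata
            writer.writerow([blind_name, path.name])

def unblind(key_file):
    """Read the key back into a {blind_name: original_name} dictionary."""
    with open(key_file, newline="") as fh:
        return {row["blind_name"]: row["original_name"] for row in csv.DictReader(fh)}
```

Keep the key file somewhere you won’t see it during analysis (or have a labmate keep it), and before analyzing, spot-check that `unblind` maps a few blinded files back to the right originals.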

Resources

Great primer on experimental design and analysis, especially for the cell biologist or microscopist: Stephen Royle, “The Digital Cell: Cell Biology as a Data Science” https://cshlpress.com/default.tpl?action=full&--eqskudatarq=1282

Advanced, detailed (but easily digestible) book on experimental design and statistics: Stanley Lazic, “Experimental Design for Laboratory Biologists” https://stanlazic.github.io/EDLB.html

I like this very useful and easy-to-follow stats book: Whitlock & Schluter, “The Analysis of Biological Data” https://whitlockschluter.zoology.ubc.ca

Alex Reinhart, “Statistics Done Wrong”

Tomek & Eisner, “Basic Statistics for Life Scientists”

SuperPlots: Communicating reproducibility and variability in cell biology.
Lord, S. J.; Velle, K. B.; Mullins, R. D.; Fritz-Laylin, L. K. J. Cell Biol. 2020, 219(6), e202001064.


Replace Peer Review with “Peer Replication”

October 13, 2021 at 1:35 pm | sam | literature, science and the public, science community, scientific integrity

UPDATE: See the EMBO Reports article and the write-up in Nature.

As I’ve posted before and many others have noted, there is a serious lack of adequate replication in many fields of science. The current peer review process is a dreadful combination: it is very fallible and also a huge hurdle to communicating important science.

Instead of waiting for a few experts in the field to read and apply their stamp of approval to a manuscript, the real test of a paper should be the ability to reproduce its findings in the real world. (As Andy York has pointed out, the best test of a new method is not a peer reviewing your paper, but a peer actually using your technique.) But almost no published papers are subsequently replicated by independent labs, because there is very little incentive for anyone to spend time and resources testing an already published finding. That is precisely the opposite of how science should ideally operate.

Let’s Replace Traditional Peer Review with “Peer Replication”

Instead of sending out a manuscript to anonymous referees to read and review, preprints should be sent to other labs to actually replicate the findings. Once the key findings are replicated, the manuscript would be accepted and published.

(Of course, as many of us do with preprints, authors can solicit comments from colleagues and revise a manuscript based on that feedback. The difference is that editors would neither seek such feedback nor require revisions.)

Along with the original data, the results of the attempted replication would be presented, for example as a table that indicates which reagents/techniques were identical. The more parameters that differ between the original experiment and the replication, the more robust the finding if the referees still get similar results.

*A purely hypothetical example of the findings after referees attempt to replicate. Of course, in reality, my results would always have green checkmarks.*

Incentives

What incentive would any professor have to volunteer their time (or their trainees’ time) to try to reproduce someone else’s experiment? Simple: credit. Traditional peer review requires a lot of time and effort to do well, but offers zero reward beyond a warm fuzzy feeling (if that). For papers published after peer replication, the names of researchers who undertook the replication work would be included in the published paper (on a separate line). Unlike in traditional peer review, the referees would actually be compensated for their work, in the form of citations and another paper to include on their CV.

Why would authors be willing to have their precious findings put through the wringer of real-world replication? First and foremost, because most scientists value finding truth, and would love to show that their findings hold up even after rigorous testing. Secondly, the process should actually be more rewarding than traditional peer review, which puts a huge burden on the authors to perform additional experiments and defend their work against armchair reviewers. Peer replication turns the process on its head: the referees would do the work of defending the manuscript’s findings.

Feasible Experiments

Many findings are hard to reproduce because they rely on highly advanced techniques or require long timelines or substantial resources (e.g. mouse work). It will be the job of editors, in collaboration with the authors and referees, to determine which experiments will be undertaken, balancing rigor and feasibility. Of course, this might leave some of the most complex experiments unreplicated, but then it would be up to readers to decide for themselves how to judge the paper as a whole.

What if all the experiments in the paper are too complicated to replicate? Then you can submit to JOOT.

Ancillary Benefits

Peer replication transforms the adversarial process of peer review into a cooperation among colleagues to seek the truth. Another set of eyes and brains on an experiment could introduce additional controls or alternative experimental approaches that would bolster the original finding.

This approach also encourages sharing experimental procedures among labs in a manner that can foster future collaborations, inspire novel approaches, and train students and postdocs in a wider range of techniques. Too often, valuable hands-on knowledge is sequestered in individual labs; peer replication would offer an avenue to disseminate those skills.

Peer replication would reduce fraud. Often, the other authors on an ultimately retracted paper only later discover that their coworker fabricated data. It would be nearly impossible for a researcher to pass off fabricated data or manipulated images as real if other researchers actually attempt to reproduce the experimental results.

Potential Problems

One serious problem with peer replication is the additional time it may take between submission and publication. On the other hand, the traditional peer review process often takes many months, and in many cases replicating experiments may not actually add any time. Still, the delay could be mitigated by authors submitting segments of stories as they go: instead of waiting until the entire manuscript is polished, authors or editors could start arranging replications while the manuscript is still in preparation. Ideally, there would even be a journal-blind mechanism (like Review Commons) to arrange reproducing these piecewise findings.

Another problem is what to do when replications fail. There would still need to be a judgement call as to whether the failed replication is essential to the manuscript and/or whether the replication attempt was adequately undertaken. A second round of replication may be warranted, but editors would have to be wary of repeating until something works and then stopping. Preregistering the replication plan could help with that. Also, including details of the failed replications in the published paper would be a must.

Finally, there would still be the problem of authors “shopping” their manuscript. If the replications fail and the manuscript is rejected, the authors could simply submit to another journal. I think the rejected papers would need to be archived in some fashion to maintain transparency and accountability. This would also allow some mechanism for the peer replicators to get credit for their efforts.

Summary of Roles:

Editor:
- Screen submissions and reject manuscripts with obviously flawed science, experiments not worth replicating, essential controls missing, or seriously boring results.
- Find appropriate referees.
- With authors and referees, collaboratively decide which experiments the referees should attempt to replicate and how.
- Ultimately conclude, in consultation with referees, whether the findings in the paper are sufficiently reproducible to warrant full publication.
Authors:
- Write the manuscript, seek feedback (e.g. via bioRxiv), and make revisions before submitting to the journal.
- Assist referees with experimental design, reagents, and even access to personnel or specialized equipment if necessary.
Referees:
- Faithfully attempt to reproduce the experimental results core to the manuscript.
- Optional: Perform any necessary additional experiments or controls to close any substantial flaws in the work.
- Collate results.
Readers:
- Read the published paper and decide for themselves if the evidence supports the claims, with the confidence that the key experiments have been independently replicated by another lab.
- Cite reproducible science.

How to Get Started

While it would be great if a journal like eLife simply piloted a peer replication pathway, I don’t think we can wait for Big Publication to initiate the shift away from traditional peer review. Maybe the quickest route would be for an organization like Review Commons to organize a trial of this new approach. They could identify some good candidates from bioRxiv and, with the authors, recruit referees to undertake the replications. Then the entire package could be shopped to journals.

I suspect that once scientists see peer replication in print, it will be hard to take seriously papers vetted only by peer review. Better science will outcompete unreproduced findings.

(Thanks Arthur Charles-Orszag for the fruitful discussions!)
