avoiding bias by blinding

July 3, 2020 at 11:15 am | | literature, scientific integrity

Key components of the scientific process are: controls, avoiding bias, and replication. Most scientists are great at controls, but without the other two, we’re simply not doing science.

The lack of independent samples (and thus improper inflation of n) and the failure to blind experiments are too common. The implications of these mistakes, especially when combined in one study, mean that many published cell biology results are likely artifacts. Generating large datasets, even with a slight bias, can quickly yield “significant” results out of noise. For example, see the “NHST is unsuitable for large datasets” section of:

Szucs D, Ioannidis JPA. When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment. Front Hum Neurosci. 2017;11:390. https://pubmed.ncbi.nlm.nih.gov/28824397/
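
To see how easily this happens, here is a minimal simulation of my own (simulated numbers, not data from the paper): with enough measurements, a bias of only a few percent of the measurement scatter still produces a “significant” P value.

```python
# A quick simulation: a tiny systematic bias plus a huge n yields a tiny P value,
# even though the effect is negligible. Numbers are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

control = rng.normal(loc=0.00, scale=1.0, size=100_000)  # no real effect
treated = rng.normal(loc=0.03, scale=1.0, size=100_000)  # bias of 3% of the scatter

t, p = stats.ttest_ind(treated, control)
print(f"t = {t:.1f}, P = {p:.1g}")  # P lands far below 0.05 despite a trivial effect size
```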

Now combine these common false positives with the inclination to publish flashy results, and we’ve made a recipe for unreliable scientific literature.

I do not condemn authors for these problems. Most of us have made one or all of these mistakes. I have. And I probably will again in the future. Science is hard. There is no shame in making honest mistakes. But we can all strive to be better (see my last section).

Failing to perform the data collection blinded

Blinding is just basic scientific rigor, and skipping this should be considered almost as bad as skipping controls.

Blinding samples during data collection and analysis is ideal. For data collection, it is usually as simple as having a labmate put tape over the labels on your samples and relabel them with a dummy index. Insist that your coworker write down the key, so you can later decode the dummy indices back to the true sample information.

In cases where true blinding is impractical, the selection of cells to image/collect should be randomized (e.g. set random coordinates for the microscope stage) or otherwise designed to avoid bias (e.g. selecting cells using transmitted light if fluorescence is the readout).
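
For example, here is a minimal sketch of randomized field selection (the well geometry, margin, and function name are my own assumptions; a real microscope would need its own stage-control software): pick the stage positions before looking at any signal, so the choice of fields can't be biased by the readout.

```python
# Generate random, signal-blind stage positions uniformly distributed over a round well.
import numpy as np

rng = np.random.default_rng()

def random_positions(n_fields, well_radius_um=3000, margin_um=200):
    """Return (x, y) stage offsets in µm, uniformly distributed over the well."""
    r = (well_radius_um - margin_um) * np.sqrt(rng.uniform(size=n_fields))
    theta = rng.uniform(0, 2 * np.pi, size=n_fields)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

print(random_positions(5))  # e.g., drive the stage to these five fields
```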

Failing to perform the data analysis blinded

Blinding during data analysis is generally very practical, even when the original data was not collected in a bias-free fashion. Ideally, image analysis would be done entirely by algorithms and computers, but often the most practical and effective approach is the old-fashioned human eye. Ensuring your manual analysis isn’t biased is usually as simple as scrambling the image filenames.

I stumbled upon these ImageJ macros for randomizing and derandomizing image filenames, written by Martin Höhne: http://imagej.1557.x6.nabble.com/Macro-for-Blind-Analyses-td3687632.html

More recently, Christophe Leterrier directed me to Steve Royle’s macro, which works very well: https://github.com/quantixed/imagej-macros#blind-analysis

There are probably some excellent solutions using Python. Regardless of the approach you take, I would highly recommend copying all the data to a new folder before you perform any filename changes. Then test the program forward and backward to confirm everything works as expected. Maybe perform the analysis in batches, so that if something goes awry, you don’t lose all that work.
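
For instance, here is a minimal Python sketch of my own (not one of the macros above): it copies the images to a new folder, renames the copies with random codes, and writes the code-to-filename key to a CSV so the results can be decoded after scoring.

```python
# Blind image filenames by copying with random codes; the originals stay untouched.
import csv
import secrets
import shutil
from pathlib import Path

def blind_copy(src_dir, dst_dir, key_file="blinding_key.csv"):
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    with open(key_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["coded_name", "original_name"])
        for img in sorted(src.glob("*.tif")):
            coded = secrets.token_hex(4) + img.suffix   # e.g., '1a2b3c4d.tif'
            shutil.copy2(img, dst / coded)              # copy, don't rename, the original
            writer.writerow([coded, img.name])

# blind_copy("raw_images", "blinded_images")
# Keep blinding_key.csv out of sight (or with a labmate) until scoring is done.
```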

My “blind and dumb” experiment

There are many stories about unintended bias leading to false conclusions. Here’s mine: I was testing to see whether a drug treatment inhibited cells from crawling through a porous barrier by counting the number of cells that made it through the barrier to an adjacent well.

My partner in crime had labeled the samples with dummy indices, so I didn’t know which wells were treated and which were control. But I immediately could tell that there were more cells in one set of wells, so I presumed those were the control set. Fortunately, I had taken the extra precaution of randomizing the stage positions, so I didn’t let my bias alter the data collection. We then blinded the analysis by relabeling the microscopy images. I manually counted all the cells in each image.

We then unblinded the samples. At first, we were disappointed that the wells I had assumed were control turned out to be treated. Then we looked at the results. SURPRISE! My snap judgement at the beginning of the experiment had been precisely backwards: the wells I thought looked like they had sparser cells actually had significantly more on average. So it turned out that the drug treatment had indeed worked. Thankfully, I didn’t rely on my snap judgement nor allow that bias to influence the results.

Treating each cell as an n in the statistical analysis

This error plagues the majority of cell biology papers. Go scan a recent issue of your favorite journal and count the number of papers that have minuscule P values; invariably, the authors aggregated all the cell measurements from multiple experiments and calculated the t-test or ANOVA based on those dozens or hundreds of measurements. This is a fatal error.

It is patently absurd to consider neighboring cells in the same dish all treated simultaneously with the same drug as independent tests of a hypothesis.

If your neighbor told you that he ate a banana peel and it reversed his balding, you might be a little skeptical. If he further explained that he measured 1000 of his hair follicles before and after eating a banana peel and measured a P < 0.05 difference in growth rate, would you be convinced? Maybe it was just a fluke or noise that his hairs started growing faster. You would want him to repeat the experiment a few times (maybe even with different people) before you started believing.

Similarly, there are many reasons two dishes of cells might be different. To start believing that a treatment is truly effective, we all understand that we should repeat the experiment a few times and get similar results. Counting each cell measurement as the sample size n all but guarantees a small—but meaningless—P value.

Observe how dramatically different scenarios (on the right) yield the same plot and P value when you assume each cell is a separate n (on the left):

Elegant solutions include “hierarchical” or “nested” or “mixed effect” statistics. A simple approach is to separately pool the cell-level data from each experiment, then compare experiment-level means (in the case above, the n for each condition would be 3, not 300). For more details, please read my previous blog post or our paper:

Lord SJ, Velle KB, Mullins RD, Fritz-Laylin LK. SuperPlots: Communicating reproducibility and variability in cell biology. J Cell Biol. 2020;219(6):e202001064.
https://pubmed.ncbi.nlm.nih.gov/32346721
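
Here is a minimal sketch of that pooling approach on simulated numbers (not data from the paper), contrasting the inflated per-cell P value with the per-experiment one:

```python
# Compare experiment-level means (n = 3 per condition) instead of treating
# all 300 cells as independent samples. All values are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# three independent experiments, 100 simulated cells each
control_runs = [rng.normal(1.0, 0.3, 100) for _ in range(3)]
treated_runs = [rng.normal(1.1, 0.3, 100) for _ in range(3)]

# inflated: every cell treated as an independent n (n = 300 per condition)
_, p_cells = stats.ttest_ind(np.concatenate(treated_runs),
                             np.concatenate(control_runs))

# better: one mean per biological replicate, then a paired test on n = 3
control_means = [run.mean() for run in control_runs]
treated_means = [run.mean() for run in treated_runs]
_, p_runs = stats.ttest_rel(treated_means, control_means)

print(f"per-cell n = 300:      P = {p_cells:.1e}")
print(f"per-experiment n = 3:  P = {p_runs:.2f}")
```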

How do we fix this?

Professors need to teach their trainees the very basics of how to design experiments (see Stan Lazic’s book: Experimental Design for Laboratory Biologists) and perform analysis (see Mike Whitlock and Dolph Schluter’s book: The Analysis of Biological Data). PIs need to provide researchers with the tools to blind their experiments or otherwise remove bias. They need to ask for multiple biological replicates and correctly calculated P values. This does not require an advanced understanding of statistics, just a basic understanding of the importance of repeating an experiment multiple times to ensure an observation is real.

Editors and referees need to demand correct data analysis. While asking researchers to redo an experiment isn’t really acceptable, requiring a reanalysis of the data after blinding or recalculating P values based on biological replicates seems fair. Editors should not even send manuscripts to referees if the above errors are not corrected or at least addressed in some fashion. Editors can offer the simple solutions listed above.

unbelievably small P values?

November 18, 2019 at 9:56 am | | literature, scientific integrity

Check out our newest preprint at arXiv:

If your P value looks too good to be true, it probably is: Communicating reproducibility and variability in cell biology

Lord, S. J.; Velle, K. B.; Mullins, R. D.; Fritz-Laylin, L. K. arXiv 2019, 1911.03509. https://arxiv.org/abs/1911.03509

UPDATE: Now published in JCB: https://doi.org/10.1083/jcb.202001064

I’ve noticed a promising trend away from bar graphs in the cell biology literature. That’s great, because reporting simply the average and SD or SEM of an entire dataset conceals a lot of information. So it’s nice to see column scatter, beeswarm, violin, and other plots that show the distribution of the data.

But a concerning outcome of this trend is that, when authors decide to plot every measurement or every cell as a separate datapoint, it seems to trick people into thinking that each cell is an independent sample. Clearly, two cells in the same flask treated with a drug are not independent tests of whether the drug works: there are many reasons the cells in that particular flask might be different from those in other flasks. To really test a hypothesis that the drug influences the cells, one must repeat the drug treatment multiple times and check if the observed effect happens repeatably.

I scanned the latest issues of popular cell biology journals and found that over half the papers counted each cell as a separate N and calculated P values and SEM using that inflated count.

Notice that bar graphs—and even beeswarm plots—fail to capture the sample-to-sample variability in the data. This can have huge consequences: in C, the data is really random, but counting each cell as its own independent sample results in minuscule error bars and a laughably small P value.

But that’s not to say that cell-to-cell variability is unimportant! The fact that some cells in a flask react dramatically to a treatment and others carry on just fine might have very important implications in an actual body.

So we proposed “SuperPlots,” which superimpose sample-to-sample summary data on top of the cell-level distribution. This is a simple way to convey both the variability of the underlying data and the repeatability of the experiment. It doesn’t really require any complicated plotting or programming skills. On the simplest level, you can paste two (or more!) plots into Illustrator and overlay them. Play around with colors and transparency to make it visually appealing, and you’re done! (We also give a tutorial on how we made the plots above in GraphPad Prism.)
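
If you’d rather script it, here is a rough matplotlib sketch of the same idea on simulated data (the paper’s tutorials cover Prism, R, Python, and Excel; the numbers and colors here are just placeholders): plot every cell, color-coded by biological replicate, and overlay each replicate’s mean as a larger marker.

```python
# A minimal SuperPlot: cell-level scatter colored by replicate, replicate means on top.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
conditions = ["Control", "Drug"]
rep_colors = ["#E69F00", "#56B4E9", "#009E73"]   # one color per biological replicate

fig, ax = plt.subplots()
for i, cond in enumerate(conditions):
    for color in rep_colors:
        cells = rng.normal(1.0 + 0.3 * i, 0.2, 80)       # simulated cells for this replicate
        jitter = rng.uniform(-0.15, 0.15, cells.size)
        ax.scatter(i + jitter, cells, s=8, color=color, alpha=0.4)
        ax.scatter(i, cells.mean(), s=120, color=color,   # replicate-level mean
                   edgecolor="black", zorder=3)
ax.set_xticks(range(len(conditions)))
ax.set_xticklabels(conditions)
ax.set_ylabel("measurement (a.u.)")
plt.show()
```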

Let me know what you think!

UPDATE: We simplified the figure:

Figure 1. Drastically different experimental outcomes can result in the same plots and statistics unless experiment-to-experiment variability is considered. (A) Problematic plots treat N as the number of cells, resulting in tiny error bars and P values. These plots also conceal any systematic run-to-run error, mixing it with cell-to-cell variability. To illustrate this, we simulated three different scenarios that all have identical underlying cell-level values but are clustered differently by experiment: (B) shows highly repeatable, unclustered data, (C) shows day-to-day variability, but a consistent trend in each experiment, and (D) is dominated by one random run. Note that the plots in (A) that treat each cell as its own N fail to distinguish the three scenarios, claiming a significant difference after drug treatment, even when the experiments are not actually repeatable. To correct that, “SuperPlots” superimpose summary statistics from biological replicates consisting of independent experiments on top of data from all cells, and P values were calculated using an N of three, not 300. In this case, the cell-level values were separately pooled for each biological replicate and the mean calculated for each pool; those three means were then used to calculate the average (horizontal bar), standard error of the mean (error bars), and P value. While the dot plots in the “OK” column ensure that the P values are calculated correctly, they still fail to convey the experiment-to-experiment differences. In the SuperPlots, each biological replicate is color-coded: the averages from one experimental run are yellow dots, another independent experiment is represented by gray triangles, and a third experiment is shown as blue squares. This helps convey whether the trend is observed within each experimental run, as well as for the dataset as a whole. The beeswarm SuperPlots in the rightmost column represent each cell with a dot that is color coded according to the biological replicate it came from. The P values represent an unpaired two-tailed t-test (A) and a paired two-tailed t-test for (B-D). For tutorials on making SuperPlots in Prism, R, Python, and Excel, see the supporting information.

electrically tunable lenses for microscopy

September 2, 2016 at 2:22 pm | | hardware, literature

Electrically tunable lenses (ETLs) are polymeric or fluid-filled lenses that have a focal length that changes with an applied current. They have shown some great potential for microscopy, especially in fast, simple z-sweeps.

[Image: an electrically tunable lens]

[Image: z-stack acquired by sweeping an ETL through its range of focal depths]

The above figure shows the ~120 µm range of focal depths accessible with an ETL installed between the camera and a 40× objective (from reference 1). Note that this arrangement has the drawback of changing the effective magnification at different focal depths; however, the effect is fairly small (20%) and linear over the full range. For high-resolution z-stack imaging of cells, this magnification change would not be ideal, but it should be correctable, or at least tolerable, for imaging that is less sensitive to magnification changes. Basic ETLs cost only a few hundred dollars, a lot cheaper than a piezo stage or objective focuser. Optotune has a lot of information about how to add an ETL to a microscope.
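
As a rough sketch of what that correction might look like (my own assumption, not a procedure from reference 1): if the magnification varies linearly with focal depth, each slice of the z-stack can be rescaled back toward a common magnification.

```python
# Rescale each z-slice to compensate for a linear magnification change with depth.
import numpy as np
from scipy.ndimage import zoom

def correct_stack(stack, total_mag_change=0.20):
    """stack: (z, y, x) array; magnification assumed to grow linearly by
    total_mag_change from the first to the last slice."""
    n_z = stack.shape[0]
    corrected = []
    for i, frame in enumerate(stack):
        rel_mag = 1.0 + total_mag_change * i / (n_z - 1)  # relative magnification of this slice
        corrected.append(zoom(frame, 1.0 / rel_mag))      # rescale back toward slice 0
    return corrected  # frames now differ slightly in size; crop or pad to register them

# example: correct_stack(np.random.rand(11, 256, 256))
```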

Another cool application of an ETL is in light-sheet microscopy. A recent paper from Enrico Gratton (reference 2) used an ETL to sweep the narrow waist of a light sheet across the sample, and synchronized its motion to match the rolling shutter of a CMOS camera.

[Image: ETL-swept light-sheet setup]

The main goal was to cheaply and simply create a light sheet with a uniform (and minimal) thickness across the entire field of view. The previous low-tech method to achieve this was to close down an iris, which reduces the difference in thickness across the sample but also increases the minimum achievable waist size. The high-tech way is to create “propagation-invariant” Bessel or Airy beams. These do not spread out as they propagate, like Gaussian beams do, but creating and aligning them in microscopes is significantly more challenging.

[Image: light-sheet waist swept across the field of view in sync with the rolling shutter]

Gratton’s cheap trick means one can create a flat and thin light sheet for the cost of an ETL and the complexity of synchronizing a voltage ramp signal to the CMOS rolling shutter readout. To be honest, I don’t 100% know how complicated or robust that is in practice. I’m just guessing that it’s simpler than a Bessel beam.
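
For a sense of what the synchronization involves, here is a back-of-the-envelope sketch (all timing numbers are assumptions, and no real DAQ code is shown): the ETL drive voltage is a linear ramp per frame, sampled at the camera’s line rate so the focal sweep tracks the shutter.

```python
# Build a per-frame voltage ramp for the ETL, sampled at the rolling-shutter line rate.
import numpy as np

line_time_s = 10e-6           # time for the rolling shutter to advance one row (assumed)
n_rows = 2048                 # number of sensor rows (assumed)
v_start, v_stop = 0.0, 1.0    # ETL drive voltages spanning the sweep (assumed)

t = np.arange(n_rows) * line_time_s                # readout time of each row within a frame
ramp = v_start + (v_stop - v_start) * t / t[-1]    # one linear ramp per frame

# In practice, this waveform would be loaded onto an analog output, triggered by the
# camera's frame-start signal, and repeated every frame.
```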


  1. Wang, Z., Lei, M., Yao, B., Cai, Y., Liang, Y., Yang, Y., … Xiong, D. (2015). Compact multi-band fluorescent microscope with an electrically tunable lens for autofocusing. Biomedical Optics Express, 6(11), 4353. doi:10.1364/BOE.6.004353

  2. Hedde, P. N., & Gratton, E. (2016). Selective plane illumination microscopy with a light sheet of uniform thickness formed by an electrically tunable lens. Microscopy Research and Technique, 00(April). doi:10.1002/jemt.22707

experimenting with preprints

May 9, 2016 at 12:15 pm | | literature, science community

We recently uploaded a preprint to bioRxiv. The goal was to hopefully get some constructive feedback to improve the manuscript. So far, it got some tweets and even an email from a journal editor, but no comments or constructive feedback.

I notice that very few preprints on bioRxiv have any comments at all. Of course, scientists may be emailing each other privately about papers on bioRxiv, and that would be great. But I think an open process would be valuable. F1000Research, for example, has a totally open review process, posting the referee reports right with the article. I might be interested in trying that journal someday.

UPDATE: In the end, we did receive a couple emails from students who had read the preprint for their journal club. They provided some nice feedback. Super nice! We also received some helpful feedback on another preprint, and we updated the manuscript before submitting to a journal. Preprints can be super useful for pre-peer review.

speck of dust

April 18, 2014 at 10:43 am | | crazy figure contest, history, literature

The scope room dustiness post reminded me of the hilarious story of the first report of second harmonic generation (SHG) of a laser. The authors presented a photographic plate that showed the exposure from the main laser beam, as well as a “small but dense” spot from the doubled beam:

[Image: photographic plate from the first SHG report]

See the spot? You won’t. Because the editor removed the spot, thinking it was a speck of dust on the plate. Ha!

When I first heard this story, I didn’t believe it. I assumed it was a contrast issue when the paper was scanned into a PDF. So I went to the library and found the original print version. No spot there, either!

That really made my day.

am i finished using Papers?

April 4, 2014 at 1:39 pm | | literature, software

I’ve been using Papers for years. When Papers2 came out, I was quick (too quick) to jump in and start using it. Its worst bugs got ironed out within a couple months, and I used it happily for a while. Papers2 would let you sync PDFs to your iPad for offline reading, but it was slow and a little clunky. Papers3 library syncing is not for offline reading, and it is VERY slow and VERY clunky. And it relies on Dropbox for storage. The plus of this is that storage is free (as long as you have space in Dropbox); the downside is that the syncing isn’t clean and often fails.

Mendeley has proven itself the best at syncing your library and actual PDFs to the cloud (you have to pre-download individual files for offline reading, or you can sync all PDFs in iOS in the settings). Papers’ PDF viewer is still better, but it’s not worth the hassle: Mendeley syncs cleanly and the reader is fine. Not only that, but Mendeley has sharing options that make managing citations possible when writing a manuscript with co-authors (as long as they’ll use Mendeley).

Mendeley is also better than Papers at automatically finding the metadata for the paper (authors, title, abstract, etc.). The program simply works (most of the time), so I’ve given up and finally started using it. Almost exclusively.

PubChase syncs with Mendeley and recommends related papers weekly. (Update: the recommendations update daily, and they send out a weekly email with updates from that week.) They also have some pretty nice features, like a beautiful viewer for some journals and alerts when papers in your library are retracted.

Readcube still has the best recommendations. And they update daily, unlike PubChase’s weekly. And you can tell which recommendations you’ve marked as read, so it’s very quick to scan the list. But that’s really where Readcube’s benefits end. The enhanced PDF viewing feature is nice (it shows all the references in the sidebar), but not really worth the slowdown in scrolling performance. The program is still just clunky. (I thought Adobe was slow!) And there’s no iOS/Android app yet. It’s on its way, allegedly, but I need it now! Readcube is really taking off, so maybe in a year it will be perfect. But not yet.

Edit: Readcube has a new version of their desktop application. Maybe it’s faster? Wait, did the references sidebar disappear? No, wait, it’s there. Just not on by default.

readcube and deepdyve update

June 6, 2013 at 7:48 am | | literature, science community, software

I just wanted to reiterate how great the ReadCube recommendations are. I imported all my PDFs and now check the recommendations every day. I often find great papers (and then later find them popping up in my RSS feeds).

Also, I wanted to let folks know that DeepDyve, the article rental site, is now allowing free 5-min rental of journal articles. Try it out!

PubReader review

April 14, 2013 at 7:52 pm | | literature, software

I’ve reviewed several PDF reader/organizers, like ReadCube, Papers, and Mendeley. Currently, I use Papers for organizing my PDF library on my computer. I also like Papers a lot for reading PDFs, because it displays in full screen so well. But I’ve started using Mendeley for adding citations to Word documents, because it makes it really easy to collaborate with other people who have Mendeley.

Now check out PubReader! It’s really cool. Pubmed has the advantage that it requires all research publications resulting from NIH funding to be uploaded to its repository. And they don’t just grab a PDF; they get the raw text and figures and format them their own way. I used to think that was silly and overkill, but now I see that the approach was genius: it allows Pubmed to reformat the papers into more readable shapes and sizes … and they can reformat them again in the future when the old format becomes antiquated. You can’t really do that with a PDF.

It’s always been nearly impossible to read PDFs on a phone or an e-ink tablet like the basic Kindle. Now, with PubReader and the beta option to download the article in an ePub format (for reading in iBooks or Kindle or something), that option is here. Or on its way, at least.

PubReader on a computer:

[screenshot]

PubReader on iPad:

[screenshot]

ePub in iBooks:

[screenshot]

Now PubReader just needs to display the references in an elegant way like ReadCube, and it will be the best!

It makes me think the future of reading and storing scientific papers is not the hard drive, but simply reading from online repositories. Pubmed allows you to create collections and star favorites, so you can just use Pubmed to store your collection of papers and never have to download a PDF again in your life!

readcube review

April 10, 2013 at 11:45 am | | literature, software

I recently tried Readcube, which is a PDF reader and organizer. I did so because Nature has built it into their site, and I like how it displays PDFs. The article data downloads seamlessly for most papers, and the interface is quite beautiful:

[screenshot]

The really cool feature is that Readcube automatically downloads the references and the supporting information documents and can display them at the click of a button. More importantly, it displays the references in the sidebar. It makes for an excellent reading experience!

[screenshots]

The final interesting feature is that Readcube offers recommendations based on your library. From my quick scan, the recommendations seem pretty good.

Other than that, Readcube is quite feature-poor. It doesn’t have a way to insert citations into a Word document like Papers and Mendeley do, although you can export to Endnote. I don’t see a way to read in full screen, nor does it let you view two pages simultaneously like Papers does.

[screenshot]

The screenshot above is from Papers’ full-screen view, which is how I really like to read PDFs.

But Readcube is still in beta, and it’s off to a really nice start. I’m not ready to give up on Papers for reading (and I’ve been using Mendeley for Word citations, because it has really nice collaborative features). But I might try Readcube some more, mainly because of the awesome ability to see all the references and the paper simultaneously. I really wish I could mash Papers, Mendeley, and Readcube all together into one feature-rich program…

ActiveView PDF

April 10, 2013 at 10:38 am | | everyday science, literature, news

Does anyone else love ACS’s ActiveView PDF viewer for reading PDFs and seeing references? And Nature’s ReadCube, too. Great stuff.

Of course, after I scan the ActiveView, I still download the old-fashioned PDF and use Papers (or Mendeley) to read and manage my library.

google reader alternatives

April 3, 2013 at 8:12 am | | everyday science, literature, science community, software

Now that Google Reader is going the way of the dodo (like Google Gears before it), how am I going to keep up with the literature?!? I read the RSS feeds of many journal tables of contents, because it’s one of the best ways to keep up with all the articles out there (and see the awesome TOC art). So what am I to do?

There are many RSS readers out there (one of my favorites was Feeddler for iOS), but the real problem is syncing! Google’s servers took care of all the syncing when I read RSS feeds on my phone and then wanted to continue reading at home on my computer. The RSS readers out there are simply pretty faces on top of Google Reader’s guts.

But now those RSS programs are scrambling to build their own syncing databases. Feedly, one of the frontrunners to come out of the Google Reader retirement, claims that their project Normandy will take care of everything seamlessly. Reeder, another very popular reader, also claims that syncing will continue, probably using Feedbin. Feeddler also says they’re not going away, but with no details. After July 1, we’ll see how many of these programs actually work!

So what am I doing? I’ve tried Feedly and really like how pretty it is and how easy it is to use. The real problem with Feedly is that it’s designed for beauty, not necessarily utility. For instance, look how pretty it is on my iPad:

[screenshot]

But note that it’s hard to distinguish the journal from the authors and the abstract. And it doesn’t show the full TOC image. Feedly might be faster (you can swipe to move to the next article), but you may not absorb as much information and might miss articles that would actually interest you.

Here’s Reeder, which displays the title, journal, authors, and TOC art all differently, making it easy to quickly scan each article:

[screenshot]

And Feeddler:

[screenshot]

I love that Feeddler lets me put the navigation arrow on the bottom right or left, and that it displays a lot of information in nice formatting for each entry. That way, I can quickly flip through many articles and get the full information. The major problem is that it doesn’t have a Mac or PC version, so you’ll be stuck on your phone.

I think I’ll drop Feeddler and keep demoing Reeder and Feedly until July 1 rolls around.

urine biophysics

October 17, 2012 at 1:50 pm | | literature

The Shape of the Urine Stream — From Biophysics to Diagnostics.

Definitely an Ig Nobel contender.

Atoms and Molecules – A Child’s Guide to Chemistry

June 27, 2012 at 2:22 pm | | literature, science and the public, teaching

My labmate wrote a chemistry book for children … and his daughter did the illustrations. It succinctly describes atoms, orbitals, bonding, molecules, and biomolecules.

I highly recommend it.

the matrix begins

June 27, 2012 at 10:56 am | | literature

“Implanted Biofuel Cell Operating in a Living Snail.”

Implantable biofuel cells have been suggested [BY MACHINES] as sustainable micropower sources operating in living organisms, but such bioelectronic systems are still exotic and very challenging to design.

One thing I never understood about the Matrix was how the machines got more energy out of the human farms as electricity than they had to put in as food. Don’t the machines know the three laws of thermodynamics? Or just the three laws of robotics?

PeerJ

June 8, 2012 at 9:39 am | | literature, science and the public, science community

This is an interesting idea. PeerJ sounds like it’s going to be an open access journal, with a cheap publication fee ($99 for a lifetime membership). I wonder if it will be selective?

I’m more excited about HHMI’s new journal eLife.
