unbelievably small P values?

November 18, 2019 at 9:56 am | literature, scientific integrity

Check out our newest preprint at arXiv:

If your P value looks too good to be true, it probably is: Communicating reproducibility and variability in cell biology

Lord, S. J.; Velle, K. B.; Mullins, R. D.; Fritz-Laylin, L. K. arXiv 2019, 1911.03509. https://arxiv.org/abs/1911.03509

I’ve noticed a promising trend away from bar graphs in the cell biology literature. That’s great, because reporting only the average and SD or SEM of an entire dataset conceals a lot of information. So it’s nice to see column scatter, beeswarm, violin, and other plots that show the distribution of the data.

But a concerning outcome of this trend is that, when authors plot every measurement on every cell as a separate datapoint, it seems to trick people into thinking that each cell is an independent sample. Clearly, two cells in the same flask treated with a drug are not independent tests of whether the drug works: there are many reasons the cells in that particular flask might differ from those in other flasks. To really test the hypothesis that the drug influences the cells, one must repeat the drug treatment multiple times and check whether the observed effect is reproducible.

I scanned the latest issues of popular cell biology journals and found that over half the papers counted each cell as a separate N and calculated P values and SEM using that inflated count.

Notice that bar graphs, and even beeswarm plots, fail to capture the sample-to-sample variability in the data. This can have huge consequences: in panel C, the data is really random, but counting each cell as its own independent sample results in minuscule error bars and a laughably small P value.
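To make the arithmetic concrete, here is a minimal sketch in plain Python. Everything in it is invented for illustration: the flask baselines, the 50 cells per flask, and the helper names `make_flask`, `sem`, and `welch_t` are my assumptions, not data or code from the preprint. It compares the SEM and Welch's t statistic when N is the number of cells versus when N is the number of flasks.

```python
import math
import statistics

CELLS_PER_FLASK = 50  # hypothetical number of cells measured per flask


def make_flask(baseline):
    """Made-up cell measurements: a flask baseline plus an even spread of +/-1."""
    half = (CELLS_PER_FLASK - 1) / 2
    return [baseline + (i - half) / half for i in range(CELLS_PER_FLASK)]


def sem(values):
    """Standard error of the mean, treating every value as an independent sample."""
    return statistics.stdev(values) / math.sqrt(len(values))


def welch_t(a, b):
    """Welch's t statistic for two samples (no P value is computed here)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


# Three flasks per condition. Baselines are invented: flask-to-flask scatter
# (a few units) is much larger than the 0.5-unit difference between conditions.
control = [make_flask(b) for b in (8.0, 10.0, 12.0)]
treated = [make_flask(b) for b in (9.0, 12.5, 10.0)]

# Inflated count: pool every cell, so N = 150 per condition.
control_cells = [v for flask in control for v in flask]
treated_cells = [v for flask in treated for v in flask]
t_cells = welch_t(control_cells, treated_cells)

# Replicate-level count: one mean per flask, so N = 3 per condition.
control_means = [statistics.mean(flask) for flask in control]
treated_means = [statistics.mean(flask) for flask in treated]
t_flasks = welch_t(control_means, treated_means)

print(f"N = cells:  SEM = {sem(control_cells):.2f}, t = {t_cells:.2f}")
print(f"N = flasks: SEM = {sem(control_means):.2f}, t = {t_flasks:.2f}")
```

With these made-up numbers, pooling the cells shrinks the control SEM roughly eightfold and gives a t statistic of about 2.6, which would pass the conventional P < 0.05 cutoff at that sample size. Using one mean per flask gives a t of about 0.3, nowhere near significance. Measuring more cells per flask only widens the gap, because the inflated N grows while the number of real replicates stays at three.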

But that’s not to say that the cell-to-cell variability is unimportant! The fact that some cells in a flask react dramatically to a treatment and others carry on just fine might have very important implications in an actual body.

So we proposed “SuperPlots,” which superimpose sample-to-sample summary data on top of the cell-level distribution. This is a simple way to convey both the variability of the underlying data and the repeatability of the experiment. It doesn’t really require any complicated plotting or programming skills. At the simplest level, you can paste two (or more!) plots into Illustrator and overlay them. Play around with colors and transparency to make it visually appealing, and you’re done! (We also give a tutorial on how we made the plots above in GraphPad Prism.)
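If you would rather script it than use Illustrator or Prism, the same overlay can be sketched in a few lines of matplotlib. This is only one possible approach, not the method from our tutorial, and the data, colors, and layout choices below are all made up for illustration: faint jittered dots for individual cells, one big dot per replicate mean in a matching color, and an error bar computed from the replicate means.

```python
import random
import statistics

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

random.seed(0)

# Hypothetical data: condition -> list of replicates -> cell-level values.
data = {
    "Control": [[random.gauss(b, 1.0) for _ in range(40)] for b in (8.0, 10.0, 12.0)],
    "Drug": [[random.gauss(b, 1.0) for _ in range(40)] for b in (9.0, 12.5, 10.0)],
}
colors = ["tab:blue", "tab:orange", "tab:green"]  # one color per replicate

fig, ax = plt.subplots(figsize=(4, 4))
for x, (condition, replicates) in enumerate(data.items()):
    rep_means = []
    for rep, color in zip(replicates, colors):
        # Cell-level points: small, translucent, jittered horizontally.
        xs = [x + random.uniform(-0.2, 0.2) for _ in rep]
        ax.scatter(xs, rep, s=8, alpha=0.3, color=color)
        # Replicate mean: large opaque marker in the same color.
        mean = statistics.mean(rep)
        rep_means.append(mean)
        ax.scatter([x], [mean], s=90, color=color, edgecolors="black", zorder=3)
    # Error bar: mean +/- SEM across replicate means (N = 3), not across cells.
    rep_sem = statistics.stdev(rep_means) / len(rep_means) ** 0.5
    ax.errorbar([x], [statistics.mean(rep_means)], yerr=[rep_sem],
                fmt="none", ecolor="black", capsize=6, zorder=4)

ax.set_xticks(list(range(len(data))))
ax.set_xticklabels(list(data))
ax.set_ylabel("Measurement (arbitrary units)")
```

Call `fig.savefig("superplot.png")` (any filename works) or `plt.show()` to see the result. Color-coding by replicate is the part that matters: it lets the reader see at a glance whether the replicates agree with one another.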

Let me know what you think!
