
Statistical analysis of co-occurrence patterns in microbial presence-absence datasets

Author

Listed:
  • Kumar P Mainali
  • Sharon Bewick
  • Peter Thielen
  • Thomas Mehoke
  • Florian P Breitwieser
  • Shishir Paudel
  • Arjun Adhikari
  • Joshua Wolfe
  • Eric V Slud
  • David Karig
  • William F Fagan

Abstract

Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson’s correlation coefficient (r) and Jaccard’s index (J)–two of the most common metrics for correlation analysis of presence-absence data–can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (
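The contrast between the two metrics can be illustrated with a small sketch. For binary data, Pearson's r reduces to the phi coefficient of the 2x2 contingency table and so rewards shared absences, whereas Jaccard's J ignores them. The code below is only an illustration under stated assumptions: the function names and the toy presence-absence vectors are hypothetical, not taken from the paper, and it computes the raw metrics only, not the significance tests the authors discuss.

```python
from math import sqrt


def contingency(x, y):
    """Return (a, b, c, d): both present, x only, y only, both absent."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = len(x) - a - b - c
    return a, b, c, d


def pearson_binary(x, y):
    """Pearson's r for two 0/1 vectors (equivalent to the phi coefficient)."""
    a, b, c, d = contingency(x, y)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")


def jaccard(x, y):
    """Jaccard's index: shared presences over samples with either species."""
    a, b, c, _ = contingency(x, y)
    union = a + b + c
    return a / union if union else float("nan")


# Hypothetical toy data: 20 samples, 1 = present, 0 = absent.
# Two rare species sharing one of their two occurrences.
rare_x = [1, 1] + [0] * 18
rare_y = [1, 0, 1] + [0] * 17
# Two common species sharing 17 of their 18 occurrences.
common_x = [1] * 18 + [0] * 2
common_y = [1] * 17 + [0, 1, 0]

for label, x, y in [("rare", rare_x, rare_y), ("common", common_x, common_y)]:
    print(label, round(pearson_binary(x, y), 3), round(jaccard(x, y), 3))
```

In this toy example both pairs have the same r (16/36, about 0.444), but J is 1/3 for the rare pair and 17/19 for the common pair, because the 17 shared absences of the rare pair contribute to r and not to J.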

Suggested Citation

  • Kumar P Mainali & Sharon Bewick & Peter Thielen & Thomas Mehoke & Florian P Breitwieser & Shishir Paudel & Arjun Adhikari & Joshua Wolfe & Eric V Slud & David Karig & William F Fagan, 2017. "Statistical analysis of co-occurrence patterns in microbial presence-absence datasets," PLOS ONE, Public Library of Science, vol. 12(11), pages 1-21, November.
  • Handle: RePEc:plo:pone00:0187132
    DOI: 10.1371/journal.pone.0187132

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0187132
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0187132&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Griffith, Daniel M. & Veech, Joseph A. & Marsh, Charles J., 2016. "cooccur: Probabilistic Species Co-Occurrence Analysis in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 69(c02).
    2. Charles K Lee & Craig W Herbold & Shawn W Polson & K Eric Wommack & Shannon J Williamson & Ian R McDonald & S Craig Cary, 2012. "Groundtruthing Next-Gen Sequencing for Microbial Ecology–Biases and Errors in Community Structure Estimates from PCR Amplicon Pyrosequencing," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-12, September.
    3. Niels Klitgord & Daniel Segrè, 2010. "Environments that Induce Synthetic Microbial Ecosystems," PLOS Computational Biology, Public Library of Science, vol. 6(11), pages 1-17, November.
    4. Tanja Woyke & Hanno Teeling & Natalia N. Ivanova & Marcel Huntemann & Michael Richter & Frank Oliver Gloeckner & Dario Boffelli & Iain J. Anderson & Kerrie W. Barry & Harris J. Shapiro & Ernest Szeto, 2006. "Symbiosis insights through metagenomic analysis of a microbial consortium," Nature, Nature, vol. 443(7114), pages 950-955, October.
    5. Julia Oh & Allyson L. Byrd & Clay Deming & Sean Conlan & Heidi H. Kong & Julia A. Segre, 2014. "Biogeography and individuality shape function in the human skin metagenome," Nature, Nature, vol. 514(7520), pages 59-64, October.
    6. Peter J. Turnbaugh & Ruth E. Ley & Micah Hamady & Claire M. Fraser-Liggett & Rob Knight & Jeffrey I. Gordon, 2007. "The Human Microbiome Project," Nature, Nature, vol. 449(7164), pages 804-810, October.
    7. Alexander Shapiro & Jos Berge, 2002. "Statistical inference of minimum rank factor analysis," Psychometrika, Springer;The Psychometric Society, vol. 67(1), pages 79-94, March.
    8. Charles K Fisher & Pankaj Mehta, 2014. "Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries Using Sparse Linear Regression," PLOS ONE, Public Library of Science, vol. 9(7), pages 1-10, July.
    9. Tanya Yatsunenko & Federico E. Rey & Mark J. Manary & Indi Trehan & Maria Gloria Dominguez-Bello & Monica Contreras & Magda Magris & Glida Hidalgo & Robert N. Baldassano & Andrey P. Anokhin & Andrew C, 2012. "Human gut microbiome viewed across age and geography," Nature, Nature, vol. 486(7402), pages 222-227, June.

    Citations



    Cited by:

    1. Koeneman, Scott H. & Cavanaugh, Joseph E., 2022. "An improved asymptotic test for the Jaccard similarity index for binary data," Statistics & Probability Letters, Elsevier, vol. 184(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sean M Gibbons & Sean M Kearney & Chris S Smillie & Eric J Alm, 2017. "Two dynamic regimes in the human gut microbiome," PLOS Computational Biology, Public Library of Science, vol. 13(2), pages 1-20, February.
    2. Charles K Fisher & Thierry Mora & Aleksandra M Walczak, 2017. "Variable habitat conditions drive species covariation in the human microbiota," PLOS Computational Biology, Public Library of Science, vol. 13(4), pages 1-18, April.
    3. Charles K Fisher & Pankaj Mehta, 2014. "Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries Using Sparse Linear Regression," PLOS ONE, Public Library of Science, vol. 9(7), pages 1-10, July.
    4. Li, Jie & Shen, Xuzhu & Li, YaoTang, 2021. "Modeling the temporal dynamics of gut microbiota from a local community perspective," Ecological Modelling, Elsevier, vol. 460(C).
    5. Gregor Gorkiewicz & Gerhard G Thallinger & Slave Trajanoski & Stefan Lackner & Gernot Stocker & Thomas Hinterleitner & Christian Gülly & Christoph Högenauer, 2013. "Alterations in the Colonic Microbiota in Response to Osmotic Diarrhea," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-17, February.
    6. Lena Takayasu & Wataru Suda & Eiichiro Watanabe & Shinji Fukuda & Kageyasu Takanashi & Hiroshi Ohno & Misako Takayasu & Hideki Takayasu & Masahira Hattori, 2017. "A 3-dimensional mathematical model of microbial proliferation that generates the characteristic cumulative relative abundance distributions in gut microbiomes," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-20, August.
    7. Rajita Menon & Vivek Ramanan & Kirill S Korolev, 2018. "Interactions between species introduce spurious associations in microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 14(1), pages 1-20, January.
    8. Nowak, Piotr Bolesław, 2016. "The MLE of the mean of the exponential distribution based on grouped data is stochastically increasing," Statistics & Probability Letters, Elsevier, vol. 111(C), pages 49-54.
    9. Ruairi C. Robertson & Thaddeus J. Edens & Lynnea Carr & Kuda Mutasa & Ethan K. Gough & Ceri Evans & Hyun Min Geum & Iman Baharmand & Sandeep K. Gill & Robert Ntozini & Laura E. Smith & Bernard Chasekw, 2023. "The gut microbiome and early-life growth in a population with high prevalence of stunting," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    10. Camilo Alberto Cárdenas-Hurtado & Aaron Levi Garavito-Acosta & Jorge Hernán Toro-Córdoba, 2018. "Asymmetric Effects of Terms of Trade Shocks on Tradable and Non-tradable Investment Rates: The Colombian Case," Borradores de Economia 1043, Banco de la Republica de Colombia.
    11. Anastasiou, Andreas, 2017. "Bounds for the normal approximation of the maximum likelihood estimator from m-dependent random variables," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 171-181.
    12. John Molloy & Katrina Allen & Fiona Collier & Mimi L. K. Tang & Alister C. Ward & Peter Vuillermin, 2013. "The Potential Link between Gut Microbiota and IgE-Mediated Food Allergy in Early Life," IJERPH, MDPI, vol. 10(12), pages 1-22, December.
    13. Evelina Di Corso & Tania Cerquitelli & Daniele Apiletti, 2018. "METATECH: METeorological Data Analysis for Thermal Energy CHaracterization by Means of Self-Learning Transparent Models," Energies, MDPI, vol. 11(6), pages 1-24, May.
    14. Silva, Ivair R., 2017. "Confidence intervals through sequential Monte Carlo," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 112-124.
    15. Shilan Li & Jianxin Shi & Paul Albert & Hong-Bin Fang, 2022. "Dependence Structure Analysis and Its Application in Human Microbiome," Mathematics, MDPI, vol. 11(1), pages 1-14, December.
    16. Denter, Philipp & Sisak, Dana, 2015. "Do polls create momentum in political competition?," Journal of Public Economics, Elsevier, vol. 130(C), pages 1-14.
    17. Salgado Alfredo, 2018. "Incomplete Information and Costly Signaling in College Admissions," Working Papers 2018-23, Banco de México.
    18. Albrecht, James & Anderson, Axel & Vroman, Susan, 2010. "Search by committee," Journal of Economic Theory, Elsevier, vol. 145(4), pages 1386-1407, July.
    19. Stegeman, Alwin, 2016. "A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 189-203.
    20. Mauricio Romero & Álvaro Riascos & Diego Jara, 2015. "On the Optimality of Answer-Copying Indices," Journal of Educational and Behavioral Statistics, vol. 40(5), pages 435-453, October.
