Printed from https://ideas.repec.org/a/plo/pcbi00/1007895.html

A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type

Authors listed:
  • Alan Le Goallec
  • Braden T Tierney
  • Jacob M Luber
  • Evan M Cofer
  • Aleksandar D Kostic
  • Chirag J Patel

Abstract

The microbiome is a new frontier for building predictors of human phenotypes. However, machine learning in the microbiome is fraught with issues of reproducibility, driven in large part by the wide range of analytic models and metagenomic data types available. We aimed to build robust metagenomic predictors of host phenotype by comparing prediction performances and biological interpretation across 8 machine learning methods and 4 different types of metagenomic data. Using 1,570 samples from 300 infants, we fit 7,865 models for 6 host phenotypes. We demonstrate the dependence of accuracy on algorithm choice and feature definition in microbiome data and propose a framework for building microbiome-derived indicators of host phenotype. We additionally identify biological features predictive of age, sex, breastfeeding status, historical antibiotic usage, country of origin, and delivery type. Our complete results can be viewed at http://apps.chiragjpgroup.org/ubiome_predictions/.

Author summary

The human microbiome is hypothesized to influence human phenotype. However, many published host-microbe associations may not be reproducible. Irreproducible results can arise for several reasons, including the wide array of methods for measuring the microbiome through genetic sequencing, annotation pipelines, and analytical models/prediction approaches. Therefore, there is a need to compare different modeling strategies and microbiome data types (e.g., species abundance versus metabolic pathway abundance) to determine how to build robust and reproducible host-microbiome predictions. In this work, we executed a broad comparison of predictive methods, as a function of microbiome data type, for predicting host characteristics. Our pipeline was able to uncover robust microbial associations with phenotype. We additionally recommend considerations for developing reproducible microbiome-host association pipelines. We argue that our work is a necessary stepping stone toward increasing the utility of emerging cohort data and enabling the next generation of efficient microbiome association studies in human health.

Suggested Citation

  • Alan Le Goallec & Braden T Tierney & Jacob M Luber & Evan M Cofer & Aleksandar D Kostic & Chirag J Patel, 2020. "A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type," PLOS Computational Biology, Public Library of Science, vol. 16(5), pages 1-21, May.
  • Handle: RePEc:plo:pcbi00:1007895
    DOI: 10.1371/journal.pcbi.1007895

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007895
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1007895&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1007895?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. John P A Ioannidis, 2005. "Why Most Published Research Findings Are False," PLOS Medicine, Public Library of Science, vol. 2(8), pages 1-1, August.
    2. Claire Duvallet & Sean M. Gibbons & Thomas Gurry & Rafael A. Irizarry & Eric J. Alm, 2017. "Meta-analysis of gut microbiome studies identifies disease-specific and shared responses," Nature Communications, Nature, vol. 8(1), pages 1-10, December.
    3. Sathish Subramanian & Sayeeda Huq & Tanya Yatsunenko & Rashidul Haque & Mustafa Mahfuz & Mohammed A. Alam & Amber Benezra & Joseph DeStefano & Martin F. Meier & Brian D. Muegge & Michael J. Barratt et al., 2014. "Persistent gut microbiota immaturity in malnourished Bangladeshi children," Nature, Nature, vol. 510(7505), pages 417-421, June.
    4. David Zeevi & Tal Korem & Anastasia Godneva & Noam Bar & Alexander Kurilshikov & Maya Lotan-Pompan & Adina Weinberger & Jingyuan Fu & Cisca Wijmenga & Alexandra Zhernakova & Eran Segal, 2019. "Structural variation in the gut microbiome associates with host health," Nature, Nature, vol. 568(7750), pages 43-48, April.
    5. Nhan T. Ho & Fan Li & Kathleen A. Lee-Sarwar & Hein M. Tun & Bryan P. Brown & Pia S. Pannaraj & Jeffrey M. Bender & Meghan B. Azad & Amanda L. Thompson & Scott T. Weiss & M. Andrea Azcarate-Peril et al., 2018. "Meta-analysis of effects of exclusive breastfeeding on infant gut microbiota across populations," Nature Communications, Nature, vol. 9(1), pages 1-13, December.
    6. Edoardo Pasolli & Duy Tin Truong & Faizan Malik & Levi Waldron & Nicola Segata, 2016. "Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-26, July.
    7. John Guittar & Ashley Shade & Elena Litchman, 2019. "Trait-based community assembly and succession of the infant gut microbiome," Nature Communications, Nature, vol. 10(1), pages 1-11, December.

    Citations



    Cited by:

    1. Braden T Tierney & Yingxuan Tan & Zhen Yang & Bing Shui & Michaela J Walker & Benjamin M Kent & Aleksandar D Kostic & Chirag J Patel, 2022. "Systematically assessing microbiome–disease associations identifies drivers of inconsistency in metagenomic research," PLOS Biology, Public Library of Science, vol. 20(3), pages 1-18, March.
    2. Alan Le Goallec & Samuel Diai & Sasha Collin & Jean-Baptiste Prost & Théo Vincent & Chirag J. Patel, 2022. "Using deep learning to predict abdominal age from liver and pancreas magnetic resonance images," Nature Communications, Nature, vol. 13(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sean M Gibbons & Claire Duvallet & Eric J Alm, 2018. "Correcting for batch effects in case-control microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-17, April.
    2. Qi Su & Qin Liu & Raphaela Iris Lau & Jingwan Zhang & Zhilu Xu & Yun Kit Yeoh & Thomas W. H. Leung & Whitney Tang & Lin Zhang & Jessie Q. Y. Liang & Yuk Kam Yau & Jiaying Zheng & Chengyu Liu & Mengjin, 2022. "Faecal microbiome-based machine learning for multi-class disease diagnosis," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    3. Alexander Frankel & Maximilian Kasy, 2022. "Which Findings Should Be Published?," American Economic Journal: Microeconomics, American Economic Association, vol. 14(1), pages 1-38, February.
    4. Jyotirmoy Sarkar, 2018. "Will P-Value Triumph over Abuses and Attacks?," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 7(4), pages 66-71, July.
    5. Ruairi C. Robertson & Thaddeus J. Edens & Lynnea Carr & Kuda Mutasa & Ethan K. Gough & Ceri Evans & Hyun Min Geum & Iman Baharmand & Sandeep K. Gill & Robert Ntozini & Laura E. Smith et al., 2023. "The gut microbiome and early-life growth in a population with high prevalence of stunting," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    6. Stanley, T. D. & Doucouliagos, Chris, 2019. "Practical Significance, Meta-Analysis and the Credibility of Economics," IZA Discussion Papers 12458, Institute of Labor Economics (IZA).
    7. Karin Langenkamp & Bodo Rödel & Kerstin Taufenbach & Meike Weiland, 2018. "Open Access in Vocational Education and Training Research," Publications, MDPI, vol. 6(3), pages 1-12, July.
    8. Kevin J. Boyle & Mark Morrison & Darla Hatton MacDonald & Roderick Duncan & John Rose, 2016. "Investigating Internet and Mail Implementation of Stated-Preference Surveys While Controlling for Differences in Sample Frames," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 64(3), pages 401-419, July.
    9. Jelte M Wicherts & Marjan Bakker & Dylan Molenaar, 2011. "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results," PLOS ONE, Public Library of Science, vol. 6(11), pages 1-7, November.
    10. Valentine, Kathrene D & Buchanan, Erin Michelle & Scofield, John E. & Beauchamp, Marshall T., 2017. "Beyond p-values: Utilizing Multiple Estimates to Evaluate Evidence," OSF Preprints 9hp7y, Center for Open Science.
    11. Anton, Roman, 2014. "Sustainable Intrapreneurship - The GSI Concept and Strategy - Unfolding Competitive Advantage via Fair Entrepreneurship," MPRA Paper 69713, University Library of Munich, Germany, revised 01 Feb 2015.
    12. Dudek, Thomas & Brenøe, Anne Ardila & Feld, Jan & Rohrer, Julia, 2022. "No Evidence That Siblings' Gender Affects Personality across Nine Countries," IZA Discussion Papers 15137, Institute of Labor Economics (IZA).
    13. Uwe Hassler & Marc‐Oliver Pohle, 2022. "Unlucky Number 13? Manipulating Evidence Subject to Snooping," International Statistical Review, International Statistical Institute, vol. 90(2), pages 397-410, August.
    14. Frederique Bordignon, 2020. "Self-correction of science: a comparative study of negative citations and post-publication peer review," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1225-1239, August.
    15. Omar Al-Ubaydli & John A. List, 2015. "Do Natural Field Experiments Afford Researchers More or Less Control than Laboratory Experiments? A Simple Model," NBER Working Papers 20877, National Bureau of Economic Research, Inc.
    16. Aurelie Seguin & Wolfgang Forstmeier, 2012. "No Band Color Effects on Male Courtship Rate or Body Mass in the Zebra Finch: Four Experiments and a Meta-Analysis," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-11, June.
    17. Ankur Moitra & Dhruv Rohatgi, 2022. "Provably Auditing Ordinary Least Squares in Low Dimensions," Papers 2205.14284, arXiv.org, revised Jun 2022.
    18. Dragana Radicic & Geoffrey Pugh & Hugo Hollanders & René Wintjes & Jon Fairburn, 2016. "The impact of innovation support programs on small and medium enterprises innovation in traditional manufacturing industries: An evaluation for seven European Union regions," Environment and Planning C, , vol. 34(8), pages 1425-1452, December.
    19. Colin F. Camerer & Anna Dreber & Felix Holzmeister & Teck-Hua Ho & Jürgen Huber & Magnus Johannesson & Michael Kirchler & Gideon Nave & Brian A. Nosek & Thomas Pfeiffer & Adam Altmejd & Nick Buttrick et al., 2018. "Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015," Nature Human Behaviour, Nature, vol. 2(9), pages 637-644, September.
    20. Li, Lunzheng & Maniadis, Zacharias & Sedikides, Constantine, 2021. "Anchoring in Economics: A Meta-Analysis of Studies on Willingness-To-Pay and Willingness-To-Accept," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 90(C).
