
Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition

Authors
  • Jaron Thompson
  • Renee Johansen
  • John Dunbar
  • Brian Munsky

Abstract

Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating advances in DNA sequencing technology with computational approaches such as machine learning. Although machine learning techniques have been applied to microbiome data, their use remains rare, and user-friendly platforms for implementing them are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data include 1709 total bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression analysis of predicted and actual DOC for a held-out test set of 51 samples yields Pearson’s correlation coefficients of 0.636 and 0.676 for the neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared to results from a standard tool (indicator species analysis) widely used by microbial ecologists. Of the 1709 bacterial taxa, indicator species analysis identified 285 as significant determinants of DOC concentration. Of the top 285 ranked features determined by the machine learning methods, a subset of 86 taxa is common to all feature selection techniques. Using this subset of features, predictions for random permutations of the data set are at least as accurate as predictions made using the entire feature set. Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest.
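
The two quantitative steps named in the abstract, scoring predictions with Pearson's correlation coefficient and keeping only the taxa that every method ranks highly, can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names (`pearson_r`, `top_k`, `consensus_features`), the OTU labels, and all scores below are hypothetical, and indicator species analysis is reduced to a single made-up score per taxon purely for illustration. In the study the cutoff was 285 taxa; here it is 3 so the toy example stays readable.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def top_k(scores, k):
    """Return the set of the k feature names with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_features(score_tables, k):
    """Features that appear in the top k of every method's ranking."""
    return set.intersection(*(top_k(scores, k) for scores in score_tables))


# Hypothetical held-out DOC values and model predictions (made-up numbers).
actual_doc = [10.0, 12.5, 9.0, 15.0, 11.0]
predicted_doc = [10.5, 12.0, 9.5, 14.0, 11.5]
r = pearson_r(actual_doc, predicted_doc)
print(f"Pearson r on toy test set: {r:.3f}")

# Hypothetical importance scores from three feature selection methods.
nn_scores = {"OTU_1": 0.9, "OTU_2": 0.7, "OTU_3": 0.2, "OTU_4": 0.6, "OTU_5": 0.1}
rf_scores = {"OTU_1": 0.8, "OTU_2": 0.1, "OTU_3": 0.3, "OTU_4": 0.9, "OTU_5": 0.5}
indicator_scores = {"OTU_1": 0.5, "OTU_2": 0.4, "OTU_3": 0.1, "OTU_4": 0.7, "OTU_5": 0.6}

shared = consensus_features([nn_scores, rf_scores, indicator_scores], k=3)
print("Taxa ranked in the top 3 by every method:", sorted(shared))
```

Taking the intersection is a deliberately conservative choice: a taxon survives only if every method ranks it inside the cutoff, which mirrors the abstract's idea of a robust subset common to all feature selection techniques.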

Suggested Citation

  • Jaron Thompson & Renee Johansen & John Dunbar & Brian Munsky, 2019. "Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-16, July.
  • Handle: RePEc:plo:pone00:0215502
    DOI: 10.1371/journal.pone.0215502

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215502
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0215502&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hung-Chih Chen & Yen-Wen Liu & Kuan-Cheng Chang & Yen-Wen Wu & Yi-Ming Chen & Yu-Kai Chao & Min-Yi You & David J. Lundy & Chen-Ju Lin & Marvin L. Hsieh & Yu-Che Cheng & Ray P. Prajnamitra & Po-Ju Lin, 2023. "Gut butyrate-producers confer post-infarction cardiac protection," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    2. Youwen Qin & Xin Tong & Wei-Jian Mei & Yanshuang Cheng & Yuanqiang Zou & Kai Han & Jiehai Yu & Zhuye Jie & Tao Zhang & Shida Zhu & Xin Jin & Jian Wang & Huanming Yang & Xun Xu & Huanzi Zhong & Liang X, 2024. "Consistent signatures in the human gut microbiome of old- and young-onset colorectal cancer," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    3. Francesca De Filippis & Lorella Paparo & Rita Nocerino & Giusy Della Gatta & Laura Carucci & Roberto Russo & Edoardo Pasolli & Danilo Ercolini & Roberto Berni Canani, 2021. "Specific gut microbiome signatures and the associated pro-inflamatory functions are linked to pediatric allergy and acquisition of immune tolerance," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    4. Sean M Gibbons & Claire Duvallet & Eric J Alm, 2018. "Correcting for batch effects in case-control microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-17, April.
    5. Alan Le Goallec & Braden T Tierney & Jacob M Luber & Evan M Cofer & Aleksandar D Kostic & Chirag J Patel, 2020. "A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type," PLOS Computational Biology, Public Library of Science, vol. 16(5), pages 1-21, May.
    6. Qi Su & Qin Liu & Raphaela Iris Lau & Jingwan Zhang & Zhilu Xu & Yun Kit Yeoh & Thomas W. H. Leung & Whitney Tang & Lin Zhang & Jessie Q. Y. Liang & Yuk Kam Yau & Jiaying Zheng & Chengyu Liu & Mengjin, 2022. "Faecal microbiome-based machine learning for multi-class disease diagnosis," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    7. Efrat Muller & Itamar Shiryan & Elhanan Borenstein, 2024. "Multi-omic integration of microbiome data for identifying disease-associated modules," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
