Printed from https://ideas.repec.org/a/plo/pcbi00/1004186.html

Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting

Author

Listed:
  • Davide Albanese
  • Carlotta De Filippo
  • Duccio Cavalieri
  • Claudio Donati

Abstract

Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.

Author Summary: In metagenomics, the composition of complex microbial communities is characterized using Next Generation Sequencing technologies. Thanks to the decreasing cost of sequencing, large amounts of data have been generated for environmental samples and for a variety of health-associated conditions. In parallel there has been a flourishing of statistical methods to analyze metagenomic datasets, concentrating mainly on the problem of assessing the existence of significant differences between microbial communities in different conditions. However, for a large number of therapeutic and diagnostic applications it would be essential to identify and rank the microbial taxa that are most relevant in these comparisons. Here we present PhyloRelief, a novel feature-ranking algorithm that fills this gap by integrating the phylogenetic relationships amongst the taxa into a statistical feature weighting procedure. Without relying on a precompiled taxonomy, PhyloRelief determines the lineages most relevant to the diversification of the samples guided by the data. As such, PhyloRelief can be applied both to cases in which sequences can be classified according to a known taxonomy, and to cases in which this is not feasible, a common occurrence in metagenomic data analysis given the increasing number of new and uncultivable taxa that are discovered using these technologies.
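
As a rough illustration of what a feature-weighting procedure of this kind computes, the sketch below implements the classic Relief scheme (nearest hit versus nearest miss) on a toy abundance table. That PhyloRelief belongs to the Relief family is an assumption suggested only by its name. The sketch ignores the phylogenetic tree entirely and is not the authors' algorithm. The function name `relief_weights`, the toy matrix, and the class labels are all hypothetical.

```python
def relief_weights(X, y):
    """Classic Relief weights (NOT PhyloRelief; no phylogeny is used).

    X: list of samples, each a list of feature values (e.g. abundances).
    y: list of class labels, one per sample.
    Returns one weight per feature; larger means more discriminative.
    """
    n, d = len(X), len(X[0])

    # Scale each feature by its observed range so differences lie in [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        # Manhattan distance over range-scaled features.
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # a nearest hit and a nearest miss are both required
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that separate classes, penalize those that
            # vary within a class.
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w


# Toy table: feature 0 tracks the class, feature 1 does not.
X = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.6]]
y = ["A", "A", "B", "B"]
weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
print(weights, ranking)
```

Ranking features by these weights mirrors, in a much simplified form, the idea of ranking clades by their contribution to separating the classes of samples. A phylogeny-aware variant would have to aggregate abundances over clades of the tree, which this sketch does not attempt.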

Suggested Citation

  • Davide Albanese & Carlotta De Filippo & Duccio Cavalieri & Claudio Donati, 2015. "Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting," PLOS Computational Biology, Public Library of Science, vol. 11(3), pages 1-18, March.
  • Handle: RePEc:plo:pcbi00:1004186
    DOI: 10.1371/journal.pcbi.1004186

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004186
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004186&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fiona B. Tamburini & Dylan Maghini & Ovokeraye H. Oduaran & Ryan Brewster & Michaella R. Hulley & Venesa Sahibdeen & Shane A. Norris & Stephen Tollman & Kathleen Kahn & Ryan G. Wagner & Alisha N. Wade, 2022. "Short- and long-read metagenomics of urban and rural South African gut microbiomes reveal a transitional composition and undescribed taxa," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    2. Gertrude Ecklu-Mensah & Candice Choo-Kang & Maria Gjerstad Maseng & Sonya Donato & Pascal Bovet & Bharathi Viswanathan & Kweku Bedu-Addo & Jacob Plange-Rhule & Prince Oti Boateng & Terrence E. Forrest, 2023. "Gut microbiota and fecal short chain fatty acids differ with adiposity and country of origin: the METS-microbiome study," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    3. John Molloy & Katrina Allen & Fiona Collier & Mimi L. K. Tang & Alister C. Ward & Peter Vuillermin, 2013. "The Potential Link between Gut Microbiota and IgE-Mediated Food Allergy in Early Life," IJERPH, MDPI, vol. 10(12), pages 1-22, December.
    4. Allison G. White & George S. Watts & Zhenqiang Lu & Maria M. Meza-Montenegro & Eric A. Lutz & Philip Harber & Jefferey L. Burgess, 2014. "Environmental Arsenic Exposure and Microbiota in Induced Sputum," IJERPH, MDPI, vol. 11(2), pages 1-15, February.
    5. Fanette Fontaine & Sondra Turjeman & Karel Callens & Omry Koren, 2023. "The intersection of undernutrition, microbiome, and child development in the first years of life," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
    6. Kerstin Thriene & Karin B. Michels, 2023. "Human Gut Microbiota Plasticity throughout the Life Course," IJERPH, MDPI, vol. 20(2), pages 1-14, January.
    7. Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
    8. Yaru Song & Hongyu Zhao & Tao Wang, 2020. "An adaptive independence test for microbiome community data," Biometrics, The International Biometric Society, vol. 76(2), pages 414-426, June.
    9. Tamar Ringel-Kulka & Jing Cheng & Yehuda Ringel & Jarkko Salojärvi & Ian Carroll & Airi Palva & Willem M de Vos & Reetta Satokari, 2013. "Intestinal Microbiota in Healthy U.S. Young Children and Adults—A High Throughput Microarray Analysis," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-10, May.
    10. Miranda Loh & Dimosthenis Sarigiannis & Alberto Gotti & Spyros Karakitsios & Anjoeka Pronk & Eelco Kuijpers & Isabella Annesi-Maesano & Nour Baiz & Joana Madureira & Eduardo Oliveira Fernandes & Micha, 2017. "How Sensors Might Help Define the External Exposome," IJERPH, MDPI, vol. 14(4), pages 1-14, April.
    11. Todd D. Terhune & Richard C. Deth, 2018. "Aluminum Adjuvant-Containing Vaccines in the Context of the Hygiene Hypothesis: A Risk Factor for Eosinophilia and Allergy in a Genetically Susceptible Subpopulation?," IJERPH, MDPI, vol. 15(5), pages 1-16, May.
    12. Brian D. Huang & Thomas M. Groseclose & Corey J. Wilson, 2022. "Transcriptional programming in a Bacteroides consortium," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    13. Lara S. Franco & Danielle F. Shanahan & Richard A. Fuller, 2017. "A Review of the Benefits of Nature Experiences: More Than Meets the Eye," IJERPH, MDPI, vol. 14(8), pages 1-29, August.
    14. Marta Reyman & Marlies A. Houten & Rebecca L. Watson & Mei Ling J. N. Chu & Kayleigh Arp & Wouter J. Waal & Irene Schiering & Frans B. Plötz & Rob J. L. Willems & Willem Schaik & Elisabeth A. M. Sande, 2022. "Effects of early-life antibiotics on the developing infant gut microbiome and resistome: a randomized trial," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    15. Kumar P Mainali & Sharon Bewick & Peter Thielen & Thomas Mehoke & Florian P Breitwieser & Shishir Paudel & Arjun Adhikari & Joshua Wolfe & Eric V Slud & David Karig & William F Fagan, 2017. "Statistical analysis of co-occurrence patterns in microbial presence-absence datasets," PLOS ONE, Public Library of Science, vol. 12(11), pages 1-21, November.
    16. Muntsa Rocafort & David B. Gootenberg & Jesús M. Luévano & Jeffrey M. Paer & Matthew R. Hayward & Juliet T. Bramante & Musie S. Ghebremichael & Jiawu Xu & Zoe H. Rogers & Alexander R. Munoz & Samson O, 2024. "HIV-associated gut microbial alterations are dependent on host and geographic context," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    17. Natsuko Tabata & Mai Tsukada & Kozue Kubo & Yuri Inoue & Reiko Miroku & Fumihiko Odashima & Koichiro Shiratori & Takashi Sekiya & Shintaro Sengoku & Hideaki Shiroyama & Hiromichi Kimura, 2022. "Living Lab for Citizens’ Wellness: A Case of Maintaining and Improving a Healthy Diet under the COVID-19 Pandemic," IJERPH, MDPI, vol. 19(3), pages 1-17, January.
    18. Aibo Gao & Junlei Su & Ruixin Liu & Shaoqian Zhao & Wen Li & Xiaoqiang Xu & Danjie Li & Juan Shi & Bin Gu & Juan Zhang & Qi Li & Xiaolin Wang & Yifei Zhang & Yu Xu & Jieli Lu & Guang Ning & Jie Hong &, 2021. "Sexual dimorphism in glucose metabolism is shaped by androgen-driven gut microbiome," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    19. Lena Takayasu & Wataru Suda & Eiichiro Watanabe & Shinji Fukuda & Kageyasu Takanashi & Hiroshi Ohno & Misako Takayasu & Hideki Takayasu & Masahira Hattori, 2017. "A 3-dimensional mathematical model of microbial proliferation that generates the characteristic cumulative relative abundance distributions in gut microbiomes," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-20, August.
    20. Claudia Sala & Enrico Giampieri & Silvia Vitali & Paolo Garagnani & Daniel Remondini & Armando Bazzani & Claudio Franceschi & Gastone C Castellani, 2020. "Gut microbiota ecology: Biodiversity estimated from hybrid neutral-niche model increases with health status and aging," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-23, October.