
Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

Author

Listed:
  • Ian Holmes
  • Keith Harris
  • Christopher Quince

Abstract

We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. These data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples have different sizes, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components, each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’ and hence determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for fitting DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. According to the model evidence, four clusters fit these data best. Two clusters were dominated by Bacteroides and were homogeneous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high-variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states have many more configurations than undisturbed ones. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.
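
The generative process described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function names `sample_dmm` and `dm_log_likelihood`, the argument names, and the example weights and hyperparameters are all hypothetical. Each sample is assigned to a mixture component, its vector of taxa probabilities is drawn from that component's Dirichlet distribution, and its counts are drawn by multinomial sampling. The second function evaluates the standard Dirichlet-multinomial log probability of one count vector with the taxa probabilities integrated out. Fitting by the evidence framework and the Laplace approximation is not shown.

```python
import numpy as np
from math import lgamma


def sample_dmm(weights, alphas, sample_sizes, seed=0):
    """Draw count vectors from a Dirichlet multinomial mixture (illustrative).

    weights      : (K,) mixture weights summing to 1
    alphas       : (K, S) Dirichlet hyperparameters, one row per component
    sample_sizes : (N,) total number of observations per sample
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    n_samples = len(sample_sizes)

    # Pick a mixture component (metacommunity) for each sample.
    labels = rng.choice(len(weights), size=n_samples, p=weights)

    counts = np.empty((n_samples, alphas.shape[1]), dtype=np.int64)
    for i, (k, n) in enumerate(zip(labels, sample_sizes)):
        p = rng.dirichlet(alphas[k])        # community's taxa probabilities
        counts[i] = rng.multinomial(n, p)   # observed counts for this sample
    return labels, counts


def dm_log_likelihood(x, alpha):
    """Log probability of counts x under a Dirichlet-multinomial distribution."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a = x.sum(), alpha.sum()
    # Multinomial coefficient: n! / prod_j x_j!
    log_coef = lgamma(n + 1) - sum(lgamma(v + 1) for v in x)
    # Ratio of Dirichlet normalising constants.
    log_ratio = lgamma(a) - lgamma(n + a) + sum(
        lgamma(v + al) - lgamma(al) for v, al in zip(x, alpha)
    )
    return log_coef + log_ratio


# Hypothetical example: two components over three taxa, three samples
# of different sizes.
labels, counts = sample_dmm(
    weights=[0.6, 0.4],
    alphas=[[5.0, 1.0, 1.0], [0.5, 0.5, 0.5]],
    sample_sizes=[100, 250, 80],
)
```

In this sketch a component with small hyperparameters (the second row of `alphas`) tends to produce more variable communities than one with large hyperparameters, which is the kind of contrast the abstract draws between homogeneous and high-variance clusters.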

Suggested Citation

  • Ian Holmes & Keith Harris & Christopher Quince, 2012. "Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-15, February.
  • Handle: RePEc:plo:pone00:0030126
    DOI: 10.1371/journal.pone.0030126

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030126
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0030126&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Zhao, Xin & Zhang, Jingru & Lin, Wei, 2023. "Clustering multivariate count data via Dirichlet-multinomial network fusion," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    2. Binhuan Wang & Lanqiu Yao & Jiyuan Hu & Huilin Li, 2023. "A New Algorithm for Convex Biclustering and Its Extension to the Compositional Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(1), pages 193-216, April.
    3. Achal Dhariwal & Polona Rajar & Gabriela Salvadori & Heidi Aarø Åmdal & Dag Berild & Ola Didrik Saugstad & Drude Fugelseth & Gorm Greisen & Ulf Dahle & Kirsti Haaland & Fernanda Cristina Petersen, 2024. "Prolonged hospitalization signature and early antibiotic effects on the nasopharyngeal resistome in preterm infants," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    4. Emma M. Koff & Debbie Baarle & Marlies A. Houten & Marta Reyman & Guy A. M. Berbers & Femke Ham & Mei Ling J. N. Chu & Elisabeth A. M. Sanders & Debby Bogaert & Susana Fuentes, 2022. "Mode of delivery modulates the intestinal microbiota and impacts the response to vaccination," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    5. Zhiwei Qin & John Bowman & Jagtej Bewli, 2018. "A Bayesian framework for large-scale geo-demand estimation in on-line retailing," Annals of Operations Research, Springer, vol. 263(1), pages 231-245, April.
    6. Laura Anderlucci & Cinzia Viroli, 2020. "Mixtures of Dirichlet-Multinomial distributions for supervised and unsupervised classification of short text data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(4), pages 759-770, December.
    7. Kihyun Lee & Sebastien Raguideau & Kimmo Sirén & Francesco Asnicar & Fabio Cumbo & Falk Hildebrand & Nicola Segata & Chang-Jun Cha & Christopher Quince, 2023. "Population-level impacts of antibiotic usage on the human gut microbiome," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    8. Yaru Song & Hongyu Zhao & Tao Wang, 2020. "An adaptive independence test for microbiome community data," Biometrics, The International Biometric Society, vol. 76(2), pages 414-426, June.
    9. Pratheepa Jeganathan & Susan P. Holmes, 2021. "A Statistical Perspective on the Challenges in Molecular Microbial Biology," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 131-160, June.
    10. Patrick LeBlanc & Li Ma, 2023. "Microbiome subcommunity learning with logistic‐tree normal latent Dirichlet allocation," Biometrics, The International Biometric Society, vol. 79(3), pages 2321-2332, September.
    11. Shaikh Mateen R. & Beyene Joseph, 2017. "Statistical models and computational algorithms for discovering relationships in microbiome data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(1), pages 1-12, March.
    12. Georgios D. Kitsios & Khaled Sayed & Adam Fitch & Haopu Yang & Noel Britton & Faraaz Shah & William Bain & John W. Evankovich & Shulin Qin & Xiaohong Wang & Kelvin Li & Asha Patel & Yingze Zhang & Jos, 2024. "Longitudinal multicompartment characterization of host-microbiota interactions in patients with acute respiratory failure," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    13. Mahbaneh Eshaghzadeh Torbati & Makedonka Mitreva & Vanathi Gopalakrishnan, 2016. "Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations," Data, MDPI, vol. 1(3), pages 1-14, December.
    14. Sean M Gibbons & Sean M Kearney & Chris S Smillie & Eric J Alm, 2017. "Two dynamic regimes in the human gut microbiome," PLOS Computational Biology, Public Library of Science, vol. 13(2), pages 1-20, February.
    15. Doris Vandeputte & Lindsey Commer & Raul Y. Tito & Gunter Kathagen & João Sabino & Séverine Vermeire & Karoline Faust & Jeroen Raes, 2021. "Temporal variability in quantitative human gut microbiome profiles and implications for clinical research," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    16. Julien Tap & Franck Lejzerowicz & Aurélie Cotillard & Matthieu Pichaud & Daniel McDonald & Se Jin Song & Rob Knight & Patrick Veiga & Muriel Derrien, 2023. "Global branches and local states of the human gut microbiome define associations with environmental and intrinsic factors," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    17. Mishra, Aditya & Müller, Christian L., 2022. "Robust regression with compositional covariates," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
    18. Sanjeena Subedi & Drew Neish & Stephen Bak & Zeny Feng, 2020. "Cluster analysis of microbiome data by using mixtures of Dirichlet–multinomial regression models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(5), pages 1163-1187, November.
    19. Gallopin Mélina & Celeux Gilles & Jaffrézic Florence & Rau Andrea, 2015. "A model selection criterion for model-based clustering of annotated gene expression data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(5), pages 413-428, November.
    20. Tu, Wangshu & Browne, Ryan & Subedi, Sanjeena, 2024. "A mixture of logistic skew-normal multinomial models," Computational Statistics & Data Analysis, Elsevier, vol. 196(C).
