Printed from https://ideas.repec.org/a/bla/jorssc/v70y2021i4p961-979.html

A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study

Author

Listed:
  • Kurtis Shuler
  • Samuel Verbanic
  • Irene A. Chen
  • Juhee Lee

Abstract

High‐throughput sequencing technology has enabled researchers to profile microbial communities from a variety of environments, but analysis of multivariate taxon count data remains challenging. We develop a Bayesian nonparametric (BNP) regression model with zero inflation to analyse multivariate count data from microbiome studies. A BNP approach flexibly models microbial associations with covariates, such as environmental factors and clinical characteristics. The model produces estimates for probability distributions which relate microbial diversity and differential abundance to covariates, and facilitates community comparisons beyond those provided by simple statistical tests. We compare the model to simpler models and popular alternatives in simulation studies, showing that, beyond these community‐level insights, it yields superior parameter estimates and model fit in various settings. The model's utility is demonstrated by applying it to a chronic wound microbiome data set and a Human Microbiome Project data set, where it is used to compare microbial communities present in different environments.

Suggested Citation

  • Kurtis Shuler & Samuel Verbanic & Irene A. Chen & Juhee Lee, 2021. "A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 961-979, August.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:4:p:961-979
    DOI: 10.1111/rssc.12493

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12493
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Kunzhi & Shen, Weining & Zhu, Weixuan, 2023. "Covariate dependent Beta-GOS process," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    2. Bhattacharya, Indrabati & Ghosal, Subhashis, 2021. "Bayesian multivariate quantile regression using Dependent Dirichlet Process prior," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    3. Stefano Favaro & Antonio Lijoi & Igor Prünster, 2012. "On the stick–breaking representation of normalized inverse Gaussian priors," DEM Working Papers Series 008, University of Pavia, Department of Economics and Management.
    4. Igor Prünster & Matteo Ruggiero, 2011. "A Bayesian nonparametric approach to modeling market share dynamics," Carlo Alberto Notebooks 217, Collegio Carlo Alberto.
    5. Bassetti, Federico & Casarin, Roberto & Leisen, Fabrizio, 2011. "Beta-product Poisson-Dirichlet Processes," DES - Working Papers. Statistics and Econometrics. WS 12160, Universidad Carlos III de Madrid. Departamento de Estadística.
    6. Bassetti, Federico & Casarin, Roberto & Leisen, Fabrizio, 2014. "Beta-product dependent Pitman–Yor processes for Bayesian inference," Journal of Econometrics, Elsevier, vol. 180(1), pages 49-72.
    7. repec:jss:jstsof:40:i05 is not listed on IDEAS
    8. Kassandra Fronczyk & Athanasios Kottas, 2017. "Risk Assessment for Toxicity Experiments with Discrete and Continuous Outcomes: A Bayesian Nonparametric Approach," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 22(4), pages 585-601, December.
    9. Zahra Barzegar & Firoozeh Rivaz, 2020. "A scalable Bayesian nonparametric model for large spatio-temporal data," Computational Statistics, Springer, vol. 35(1), pages 153-173, March.
    10. Joshua C.C. Chan & Angelia L. Grant, 2014. "Issues in Comparing Stochastic Volatility Models Using the Deviance Information Criterion," CAMA Working Papers 2014-51, Centre for Applied Macroeconomic Analysis, Crawford School of Public Policy, The Australian National University.
    11. Fernanda B. Rizzato & Roseli A. Leandro & Clarice G.B. Demétrio & Geert Molenberghs, 2016. "A Bayesian approach to analyse overdispersed longitudinal count data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(11), pages 2085-2109, August.
    12. Andrés F. Barrientos & Alejandro Jara & Fernando A. Quintana, 2017. "Fully Nonparametric Regression for Bounded Data Using Dependent Bernstein Polynomials," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 806-825, April.
    13. Pati, Debdeep & Dunson, David B. & Tokdar, Surya T., 2013. "Posterior consistency in conditional distribution estimation," Journal of Multivariate Analysis, Elsevier, vol. 116(C), pages 456-472.
    14. Tchumtchoua, Sylvie & Dey, Dipak, 2007. "Semiparametric Bayesian Estimation of Random Coefficients Discrete Choice Models," Research Reports 149208, University of Connecticut, Food Marketing Policy Center.
    15. Bissiri, Pier Giovanni & Cleanthous, Galatia & Emery, Xavier & Nipoti, Bernardo & Porcu, Emilio, 2022. "Nonparametric Bayesian modelling of longitudinally integrated covariance functions on spheres," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    16. Edgar C. Merkle & Daniel Furr & Sophia Rabe-Hesketh, 2019. "Bayesian Comparison of Latent Variable Models: Conditional Versus Marginal Likelihoods," Psychometrika, Springer;The Psychometric Society, vol. 84(3), pages 802-829, September.
    17. Abel Rodriguez & Enrique ter Horst, 2008. "Measuring expectations in options markets: An application to the SP500 index," Papers 0901.0033, arXiv.org.
    18. Congdon, P., 2007. "Bayesian modelling strategies for spatially varying regression coefficients: A multivariate perspective for multiple outcomes," Computational Statistics & Data Analysis, Elsevier, vol. 51(5), pages 2586-2601, February.
    19. Gutiérrez, Luis & Mena, Ramsés H. & Ruggiero, Matteo, 2016. "A time dependent Bayesian nonparametric model for air quality analysis," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 161-175.
    20. Abel Rodríguez & Enrique ter Horst, 2011. "Measuring expectations in options markets: an application to the S&P500 index," Quantitative Finance, Taylor & Francis Journals, vol. 11(9), pages 1393-1405, July.
    21. Richardson, Robert & Hartman, Brian, 2018. "Bayesian nonparametric regression models for modeling and predicting healthcare claims," Insurance: Mathematics and Economics, Elsevier, vol. 83(C), pages 1-8.
