Printed from https://ideas.repec.org/a/bla/biomet/v77y2021i1p125-135.html

A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation

Author

Listed:
  • Arman Oganisian
  • Nandita Mitra
  • Jason A. Roy

Abstract

Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero‐inflation complicate these tasks, requiring highly flexible, data‐adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero‐inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero‐inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero‐inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER‐Medicare database.
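To make the idea of standardization with a zero‐inflated outcome concrete, here is a minimal sketch in Python. It is not the authors' Bayesian nonparametric model. It assumes a single binary confounder `L`, a binary treatment `A`, and simulated data, and it uses simple stratum frequencies as a plug-in for the two-part decomposition E[Y | A, L] = P(Y > 0 | A, L) × E[Y | Y > 0, A, L]. All function names and simulation parameters are hypothetical and chosen only for illustration.

```python
import random

def simulate(n, seed=0):
    """Simulate (L, A, Y): binary confounder, binary treatment, zero-inflated outcome.

    All parameter values here are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.4 else 0
        # Treatment assignment depends on L, so L confounds the A-Y relationship.
        a = 1 if rng.random() < (0.7 if l else 0.3) else 0
        p_zero = 0.5 - 0.2 * a + 0.1 * l  # probability of a structural zero
        if rng.random() < p_zero:
            y = 0.0
        else:
            # Right-skewed positive part (log-normal).
            y = rng.lognormvariate(1.0 + 0.5 * a + 0.3 * l, 0.5)
        rows.append((l, a, y))
    return rows

def two_part_mean(rows, a, l):
    """Plug-in estimate of E[Y | A=a, L=l] as P(Y>0 | a, l) * E[Y | Y>0, a, l]."""
    ys = [y for (li, ai, y) in rows if li == l and ai == a]
    if not ys:
        # No observations in this stratum: the positivity assumption fails empirically.
        raise ValueError("empty stratum: positivity violated")
    pos = [y for y in ys if y > 0]
    p_pos = len(pos) / len(ys)
    mean_pos = sum(pos) / len(pos) if pos else 0.0
    return p_pos * mean_pos

def standardized_mean(rows, a):
    """Standardize: average E[Y | A=a, L=l] over the marginal distribution of L."""
    n = len(rows)
    total = 0.0
    for l in (0, 1):
        p_l = sum(1 for (li, _, _) in rows if li == l) / n
        total += p_l * two_part_mean(rows, a, l)
    return total

rows = simulate(20000, seed=1)
effect = standardized_mean(rows, 1) - standardized_mean(rows, 0)
print(f"standardized mean difference: {effect:.3f}")
```

In this sketch the zero part and the positive part are estimated separately and then recombined before averaging over the confounder distribution. A Bayesian version would instead draw these quantities from a posterior, so that uncertainty in both parts carries through to the effect estimate, as the abstract describes.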

Suggested Citation

  • Arman Oganisian & Nandita Mitra & Jason A. Roy, 2021. "A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation," Biometrics, The International Biometric Society, vol. 77(1), pages 125-135, March.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:1:p:125-135
    DOI: 10.1111/biom.13244

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13244
    Download Restriction: no


    References listed on IDEAS

    1. Antonio R. Linero & Debajyoti Sinha & Stuart R. Lipsitz, 2020. "Semiparametric mixed‐scale models using shared Bayesian forests," Biometrics, The International Biometric Society, vol. 76(1), pages 131-144, March.
    2. Jason Roy & Kirsten J. Lum & Bret Zeldow & Jordan D. Dworkin & Vincent Lo Re & Michael J. Daniels, 2018. "Bayesian nonparametric generative models for causal inference with missing at random covariates," Biometrics, The International Biometric Society, vol. 74(4), pages 1193-1202, December.
    3. Chanmin Kim & Michael J. Daniels & Bess H. Marcus & Jason A. Roy, 2017. "A framework for Bayesian nonparametric inference for causal effects of mediation," Biometrics, The International Biometric Society, vol. 73(2), pages 401-409, June.
    4. Edward George & Purushottam Laud & Brent Logan & Robert McCulloch & Rodney Sparapani, 2019. "Fully Nonparametric Bayesian Additive Regression Trees," Advances in Econometrics, in: Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part B, volume 40, pages 89-110, Emerald Group Publishing Limited.
    5. Yanxun Xu & Peter Müller & Abdus S. Wahed & Peter F. Thall, 2016. "Bayesian Nonparametric Estimation for Dynamic Treatment Regimes With Sequential Transition Times," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 921-950, July.
    6. Matthew Stephens, 2000. "Dealing with label switching in mixture models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 795-809.
    7. Dandan Xu & Michael J. Daniels & Almut G. Winterstein, 2018. "A Bayesian nonparametric approach to causal inference on quantiles," Biometrics, The International Biometric Society, vol. 74(3), pages 986-996, September.

    Citations


    Cited by:

    1. Sunghae Jun, 2024. "Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining," Stats, MDPI, vol. 7(3), pages 1-15, August.
    2. Eoghan O'Neill, 2022. "Type I Tobit Bayesian Additive Regression Trees for Censored Outcome Regression," Papers 2211.07506, arXiv.org, revised Feb 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antonio R. Linero, 2023. "Prior and posterior checking of implicit causal assumptions," Biometrics, The International Biometric Society, vol. 79(4), pages 3153-3164, December.
    2. Antonio R. Linero, 2022. "Simulation‐based estimators of analytically intractable causal effects," Biometrics, The International Biometric Society, vol. 78(3), pages 1001-1017, September.
    3. Maria Josefsson & Michael J. Daniels, 2021. "Bayesian semi‐parametric G‐computation for causal inference in a cohort study with MNAR dropout and death," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(2), pages 398-414, March.
    4. Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
    5. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    6. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    7. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    8. Rufo, M.J. & Pérez, C.J. & Martín, J., 2009. "Local parametric sensitivity for mixture models of lifetime distributions," Reliability Engineering and System Safety, Elsevier, vol. 94(7), pages 1238-1244.
    9. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    10. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    11. Sphiwe B. Skhosana & Salomon M. Millard & Frans H. J. Kanfer, 2023. "A Novel EM-Type Algorithm to Estimate Semi-Parametric Mixtures of Partially Linear Models," Mathematics, MDPI, vol. 11(5), pages 1-20, February.
    12. Sun-Joo Cho & Allan S. Cohen, 2010. "A Multilevel Mixture IRT Model With an Application to DIF," Journal of Educational and Behavioral Statistics, vol. 35(3), pages 336-370, June.
    13. Ungolo, Francesco & Kleinow, Torsten & Macdonald, Angus S., 2020. "A hierarchical model for the joint mortality analysis of pension scheme data with missing covariates," Insurance: Mathematics and Economics, Elsevier, vol. 91(C), pages 68-84.
    14. Ioannis Ntzoufras & Claudia Tarantola, 2012. "Conjugate and Conditional Conjugate Bayesian Analysis of Discrete Graphical Models of Marginal Independence," Quaderni di Dipartimento 178, University of Pavia, Department of Economics and Quantitative Methods.
    15. Brian Hartley, 2020. "Corridor stability of the Kaleckian growth model: a Markov-switching approach," Working Papers 2013, New School for Social Research, Department of Economics, revised Nov 2020.
    16. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
    17. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    18. Simen Alexander Linge Johnsen & Jörg Bollmann, 2020. "Coccolith mass and morphology of different Emiliania huxleyi morphotypes: A critical examination using Canary Islands material," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-29, March.
    19. Nichole E. Carlson & Timothy D. Johnson & Morton B. Brown, 2009. "A Bayesian Approach to Modeling Associations Between Pulsatile Hormones," Biometrics, The International Biometric Society, vol. 65(2), pages 650-659, June.
    20. M. Rufo & J. Martín & C. Pérez, 2006. "Bayesian analysis of finite mixture models of distributions from exponential families," Computational Statistics, Springer, vol. 21(3), pages 621-637, December.
