IDEAS home Printed from https://ideas.repec.org/a/bla/biomet/v77y2021i1p125-135.html

A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation

Author

Listed:
  • Arman Oganisian
  • Nandita Mitra
  • Jason A. Roy

Abstract

Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero‐inflation complicate these tasks, requiring highly flexible, data‐adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero‐inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero‐inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero‐inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER‐Medicare database.
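
The standardization idea mentioned in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' Bayesian nonparametric model: it assumes a single discrete confounder and replaces the flexible outcome model with empirical two-part estimates (the probability of a nonzero outcome, and the mean of the positive outcomes) within each treatment-by-confounder cell. The function name `standardized_mean`, the record layout `(treatment, confounder, outcome)`, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def standardized_mean(records, treatment):
    """Estimate E[Y^a] by standardizing a two-part outcome model over L.

    Uses sum_l P(L=l) * P(Y>0 | A=a, L=l) * E[Y | Y>0, A=a, L=l],
    with every quantity estimated by simple sample proportions and means.
    `records` is a list of (a, l, y) tuples (hypothetical layout).
    """
    n = len(records)
    stratum_counts = defaultdict(int)  # subjects per confounder level, both arms
    arm_counts = defaultdict(int)      # subjects in the requested arm per level
    positives = defaultdict(list)      # positive outcomes in that arm per level

    for a, l, y in records:
        stratum_counts[l] += 1
        if a == treatment:
            arm_counts[l] += 1
            if y > 0:
                positives[l].append(y)

    total = 0.0
    for l, count in stratum_counts.items():
        # Positivity: every confounder level must contain the treatment arm.
        if arm_counts[l] == 0:
            raise ValueError(f"positivity violated: no A={treatment} in L={l}")
        p_nonzero = len(positives[l]) / arm_counts[l]
        mean_positive = sum(positives[l]) / len(positives[l]) if positives[l] else 0.0
        total += (count / n) * p_nonzero * mean_positive
    return total


# Made-up toy data: (treatment, confounder level, cost), with many exact zeros.
toy = [
    (0, 0, 0.0), (0, 0, 0.0), (0, 0, 10.0), (0, 0, 30.0),
    (1, 0, 0.0), (1, 0, 20.0),
    (0, 1, 0.0), (0, 1, 100.0),
    (1, 1, 0.0), (1, 1, 50.0), (1, 1, 150.0), (1, 1, 100.0),
]

mean_untreated = standardized_mean(toy, treatment=0)
mean_treated = standardized_mean(toy, treatment=1)
effect = mean_treated - mean_untreated
print(mean_untreated, mean_treated, effect)
```

The explicit positivity check mirrors, in a crude way, the role the abstract gives to verifying positivity: if some confounder level never receives a treatment, the standardized mean for that treatment is not identified from the data, so the sketch raises an error rather than extrapolating.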

Suggested Citation

  • Arman Oganisian & Nandita Mitra & Jason A. Roy, 2021. "A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation," Biometrics, The International Biometric Society, vol. 77(1), pages 125-135, March.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:1:p:125-135
    DOI: 10.1111/biom.13244

    References listed on IDEAS

    1. Edward George & Purushottam Laud & Brent Logan & Robert McCulloch & Rodney Sparapani, 2019. "Fully Nonparametric Bayesian Additive Regression Trees," Advances in Econometrics, in: Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part B, volume 40, pages 89-110, Emerald Group Publishing Limited.
    2. Yanxun Xu & Peter Müller & Abdus S. Wahed & Peter F. Thall, 2016. "Bayesian Nonparametric Estimation for Dynamic Treatment Regimes With Sequential Transition Times," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 921-950, July.
    3. Matthew Stephens, 2000. "Dealing with label switching in mixture models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 795-809.
    4. Antonio R. Linero & Debajyoti Sinha & Stuart R. Lipsitz, 2020. "Semiparametric mixed‐scale models using shared Bayesian forests," Biometrics, The International Biometric Society, vol. 76(1), pages 131-144, March.
    5. Dandan Xu & Michael J. Daniels & Almut G. Winterstein, 2018. "A Bayesian nonparametric approach to causal inference on quantiles," Biometrics, The International Biometric Society, vol. 74(3), pages 986-996, September.
    6. Jason Roy & Kirsten J. Lum & Bret Zeldow & Jordan D. Dworkin & Vincent Lo Re & Michael J. Daniels, 2018. "Bayesian nonparametric generative models for causal inference with missing at random covariates," Biometrics, The International Biometric Society, vol. 74(4), pages 1193-1202, December.
    7. Chanmin Kim & Michael J. Daniels & Bess H. Marcus & Jason A. Roy, 2017. "A framework for Bayesian nonparametric inference for causal effects of mediation," Biometrics, The International Biometric Society, vol. 73(2), pages 401-409, June.

    Citations

    Cited by:

    1. Eoghan O'Neill, 2022. "Type I Tobit Bayesian Additive Regression Trees for Censored Outcome Regression," Papers 2211.07506, arXiv.org, revised Feb 2024.
    2. Sunghae Jun, 2024. "Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining," Stats, MDPI, vol. 7(3), pages 1-15, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antonio R. Linero, 2023. "Prior and posterior checking of implicit causal assumptions," Biometrics, The International Biometric Society, vol. 79(4), pages 3153-3164, December.
    2. Antonio R. Linero, 2022. "Simulation‐based estimators of analytically intractable causal effects," Biometrics, The International Biometric Society, vol. 78(3), pages 1001-1017, September.
    3. Maria Josefsson & Michael J. Daniels, 2021. "Bayesian semi‐parametric G‐computation for causal inference in a cohort study with MNAR dropout and death," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(2), pages 398-414, March.
    4. Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
    5. Riccardo Rastelli & Michael Fop, 2020. "A stochastic block model for interaction lengths," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 485-512, June.
    6. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    7. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    8. Shotwell Matthew S & Slate Elizabeth H, 2010. "Bayesian Modeling of Footrace Finishing Times," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 6(3), pages 1-21, July.
    9. James D. Hamilton & Daniel F. Waggoner & Tao Zha, 2007. "Normalization in Econometrics," Econometric Reviews, Taylor & Francis Journals, vol. 26(2-4), pages 221-252.
    10. Panagiotis Papastamoulis & George Iliopoulos, 2013. "On the Convergence Rate of Random Permutation Sampler and ECR Algorithm in Missing Data Models," Methodology and Computing in Applied Probability, Springer, vol. 15(2), pages 293-304, June.
    11. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    12. Rufo, M.J. & Pérez, C.J. & Martín, J., 2009. "Local parametric sensitivity for mixture models of lifetime distributions," Reliability Engineering and System Safety, Elsevier, vol. 94(7), pages 1238-1244.
    13. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    14. Grün, Bettina & Leisch, Friedrich, 2009. "Dealing with label switching in mixture models under genuine multimodality," Journal of Multivariate Analysis, Elsevier, vol. 100(5), pages 851-861, May.
    15. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    16. Aßmann, Christian & Boysen-Hogrefe, Jens, 2011. "A Bayesian approach to model-based clustering for binary panel probit models," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 261-279, January.
    17. Diana Mindrila, 2023. "Bayesian Latent Class Analysis: Sample Size, Model Size, and Classification Precision," Mathematics, MDPI, vol. 11(12), pages 1-18, June.
    18. Sphiwe B. Skhosana & Salomon M. Millard & Frans H. J. Kanfer, 2023. "A Novel EM-Type Algorithm to Estimate Semi-Parametric Mixtures of Partially Linear Models," Mathematics, MDPI, vol. 11(5), pages 1-20, February.
    19. Sun-Joo Cho & Allan S. Cohen, 2010. "A Multilevel Mixture IRT Model With an Application to DIF," Journal of Educational and Behavioral Statistics, vol. 35(3), pages 336-370, June.
    20. Ungolo, Francesco & Kleinow, Torsten & Macdonald, Angus S., 2020. "A hierarchical model for the joint mortality analysis of pension scheme data with missing covariates," Insurance: Mathematics and Economics, Elsevier, vol. 91(C), pages 68-84.
