IDEAS home Printed from https://ideas.repec.org/a/sae/jedbes/v49y2024i1p61-91.html

A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies

Author

Listed:
  • Youmi Suk

    (Teachers College, Columbia University)

Abstract

Machine learning (ML) methods for causal inference have gained popularity due to their flexibility to predict the outcome model and the propensity score. In this article, we provide a within-group approach for ML-based causal inference methods in order to robustly estimate average treatment effects in multilevel studies when there is cluster-level unmeasured confounding. We focus on one particular ML-based causal inference method based on the targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. Through our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Also, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates due to double robustness with flexible modeling, unlike parametric-based inverse propensity weighting methods. We demonstrate our proposed methods and conduct sensitivity analyses against the number of groups and individual-level unmeasured confounding to evaluate the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study.
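The within-group idea in the abstract can be illustrated with a minimal sketch. It does not reproduce the article's TMLE or SuperLearner procedure: it swaps in a plain cluster-demeaned (fixed-effects) regression, treats each cluster as its own group, and runs on simulated data. The function names (`simulate`, `naive_estimate`, `within_cluster_estimate`) and all data-generating values are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def simulate(n_clusters=200, cluster_size=30, true_effect=2.0, seed=0):
    """Simulate multilevel data with an unmeasured cluster-level confounder.

    Returns a list of (cluster_id, treatment, outcome) rows. The confounder u
    is deliberately left out of the rows, as if the analyst never observed it.
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        u = rng.gauss(0.0, 1.0)           # hypothetical cluster-level confounder
        p_treat = 0.8 if u > 0 else 0.2   # u shifts treatment uptake
        for _ in range(cluster_size):
            a = 1 if rng.random() < p_treat else 0
            # u also shifts the outcome, which confounds a pooled comparison
            y = true_effect * a + 3.0 * u + rng.gauss(0.0, 1.0)
            rows.append((cluster, a, y))
    return rows


def naive_estimate(rows):
    """Pooled difference in mean outcomes, ignoring the cluster structure."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def within_cluster_estimate(rows):
    """Slope of cluster-demeaned outcome on cluster-demeaned treatment.

    Demeaning within clusters removes anything constant within a cluster,
    including the unmeasured cluster-level confounder.
    """
    by_cluster = defaultdict(list)
    for cluster, a, y in rows:
        by_cluster[cluster].append((a, y))

    numerator = 0.0
    denominator = 0.0
    for members in by_cluster.values():
        a_bar = sum(a for a, _ in members) / len(members)
        y_bar = sum(y for _, y in members) / len(members)
        for a, y in members:
            numerator += (a - a_bar) * (y - y_bar)
            denominator += (a - a_bar) ** 2
    return numerator / denominator


if __name__ == "__main__":
    data = simulate()
    print("naive estimate:         ", round(naive_estimate(data), 3))
    print("within-cluster estimate:", round(within_cluster_estimate(data), 3))
```

In this simulated setting the pooled comparison absorbs the confounder's effect on the outcome, while the within-cluster estimate stays close to the simulated effect of 2.0. The article's approach differs in that it trains a doubly robust ML estimator within groups of similar clusters rather than fitting a linear fixed-effects regression.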

Suggested Citation

  • Youmi Suk, 2024. "A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies," Journal of Educational and Behavioral Statistics, vol. 49(1), pages 61-91, February.
  • Handle: RePEc:sae:jedbes:v:49:y:2024:i:1:p:61-91
    DOI: 10.3102/10769986231162096

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.3102/10769986231162096
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Youmi Suk & Hyunseung Kang, 2022. "Robust Machine Learning for Treatment Effects in Multilevel Observational Studies Under Cluster-level Unmeasured Confounding," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 310-343, March.
    2. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    3. Weicong Lyu & Jee-Seon Kim & Youmi Suk, 2023. "Estimating Heterogeneous Treatment Effects Within Latent Class Multilevel Models: A Bayesian Approach," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 3-36, February.
    4. Athey, Susan & Imbens, Guido W., 2022. "Design-based analysis in Difference-In-Differences settings with staggered adoption," Journal of Econometrics, Elsevier, vol. 226(1), pages 62-79.
    5. Dmitry Arkhangelsky & Guido W. Imbens, 2019. "Doubly Robust Identification for Causal Panel Data Models," Papers 1909.09412, arXiv.org, revised Feb 2022.
    6. Youmi Suk & Jee-Seon Kim & Hyunseung Kang, 2021. "Hybridizing Machine Learning Methods and Finite Mixture Models for Estimating Heterogeneous Treatment Effects in Latent Classes," Journal of Educational and Behavioral Statistics, vol. 46(3), pages 323-347, June.
    7. Dimitris Bertsimas & Agni Orfanoudaki & Rory B. Weiner, 2020. "Personalized treatment for coronary artery disease patients: a machine learning approach," Health Care Management Science, Springer, vol. 23(4), pages 482-506, December.
    8. Undral Byambadalai & Tatsushi Oka & Shota Yasui, 2024. "Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction," Papers 2407.16037, arXiv.org.
    9. Julius Owusu, 2023. "Randomization Inference of Heterogeneous Treatment Effects under Network Interference," Papers 2308.00202, arXiv.org, revised Jan 2025.
    10. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    11. Ronald Herrera & Ursula Berger & Ondine S. Von Ehrenstein & Iván Díaz & Stella Huber & Daniel Moraga Muñoz & Katja Radon, 2017. "Estimating the Causal Impact of Proximity to Gold and Copper Mines on Respiratory Diseases in Chilean Children: An Application of Targeted Maximum Likelihood Estimation," IJERPH, MDPI, vol. 15(1), pages 1-15, December.
    12. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    13. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    14. Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," Working Papers hal-03455978, HAL.
    15. Arthur Charpentier & Emmanuel Flachaire & Ewen Gallic, 2023. "Optimal Transport for Counterfactual Estimation: A Method for Causal Inference," Papers 2301.07755, arXiv.org.
    16. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    17. David M. Ritzwoller & Vasilis Syrgkanis, 2024. "Simultaneous Inference for Local Structural Parameters with Random Forests," Papers 2405.07860, arXiv.org, revised Sep 2024.
    18. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    19. Susan Gruber & Mark J. van der Laan, 2013. "An Application of Targeted Maximum Likelihood Estimation to the Meta-Analysis of Safety Data," Biometrics, The International Biometric Society, vol. 69(1), pages 254-262, March.
    20. Youmi Suk & Kyung T. Han, 2024. "A Psychometric Framework for Evaluating Fairness in Algorithmic Decision Making: Differential Algorithmic Functioning," Journal of Educational and Behavioral Statistics, vol. 49(2), pages 151-172, April.
