A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies

Author

Youmi Suk (Teachers College, Columbia University)

Abstract

Machine learning (ML) methods for causal inference have gained popularity because of their flexibility in modeling both the outcome and the propensity score. In this article, we propose a within-group approach for ML-based causal inference methods that robustly estimates average treatment effects (ATEs) in multilevel studies with cluster-level unmeasured confounding. We focus on one such method: targeted maximum likelihood estimation (TMLE) combined with an ensemble learner called SuperLearner. Our simulation studies show that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Using within-group propensity scores estimated from fixed effects logistic regression further increases the robustness of the proposed within-group TMLE method. Even when the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates because of its double robustness combined with flexible modeling, unlike parametric inverse propensity weighting methods. We demonstrate the proposed methods by evaluating the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study, and we conduct sensitivity analyses with respect to the number of groups and to individual-level unmeasured confounding.
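The abstract combines three ingredients: an outcome model fitted within clusters, a fixed effects logistic propensity score, and a TMLE targeting step. The Python sketch below shows how these pieces fit together under heavy simplifying assumptions; it is not the author's implementation. All function names (`cluster_means`, `fe_propensity`, `within_ols`, `tmle_ate`, `simulate`) are hypothetical. SuperLearner is replaced by a single linear outcome model with cluster fixed effects. The propensity model has cluster intercepts only, so its fitted values are simply each cluster's treated share. Each cluster serves as its own group instead of being pooled with similar clusters. The targeting step uses a linear fluctuation for a continuous outcome. The simulated data, including the effect size of 2.0, are invented for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch: SuperLearner is replaced by a linear fixed-effects model,
# and each cluster is treated as its own group.


def cluster_means(values, cluster):
    """Mean of `values` within each cluster id."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, cluster):
        sums[c] += v
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def fe_propensity(a, cluster, bound=0.025):
    """Propensity scores from a logistic model with cluster intercepts only.

    With cluster dummies as the only regressors, the logistic MLE equals each
    cluster's treated share, so no iterative fitting is needed. Scores are
    truncated to [bound, 1 - bound] so the clever covariate stays finite.
    """
    share = cluster_means(a, cluster)
    return [min(max(share[c], bound), 1 - bound) for c in cluster]


def within_ols(y, a, x, cluster):
    """Fixed-effects (within) OLS of y on a and x; returns tau, beta, intercepts."""
    my, ma, mx = (cluster_means(v, cluster) for v in (y, a, x))
    yd = [yi - my[c] for yi, c in zip(y, cluster)]
    ad = [ai - ma[c] for ai, c in zip(a, cluster)]
    xd = [xi - mx[c] for xi, c in zip(x, cluster)]
    # Normal equations for (tau, beta) on the cluster-demeaned data.
    saa = sum(v * v for v in ad)
    sxx = sum(v * v for v in xd)
    sax = sum(p * q for p, q in zip(ad, xd))
    say = sum(p * q for p, q in zip(ad, yd))
    sxy = sum(p * q for p, q in zip(xd, yd))
    det = saa * sxx - sax * sax
    tau = (sxx * say - sax * sxy) / det
    beta = (saa * sxy - sax * say) / det
    # Recover each cluster's intercept from the cluster means.
    alpha = {c: my[c] - tau * ma[c] - beta * mx[c] for c in my}
    return tau, beta, alpha


def tmle_ate(y, a, x, cluster):
    """One targeting step on the within-cluster fits; returns (TMLE ATE, initial tau)."""
    tau, beta, alpha = within_ols(y, a, x, cluster)
    g = fe_propensity(a, cluster)
    q0 = [alpha[c] + beta * xi for xi, c in zip(x, cluster)]
    q1 = [q + tau for q in q0]
    qa = [q1i if ai else q0i for ai, q0i, q1i in zip(a, q0, q1)]
    # Clever covariate H(A, X) = A / g - (1 - A) / (1 - g).
    h = [ai / gi - (1 - ai) / (1 - gi) for ai, gi in zip(a, g)]
    # Linear fluctuation: regress the residual y - Q(A, X) on H with no intercept.
    eps = sum(hi * (yi - qi) for hi, yi, qi in zip(h, y, qa)) / sum(hi * hi for hi in h)
    # Targeted predictions: Q1* = Q1 + eps / g and Q0* = Q0 - eps / (1 - g).
    diffs = [(q1i + eps / gi) - (q0i - eps / (1 - gi)) for q0i, q1i, gi in zip(q0, q1, g)]
    return sum(diffs) / len(diffs), tau


def simulate(n_clusters=50, size=40, effect=2.0, seed=7):
    """Toy multilevel data where a cluster-level confounder u is never observed."""
    rng = random.Random(seed)
    y, a, x, cluster = [], [], [], []
    for c in range(n_clusters):
        u = rng.gauss(0, 1)
        for _ in range(size):
            e = rng.gauss(0, 1)
            xi = u + e  # observed individual-level covariate
            p = 1 / (1 + math.exp(-(u + 0.5 * e)))  # treatment depends on u and x
            ai = 1 if rng.random() < p else 0
            y.append(effect * ai + xi + 3 * u + rng.gauss(0, 1))
            a.append(ai)
            x.append(xi)
            cluster.append(c)
    return y, a, x, cluster


y, a, x, cluster = simulate()
ate, tau = tmle_ate(y, a, x, cluster)
print(f"initial within estimate: {tau:.3f}, TMLE estimate: {ate:.3f} (true effect 2.0)")
```

The cluster intercepts absorb the unobserved u, so both estimates should land near the simulated effect. A naive treated-minus-control comparison would be biased upward, because u raises both the treatment probability and the outcome. The propensity model here ignores x and is therefore misspecified. The targeting step still keeps the estimate close to the truth because the outcome model is correctly specified, which is the double-robustness property the abstract appeals to.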

Suggested Citation

Youmi Suk, 2024. "A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies," Journal of Educational and Behavioral Statistics, vol. 49(1), pages 61-91, February.

Handle: RePEc:sae:jedbes:v:49:y:2024:i:1:p:61-91
DOI: 10.3102/10769986231162096

References listed on IDEAS

Youmi Suk & Jee-Seon Kim & Hyunseung Kang, 2021. "Hybridizing Machine Learning Methods and Finite Mixture Models for Estimating Heterogeneous Treatment Effects in Latent Classes," Journal of Educational and Behavioral Statistics, vol. 46(3), pages 323-347, June.
Kristin E. Porter & Susan Gruber & Mark J. van der Laan & Jasjeet S. Sekhon, 2011. "The Relative Performance of Targeted Maximum Likelihood Estimators," The International Journal of Biostatistics, De Gruyter, vol. 7(1), pages 1-34, August.
Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
- Stefan Wager & Susan Athey, 2017. "Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests," Research Papers 3576, Stanford University, Graduate School of Business.
Youjin Lee & Trang Q. Nguyen & Elizabeth A. Stuart, 2021. "Partially pooled propensity score models for average treatment effect estimation with multilevel data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1578-1598, October.
Bruno Arpino & Fabrizia Mealli, 2011. "The specification of the propensity score in multilevel observational studies," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1770-1780, April.
- Bruno Arpino & Fabrizia Mealli, 2008. "The specification of the propensity score in multilevel observational studies," Working Papers 006, "Carlo F. Dondena" Centre for Research on Social Dynamics (DONDENA), Università Commerciale Luigi Bocconi.
- Bruno Arpino & Fabrizia Mealli, 2008. "The specification of the propensity score in multilevel observational studies," MPRA Paper 17407, University Library of Munich, Germany.
Dmitry Arkhangelsky & Guido Imbens, 2018. "The Role of the Propensity Score in Fixed Effect Models," NBER Working Papers 24814, National Bureau of Economic Research, Inc.
- Dmitry Arkhangelsky & Guido W. Imbens, 2019. "The Role of the Propensity Score in Fixed Effect Models," Working Papers wp2019_1905, CEMFI.
Jordan H. Rickles, 2013. "Examining Heterogeneity in the Effect of Taking Algebra in Eighth Grade," The Journal of Educational Research, Taylor & Francis Journals, vol. 106(4), pages 251-268, July.
Youmi Suk & Hyunseung Kang, 2022. "Robust Machine Learning for Treatment Effects in Multilevel Observational Studies Under Cluster-level Unmeasured Confounding," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 310-343, March.
Douglas Bates & Martin Mächler & Ben Bolker & Steve Walker, 2015. "Fitting Linear Mixed-Effects Models Using lme4," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(1).
Robert Tibshirani & Guenther Walther & Trevor Hastie, 2001. "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(2), pages 411-423.
Susan Gruber & Mark van der Laan, 2012. "tmle: An R Package for Targeted Maximum Likelihood Estimation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 51(13).
Guido W. Imbens & Donald B. Rubin, 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, January.
Kosuke Imai & In Song Kim, 2019. "When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data?," American Journal of Political Science, John Wiley & Sons, vol. 63(2), pages 467-490, April.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Youmi Suk & Hyunseung Kang, 2022. "Robust Machine Learning for Treatment Effects in Multilevel Observational Studies Under Cluster-level Unmeasured Confounding," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 310-343, March.
Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
Weicong Lyu & Jee-Seon Kim & Youmi Suk, 2023. "Estimating Heterogeneous Treatment Effects Within Latent Class Multilevel Models: A Bayesian Approach," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 3-36, February.
Susan Athey & Guido W. Imbens, 2022. "Design-based analysis in Difference-In-Differences settings with staggered adoption," Journal of Econometrics, Elsevier, vol. 226(1), pages 62-79.
- Susan Athey & Guido Imbens, 2018. "Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption," Papers 1808.05293, arXiv.org, revised Sep 2018.
- Susan Athey & Guido W. Imbens, 2018. "Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption," Research Papers 3712, Stanford University, Graduate School of Business.
- Susan Athey & Guido W. Imbens, 2018. "Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption," NBER Working Papers 24963, National Bureau of Economic Research, Inc.
Dmitry Arkhangelsky & Guido W. Imbens, 2019. "Doubly Robust Identification for Causal Panel Data Models," Papers 1909.09412, arXiv.org, revised Feb 2022.
- Dmitry Arkhangelsky & Guido W. Imbens, 2021. "Double-Robust Identification for Causal Panel Data Models," NBER Working Papers 28364, National Bureau of Economic Research, Inc.
Youmi Suk & Jee-Seon Kim & Hyunseung Kang, 2021. "Hybridizing Machine Learning Methods and Finite Mixture Models for Estimating Heterogeneous Treatment Effects in Latent Classes," Journal of Educational and Behavioral Statistics, vol. 46(3), pages 323-347, June.
Dimitris Bertsimas & Agni Orfanoudaki & Rory B. Weiner, 2020. "Personalized treatment for coronary artery disease patients: a machine learning approach," Health Care Management Science, Springer, vol. 23(4), pages 482-506, December.
Undral Byambadalai & Tatsushi Oka & Shota Yasui, 2024. "Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction," Papers 2407.16037, arXiv.org.
Ta-Wei Huang & Eva Ascarza, 2024. "Doing More with Less: Overcoming Ineffective Long-Term Targeting Using Short-Term Signals," Marketing Science, INFORMS, vol. 43(4), pages 863-884, July.
Julius Owusu, 2023. "Randomization Inference of Heterogeneous Treatment Effects under Network Interference," Papers 2308.00202, arXiv.org, revised Jan 2025.
Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
Ronald Herrera & Ursula Berger & Ondine S. Von Ehrenstein & Iván Díaz & Stella Huber & Daniel Moraga Muñoz & Katja Radon, 2017. "Estimating the Causal Impact of Proximity to Gold and Copper Mines on Respiratory Diseases in Chilean Children: An Application of Targeted Maximum Likelihood Estimation," IJERPH, MDPI, vol. 15(1), pages 1-15, December.
Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
- Michael C. Knaus, 2020. "Double Machine Learning based Program Evaluation under Unconfoundedness," Economics Working Paper Series 2004, University of St. Gallen, School of Economics and Political Science.
- Michael C. Knaus, 2020. "Double Machine Learning Based Program Evaluation under Unconfoundedness," IZA Discussion Papers 13051, Institute of Labor Economics (IZA).
- Michael C. Knaus, 2020. "Double Machine Learning based Program Evaluation under Unconfoundedness," Papers 2003.03191, arXiv.org, revised Jun 2022.
Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03455978, HAL.
- Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03455978, HAL.
- Denis Fougère & Nicolas Jacquemet, 2021. "Policy Evaluation Using Causal Inference Methods," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03098058, HAL.
- Denis Fougère & Nicolas Jacquemet, 2021. "Policy Evaluation Using Causal Inference Methods," PSE-Ecole d'économie de Paris (Postprint) hal-03098058, HAL.
- Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," Working Papers hal-03455978, HAL.
- Denis Fougère & Nicolas Jacquemet, 2021. "Policy Evaluation Using Causal Inference Methods," Post-Print hal-03098058, HAL.
- Denis Fougère & Nicolas Jacquemet, 2021. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03098058, HAL.
- Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," IZA Discussion Papers 12922, Institute of Labor Economics (IZA).
Arthur Charpentier & Emmanuel Flachaire & Ewen Gallic, 2023. "Optimal Transport for Counterfactual Estimation: A Method for Causal Inference," Papers 2301.07755, arXiv.org.
- Arthur Charpentier & Emmanuel Flachaire & Ewen Gallic, 2024. "Optimal Transport for Counterfactual Estimation: A Method for Causal Inference," Post-Print hal-04678402, HAL.
Oluwagbenga David Agboola & Han Yu, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
David M. Ritzwoller & Vasilis Syrgkanis, 2024. "Simultaneous Inference for Local Structural Parameters with Random Forests," Papers 2405.07860, arXiv.org, revised Sep 2024.
Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
- Guido Imbens, 2019. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," NBER Working Papers 26104, National Bureau of Economic Research, Inc.
Susan Gruber & Mark J. van der Laan, 2013. "An Application of Targeted Maximum Likelihood Estimation to the Meta-Analysis of Safety Data," Biometrics, The International Biometric Society, vol. 69(1), pages 254-262, March.

More about this item

Keywords

causal inference; machine learning methods; unmeasured variables; omitted variable bias; cluster-level unmeasured confounders; fixed effects models; targeted maximum likelihood estimation

Corrections

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact SAGE Publications.
