Printed from https://ideas.repec.org/p/hal/wpaper/hal-04238425.html

Fischer-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India

Author

Listed:
  • Victor Chernozhukov

    (Economics department - MIT - Massachusetts Institute of Technology)

  • Mert Demirer
  • Esther Duflo

    (Collège de France - Chaire pauvreté et politiques publiques - CdF (institution) - Collège de France)

  • Iván Fernández-Val

    (Department of Economics - Tilburg University [Netherlands])

Abstract

We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: we can use the objective functions that we develop to construct the best linear predictors of the effects, to obtain better machine learning proxies in the initial step. We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.
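
The split-and-aggregate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not stated on this page: the treatment probability p is known and constant, per-arm linear fits stand in for the machine learning proxies, standard errors are heteroskedasticity-robust (HC0), and the aggregated p-value is twice the median of the per-split p-values, as in the quantile aggregation of Meinshausen, Meier and Bühlmann (2009). All function names (ols, blp_one_split, aggregate, simulate) are hypothetical.

```python
import math
import numpy as np

def ols(X, y):
    """OLS coefficients with HC0 (heteroskedasticity-robust) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

def blp_one_split(Y, D, Z, p, rng):
    """One random split: fit proxies on the auxiliary half, run the
    best-linear-predictor regression on the main half."""
    n = len(Y)
    perm = rng.permutation(n)
    aux, main = perm[: n // 2], perm[n // 2:]
    # Assumption: linear fits per arm stand in for any ML learner.
    Za = np.column_stack([np.ones(len(aux)), Z[aux]])
    ctrl, trt = D[aux] == 0, D[aux] == 1
    b0, _ = ols(Za[ctrl], Y[aux][ctrl])
    b1, _ = ols(Za[trt], Y[aux][trt])
    Zm = np.column_stack([np.ones(len(main)), Z[main]])
    B = Zm @ b0          # proxy for the baseline outcome
    S = Zm @ b1 - B      # proxy for the conditional treatment effect
    W = D[main] - p      # centered treatment indicator
    X = np.column_stack([np.ones(len(main)), B, W, W * (S - S.mean())])
    beta, se = ols(X, Y[main])
    z = beta[3] / se[3]
    pval = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta[2], beta[3], pval

def aggregate(Y, D, Z, p=0.5, n_splits=100, seed=0):
    """Median-aggregate the average effect, the heterogeneity loading,
    and the p-value for 'no heterogeneity' across many splits."""
    rng = np.random.default_rng(seed)
    res = np.array([blp_one_split(Y, D, Z, p, rng) for _ in range(n_splits)])
    ate, het, pvals = res.T
    return {
        "ate": float(np.median(ate)),
        "het": float(np.median(het)),
        # Assumption: median p-value doubled to account for splitting.
        "p_het": min(1.0, 2.0 * float(np.median(pvals))),
    }

def simulate(n, seed=0):
    """Toy randomized experiment with true effect 1 + 2 * Z[:, 1]."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 2))
    D = rng.integers(0, 2, size=n)
    Y = Z[:, 0] + D * (1.0 + 2.0 * Z[:, 1]) + rng.normal(size=n)
    return Y, D, Z
```

Calling aggregate(*simulate(2000)) returns the median estimates over the splits. In this toy design the proxy is well specified, so the heterogeneity loading should be close to one; with a poor proxy it would shrink toward zero, which is what the test for heterogeneity picks up.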

Suggested Citation

  • Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2023. "Fischer-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Working Papers hal-04238425, HAL.
  • Handle: RePEc:hal:wpaper:hal-04238425
    DOI: 10.48550/arXiv.1712.04802

    References listed on IDEAS

    1. Meinshausen, Nicolai & Meier, Lukas & Bühlmann, Peter, 2009. "p-Values for High-Dimensional Regression," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1671-1681.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Keisuke Hirano & Guido W. Imbens & Geert Ridder, 2003. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, Econometric Society, vol. 71(4), pages 1161-1189, July.
    4. Victor Chernozhukov & Iván Fernández‐Val & Ye Luo, 2018. "The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages," Econometrica, Econometric Society, vol. 86(6), pages 1911-1938, November.
    5. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    6. Duflo, Esther & Glennerster, Rachel & Kremer, Michael, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
    7. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform post selection inference for LAD regression models," CeMMAP working papers CWP24/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Christian Hansen & Damian Kozbur & Sanjog Misra, 2016. "Targeted undersmoothing," ECON - Working Papers 282, Department of Economics - University of Zurich, revised Apr 2018.
    9. Alberto Abadie, 2005. "Semiparametric Difference-in-Differences Estimators," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 72(1), pages 1-19.
    10. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, November.

    Citations


    Cited by:

    1. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Hoong, Ruru, 2021. "Self control and smartphone use: An experimental study of soft commitment devices," European Economic Review, Elsevier, vol. 140(C).
    3. Patrick Rehill, 2024. "How do applied researchers use the Causal Forest? A methodological review of a method," Papers 2404.13356, arXiv.org.
    4. Christiansen, T. & Weeks, M., 2020. "Distributional Aspects of Microcredit Expansions," Cambridge Working Papers in Economics 20100, Faculty of Economics, University of Cambridge.
    5. Michael Vlassopoulos & Abu Siddique & Tabassum Rahman & Debayan Pakrashi & Asad Islam & Firoz Ahmed, 2024. "Improving Women's Mental Health during a Pandemic," American Economic Journal: Applied Economics, American Economic Association, vol. 16(2), pages 422-455, April.
    6. Emerick, Kyle & Kelley, Erin & De Janvry, Alain & Sadoulet, Elisabeth, 2019. "Endogenous Information Sharing and the Gains from Using Network Information to Maximize Technology Adoption," CEPR Discussion Papers 13507, C.E.P.R. Discussion Papers.
    7. Alejandro Sanchez-Becerra, 2023. "Robust inference for the treatment effect variance in experiments using machine learning," Papers 2306.03363, arXiv.org.
    8. Kayo Murakami & Hideki Shimada & Yoshiaki Ushifusa & Takanori Ida, 2022. "Heterogeneous Treatment Effects Of Nudge And Rebate: Causal Machine Learning In A Field Experiment On Electricity Conservation," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1779-1803, November.
    9. Tianyu Du & Ayush Kanodia & Herman Brunborg & Keyon Vafa & Susan Athey, 2024. "LABOR-LLM: Language-Based Occupational Representations with Large Language Models," Papers 2406.17972, arXiv.org.
    10. Marianne Bertrand & Bruno Crépon & Alicia Marguerie & Patrick Premand, 2021. "Do Workfare Programs Live Up to Their Promises? Experimental Evidence from Cote D’Ivoire," NBER Working Papers 28664, National Bureau of Economic Research, Inc.
    11. Anthony Strittmatter, 2018. "What Is the Value Added by Using Causal Machine Learning Methods in a Welfare Experiment Evaluation?," Papers 1812.06533, arXiv.org, revised Dec 2021.
    12. Chaisemartin, Clement de & Navarrete, Nicolas, 2019. "The direct and spillover effects of a mental health program for disruptive students," CAGE Online Working Paper Series 401, Competitive Advantage in the Global Economy (CAGE).
    13. Siddique, Abu & Islam, Asad & Mozumder, Tanvir Ahmed & Rahman, Tabassum & Shatil, Tanvir, 2022. "Forced Displacement, Mental Health, and Child Development: Evidence from the Rohingya Refugees," SocArXiv b4fc7, Center for Open Science.
    14. Pramod Kumar Sur, 2021. "Understanding Vaccine Hesitancy: Empirical Evidence from India," Papers 2103.02909, arXiv.org, revised Feb 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Victor Chernozhukov & Mert Demirer & Esther Duflo & Ivan Fernandez-Val, 2017. "Generic machine learning inference on heterogenous treatment effects in randomized experiments," CeMMAP working papers 61/17, Institute for Fiscal Studies.
    2. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-Dimensional Econometrics and Regularized GMM," Papers 1806.01888, arXiv.org, revised Jun 2018.
    3. Michael Zimmert & Michael Lechner, 2019. "Nonparametric estimation of causal heterogeneity under high-dimensional confounding," Papers 1908.08779, arXiv.org.
    4. Chunrong Ai & Oliver Linton & Kaiji Motegi & Zheng Zhang, 2021. "A unified framework for efficient estimation of general treatment models," Quantitative Economics, Econometric Society, vol. 12(3), pages 779-816, July.
    5. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    6. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    7. Michael C. Knaus & Michael Lechner & Anthony Strittmatter, 2018. "Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence," Papers 1810.13237, arXiv.org, revised Dec 2018.
    8. Michael C. Knaus, 2020. "Double Machine Learning based Program Evaluation under Unconfoundedness," Papers 2003.03191, arXiv.org, revised Jun 2022.
    9. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    10. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    11. Athey, Susan & Imbens, Guido W. & Metzger, Jonas & Munro, Evan, 2024. "Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations," Journal of Econometrics, Elsevier, vol. 240(2).
    12. Sookyo Jeong & Hongseok Namkoong, 2020. "Assessing External Validity Over Worst-case Subpopulations," Papers 2007.02411, arXiv.org, revised Feb 2022.
    13. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    14. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    15. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    16. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    17. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    18. Carneiro, Pedro & Lee, Sokbae & Wilhelm, Daniel, 2016. "Optimal Data Collection for Randomized Control Trials," IZA Discussion Papers 9908, Institute of Labor Economics (IZA).
    19. Yuya Sasaki & Takuya Ura & Yichong Zhang, 2022. "Unconditional quantile regression with high‐dimensional data," Quantitative Economics, Econometric Society, vol. 13(3), pages 955-978, July.
    20. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
