Variable Selection for Causal Inference via Outcome-Adaptive Random Forest

Author

Listed:
  • Daniel Jacob

Abstract

Estimates of a causal effect from observational data can be biased if we do not control for self-selection. This selection is based on confounding variables that affect both the treatment assignment and the outcome. Propensity score methods aim to correct for confounding. However, not all covariates are confounders. We propose the outcome-adaptive random forest (OARF), which includes only desirable variables when estimating the propensity score in order to decrease bias and variance. Our approach works in high-dimensional datasets and when the outcome and propensity score models are non-linear and potentially complicated. The OARF excludes covariates that are not associated with the outcome, even in the presence of a large number of spurious variables. Simulation results suggest that the OARF produces unbiased estimates, has a smaller variance, and is superior in variable selection compared to other approaches. The results from two empirical examples, the effect of right heart catheterization on mortality and the effect of maternal smoking during pregnancy on birth weight, show treatment effects comparable to previous findings but with tighter confidence intervals and more plausible selected variables.
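
The selection idea described in the abstract can be illustrated with a toy sketch. This is not the OARF algorithm from the paper: it replaces the random forest with simple cell frequencies on two binary covariates, and the data, the function names (`make_toy_data`, `select_covariates`, `ipw_ate`), and the screening threshold are all assumptions made for illustration. Covariate `x` is a confounder, while `z` drives treatment only and should be excluded from the propensity score model.

```python
from collections import defaultdict

TRUE_EFFECT = 2.0  # assumed treatment effect in the toy data


def make_toy_data():
    """Deterministic toy data: x affects treatment and outcome, z only treatment."""
    # Number of treated units (out of 40) in each (x, z) cell; hypothetical values.
    treated_per_cell = {(0, 0): 8, (0, 1): 16, (1, 0): 24, (1, 1): 32}
    rows = []
    for (x, z), n_treated in treated_per_cell.items():
        for i in range(40):
            t = 1 if i < n_treated else 0
            rows.append({"x": x, "z": z, "t": t, "y": TRUE_EFFECT * t + 3.0 * x})
    return rows


def mean(values):
    return sum(values) / len(values)


def outcome_association(rows, name, others):
    """Average absolute gap in mean y between name=1 and name=0,
    computed within each (treatment, other covariates) cell."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for r in rows:
        key = (r["t"],) + tuple(r[o] for o in others)
        groups[key][r[name]].append(r["y"])
    gaps = [abs(mean(g[1]) - mean(g[0])) for g in groups.values() if g[0] and g[1]]
    return mean(gaps) if gaps else 0.0


def select_covariates(rows, names, threshold=1e-8):
    """Keep only covariates that are associated with the outcome."""
    return [n for n in names
            if outcome_association(rows, n, [o for o in names if o != n]) > threshold]


def propensity_by_cell(rows, names):
    """Estimate P(t=1 | covariates) as the treated share in each covariate cell."""
    counts = defaultdict(lambda: [0, 0])  # [n_treated, n_total]
    for r in rows:
        key = tuple(r[n] for n in names)
        counts[key][0] += r["t"]
        counts[key][1] += 1
    return {k: treated / total for k, (treated, total) in counts.items()}


def ipw_ate(rows, names):
    """Inverse-probability-weighted estimate of the average treatment effect."""
    e = propensity_by_cell(rows, names)
    total = 0.0
    for r in rows:
        p = e[tuple(r[n] for n in names)]
        total += r["t"] * r["y"] / p - (1 - r["t"]) * r["y"] / (1 - p)
    return total / len(rows)


def naive_difference(rows):
    """Unadjusted difference in mean outcomes between treated and control units."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    return mean(treated) - mean(control)


rows = make_toy_data()
selected = select_covariates(rows, ["x", "z"])
print(selected, naive_difference(rows), ipw_ate(rows, selected))
```

In this constructed example the screening step keeps `x` and drops `z`, the unadjusted comparison overstates the effect, and weighting by the propensity score built from the selected covariate recovers the assumed effect of 2. Because the toy data are noise-free, the sketch cannot show the variance reduction that the abstract reports.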

Suggested Citation

  • Daniel Jacob, 2021. "Variable Selection for Causal Inference via Outcome-Adaptive Random Forest," Papers 2109.04154, arXiv.org.
  • Handle: RePEc:arx:papers:2109.04154

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2109.04154
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sallin, Aurélien, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    2. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    3. Phillip Heiler & Michael C. Knaus, 2021. "Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments," Papers 2110.01427, arXiv.org, revised Aug 2023.
    4. Zongwu Cai & Ying Fang & Ming Lin & Yaqian Wu, 2024. "Estimating Counterfactual Distribution Functions via Optimal Distribution Balancing with Applications," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202415, University of Kansas, Department of Economics.
    5. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    6. Aurélien Sallin, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Papers 2110.08807, arXiv.org, revised Feb 2022.
    7. Jason Poulos & Shuxi Zeng, 2021. "RNN‐based counterfactual prediction, with an application to homestead policy and public schooling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 1124-1139, August.
    8. Dongcheng Zhang & Kunpeng Zhang, 2020. "Weighting-Based Treatment Effect Estimation via Distribution Learning," Papers 2012.13805, arXiv.org, revised May 2023.
    9. Phillip Heiler, 2020. "Efficient Covariate Balancing for the Local Average Treatment Effect," Papers 2007.04346, arXiv.org.
    10. Jeremiah Jones & Ashkan Ertefaie & Susan M. Shortreed, 2023. "Rejoinder to “Reader reaction to ‘Outcome‐adaptive Lasso: Variable selection for causal inference’ by Shortreed and Ertefaie (2017)”," Biometrics, The International Biometric Society, vol. 79(1), pages 521-525, March.
    11. Riccardo Di Francesco, 2022. "Aggregation Trees," CEIS Research Paper 546, Tor Vergata University, CEIS, revised 20 Nov 2023.
    12. Merlin Stein, 2022. "When are large female-led firms more resilient against shocks? Learnings from Indian enterprises during COVID-19 with diff-in-diff and causal forests," CSAE Working Paper Series 2022-01, Centre for the Study of African Economies, University of Oxford.
    13. Guido Imbens & Yiqing Xu, 2024. "LaLonde (1986) after Nearly Four Decades: Lessons Learned," Papers 2406.00827, arXiv.org, revised Jun 2024.
    14. Heiler, Phillip & Kazak, Ekaterina, 2021. "Valid inference for treatment effect parameters under irregular identification and many extreme propensity scores," Journal of Econometrics, Elsevier, vol. 222(2), pages 1083-1108.
    15. Riccardo Di Francesco, 2024. "Aggregation Trees," Papers 2410.11408, arXiv.org.
    16. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    17. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    18. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    19. Luis Guillermo Becerra-Valbuena & Jorge A. Bonilla, 2021. "Climatic shocks, air quality, and health at birth in Bogotá," Working Papers halshs-03429482, HAL.
    20. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
