Data-driven Covariate Selection for Confounding Adjustment by Focusing on the Stability of the Effect Estimator

Authors

  • Loh, Wen Wei
  • Ren, Dongning

Abstract

Valid inference of cause-and-effect relations in observational studies necessitates adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, remain unadjusted for, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset are truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory adjusting for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the chosen covariates for adjustment. The ability to correctly select confounders and yield valid causal inference following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the proposed method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets.
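The two-step strategy described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the priority score (the smaller of a covariate's absolute marginal correlations with treatment and with outcome), the linear outcome model fitted by least squares, the tolerance-based stability rule, and the simulated data. The names `ols_effect`, `prioritize`, `effect_trajectory`, and `smallest_stable_subset` are hypothetical.

```python
import numpy as np


def ols_effect(y, a, X):
    """Coefficient on treatment `a` from least squares of y on [1, a, X]."""
    design = np.column_stack([np.ones(len(y)), a, X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def prioritize(y, a, X):
    """Step 1 (assumed score): rank covariates by the weaker of their absolute
    marginal correlations with treatment and outcome, strongest first."""
    scores = [
        min(abs(np.corrcoef(X[:, j], a)[0, 1]), abs(np.corrcoef(X[:, j], y)[0, 1]))
        for j in range(X.shape[1])
    ]
    return np.argsort(scores)[::-1]


def effect_trajectory(y, a, X, order):
    """Step 2: effect estimates when adjusting for the first k prioritized
    covariates, for k = 0, 1, ..., p."""
    return np.array(
        [ols_effect(y, a, X[:, order[:k]]) for k in range(len(order) + 1)]
    )


def smallest_stable_subset(traj, tol):
    """Smallest k whose estimate stays within `tol` of every later estimate
    (an assumed stability rule; the last k always qualifies)."""
    for k in range(len(traj)):
        if np.all(np.abs(traj[k:] - traj[k]) <= tol):
            return k


# Simulated data (assumed): X0 and X1 are confounders, X2 affects only the
# treatment, X3 affects only the outcome, X4 and X5 are noise.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 6))
a = 0.8 * X[:, 0] + 0.8 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(size=n)
y = 1.0 * a + X[:, 0] + X[:, 1] + X[:, 3] + rng.normal(size=n)  # true effect: 1.0

order = prioritize(y, a, X)
traj = effect_trajectory(y, a, X, order)
k = smallest_stable_subset(traj, tol=0.1)
selected = order[:k]

print("priority order:", order)
print("effect trajectory:", np.round(traj, 3))
print("selected covariates:", selected, "estimate:", round(float(traj[k]), 3))
```

In this toy setting the estimate is biased until both confounders are adjusted for and then changes little as the remaining covariates are added, which is the pattern the stability criterion looks for. The tolerance `tol` is an arbitrary choice in this sketch; a principled threshold would need to account for sampling variability.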

Suggested Citation

  • Loh, Wen Wei & Ren, Dongning, 2021. "Data-driven Covariate Selection for Confounding Adjustment by Focusing on the Stability of the Effect Estimator," OSF Preprints yve6u, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:yve6u
    DOI: 10.31219/osf.io/yve6u

    Download full text from publisher

    File URL: https://osf.io/download/614d89c3687972000d8797d3/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huber, Martin & Lechner, Michael & Wunsch, Conny, 2013. "The performance of estimators based on the propensity score," Journal of Econometrics, Elsevier, vol. 175(1), pages 1-21.
    2. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    3. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 29-50, Spring.
    4. Michael J. Weir & Thomas W. Sproul, 2019. "Identifying Drivers of Genetically Modified Seafood Demand: Evidence from a Choice Experiment," Sustainability, MDPI, vol. 11(14), pages 1-21, July.
    5. Michael Schomaker & Christian Heumann, 2020. "When and when not to use optimal model averaging," Statistical Papers, Springer, vol. 61(5), pages 2221-2240, October.
    6. Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Papers 2206.08503, arXiv.org, revised Apr 2024.
    7. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    8. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    9. Bernard Koch & Tim Sainburg & Pablo Geraldo & Song Jiang & Yizhou Sun & Jacob Gates Foster, 2021. "A Primer on Deep Learning for Causal Inference," Papers 2110.04442, arXiv.org, revised Nov 2023.
    10. Helmut Wasserbacher & Martin Spindler, 2024. "Credit Ratings: Heterogeneous Effect on Capital Structure," Papers 2406.18936, arXiv.org.
    11. Pauline Leung & Zhuan Pei, 2020. "Further Education During Unemployment," Working Papers 642, Princeton University, Department of Economics, Industrial Relations Section.
    12. Guido W. Imbens, 2015. "Matching Methods in Practice: Three Examples," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 373-419.
    13. Jiaming Mao & Jingzhi Xu, 2020. "Ensemble Learning with Statistical and Structural Models," Papers 2006.05308, arXiv.org.
    14. Kiran Tomlinson & Johan Ugander & Austin R. Benson, 2021. "Choice Set Confounding in Discrete Choice," Papers 2105.07959, arXiv.org, revised Aug 2021.
    15. Heigle, Julia & Pfeiffer, Friedhelm, 2019. "An analysis of selected labor market outcomes of college dropouts in Germany: A machine learning estimation approach. Research report," ZEW Expertises, ZEW - Leibniz Centre for European Economic Research, number 222378, June.
    16. Meyer, Birgit, 2020. "How deep is your love? Innovation, Upgrading and the Depth of Internationalization," VfS Annual Conference 2020 (Virtual Conference): Gender Economics 224584, Verein für Socialpolitik / German Economic Association.
    17. Graham, Bryan S. & Pinto, Cristine Campos de Xavier, 2022. "Semiparametrically efficient estimation of the average linear regression function," Journal of Econometrics, Elsevier, vol. 226(1), pages 115-138.
    18. Joseph Antonelli & Matthew Cefalu & Nathan Palmer & Denis Agniel, 2018. "Doubly robust matching estimators for high dimensional confounding adjustment," Biometrics, The International Biometric Society, vol. 74(4), pages 1171-1179, December.
    19. Jason Poulos & Shuxi Zeng, 2021. "RNN‐based counterfactual prediction, with an application to homestead policy and public schooling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 1124-1139, August.
    20. Shixiao Zhang & Peisong Han & Changbao Wu, 2023. "Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference," International Statistical Review, International Statistical Institute, vol. 91(2), pages 165-192, August.
