
Data-driven algorithms for dimension reduction in causal inference

Author

Listed:
  • Persson, Emma
  • Häggström, Jenny
  • Waernbaum, Ingeborg
  • de Luna, Xavier

Abstract

In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness the algorithms search for minimal subsets of the covariate vector. Based, e.g., on the framework of sufficient dimension reduction or kernel smoothing, the algorithms perform a backward elimination procedure assessing the significance of each covariate. Their performance is evaluated in simulations and an application using data from the Swedish Childhood Diabetes Register is also presented.
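The backward elimination idea mentioned in the abstract can be sketched generically as follows. This is a minimal illustration and not the authors' algorithms: the significance test is left as a pluggable function, because the tests based on sufficient dimension reduction or kernel smoothing are not reproduced here. The names `backward_eliminate`, `PValueFn`, `toy_p_value`, the covariate labels `x1` to `x4`, and the p-values in `TOY_P_VALUES` are all hypothetical and made up for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: returns a p-value for covariate `x`,
# given the other covariates that are still retained.
PValueFn = Callable[[str, Sequence[str]], float]


def backward_eliminate(
    covariates: Sequence[str],
    p_value: PValueFn,
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly drop the least significant covariate.

    Stops when every remaining covariate has a p-value <= alpha,
    or when no covariates are left.
    """
    selected = list(covariates)
    while selected:
        # Assess each covariate conditional on the others still retained.
        pvals = {
            x: p_value(x, [c for c in selected if c != x]) for x in selected
        }
        worst = max(selected, key=lambda x: pvals[x])
        if pvals[worst] <= alpha:
            break  # all remaining covariates are significant
        selected.remove(worst)
    return selected


# Made-up p-values, for illustration only (not results from the paper).
TOY_P_VALUES = {"x1": 0.001, "x2": 0.010, "x3": 0.600, "x4": 0.300}


def toy_p_value(x: str, others: Sequence[str]) -> float:
    # A real test would depend on `others`; this toy version ignores it.
    return TOY_P_VALUES[x]


if __name__ == "__main__":
    print(backward_eliminate(["x1", "x2", "x3", "x4"], toy_p_value))
```

In practice `p_value` would wrap a real test of whether a covariate is still needed given the retained ones, applied to the treatment or outcome model as appropriate.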

Suggested Citation

  • Persson, Emma & Häggström, Jenny & Waernbaum, Ingeborg & de Luna, Xavier, 2017. "Data-driven algorithms for dimension reduction in causal inference," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 280-292.
  • Handle: RePEc:eee:csdana:v:105:y:2017:i:c:p:280-292
    DOI: 10.1016/j.csda.2016.08.012

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947316302018
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2016.08.012?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Peter Hall & Jeff Racine & Qi Li, 2004. "Cross-Validation and the Estimation of Conditional Probability Densities," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 1015-1026, December.
    2. Jinyong Hahn, 2004. "Functional Restriction and Efficiency in Causal Inference," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 73-76, February.
    3. Peter Hall & Qi Li & Jeffrey S. Racine, 2007. "Nonparametric Estimation of Regression Functions in the Presence of Irrelevant Regressors," The Review of Economics and Statistics, MIT Press, vol. 89(4), pages 784-789, November.
    4. van der Laan, Mark J. & Gruber, Susan, 2010. "Collaborative Double Robust Targeted Maximum Likelihood Estimation," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-71, May.
    5. de Luna, Xavier & Waernbaum, Ingeborg, 2005. "Covariate selection for non-parametric estimation of treatment effects," Working Paper Series 2005:4, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    6. Tyler J. VanderWeele & Ilya Shpitser, 2011. "A New Criterion for Confounder Selection," Biometrics, The International Biometric Society, vol. 67(4), pages 1406-1413, December.
    7. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    8. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    9. Lexin Li & R. Dennis Cook & Christopher J. Nachtsheim, 2005. "Model‐free variable selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 285-299, April.
    10. Sophie Langenskiöld & Donald B. Rubin, 2008. "Outcome-free Design of Observational Studies: Peer Influence on Smoking," Annals of Economics and Statistics, GENES, issue 91-92, pages 107-125.
    11. repec:adr:anecst:y:2008:i:91-92:p:06 is not listed on IDEAS
    12. Cong Li & Desheng Ouyang & Jeffrey Racine, 2009. "Nonparametric regression with weakly dependent data: the discrete and continuous regressor case," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 21(6), pages 697-711.
    13. Xavier De Luna & Ingeborg Waernbaum & Thomas S. Richardson, 2011. "Covariate selection for the nonparametric estimation of an average treatment effect," Biometrika, Biometrika Trust, vol. 98(4), pages 861-875.
    14. Lexin Li & Xiangrong Yin, 2008. "Sliced Inverse Regression with Regularizations," Biometrics, The International Biometric Society, vol. 64(1), pages 124-131, March.
    15. Halbert White & Xun Lu, 2011. "Causal Diagrams for Treatment Effect Estimation with Application to Efficient Covariate Selection," The Review of Economics and Statistics, MIT Press, vol. 93(4), pages 1453-1459, November.
    16. Häggström, Jenny & Persson, Emma & Waernbaum, Ingeborg & de Luna, Xavier, 2015. "CovSel: An R Package for Covariate Selection When Estimating Average Causal Effects," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i01).
    17. Yongwu Shao & R. Dennis Cook & Sanford Weisberg, 2007. "Marginal tests with sliced average variance estimation," Biometrika, Biometrika Trust, vol. 94(2), pages 285-296.
    18. Li, Qi & Racine, Jeffrey S. & Wooldridge, Jeffrey M., 2009. "Efficient Estimation of Average Treatment Effects with Mixed Categorical and Continuous Data," Journal of Business & Economic Statistics, American Statistical Association, vol. 27(2), pages 206-223.
    19. Alberto Abadie & Guido W. Imbens, 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, Econometric Society, vol. 74(1), pages 235-267, January.
    20. Corwin Matthew Zigler & Francesca Dominici, 2014. "Uncertainty in Propensity Score Estimation: Bayesian Methods for Variable Selection and Model-Averaged Causal Effects," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 95-107, March.

    Citations



    Cited by:

    1. Uehleke, Reinhard & Petrick, Martin & Hüttel, Silke, 2022. "Evaluations of agri-environmental schemes based on observational farm data: The importance of covariate selection," Land Use Policy, Elsevier, vol. 114(C).
    2. Jenny Häggström, 2018. "Data‐driven confounder selection via Markov and Bayesian networks," Biometrics, The International Biometric Society, vol. 74(2), pages 389-398, June.
    3. Wilson, Paul W., 2018. "Dimension reduction in nonparametric models of production," European Journal of Operational Research, Elsevier, vol. 267(1), pages 349-367.
    4. Hao, Meiling & Su, Pingfan & Hu, Liyuan & Szabo, Zoltan & Zhao, Qianyu & Shi, Chengchun, 2024. "Forward and backward state abstractions for off-policy evaluation," LSE Research Online Documents on Economics 124074, London School of Economics and Political Science, LSE Library.
    5. Bryan Keller, 2020. "Variable Selection for Causal Effect Estimation: Nonparametric Conditional Independence Testing With Random Forests," Journal of Educational and Behavioral Statistics, vol. 45(2), pages 119-142, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xun Lu, 2015. "A Covariate Selection Criterion for Estimation of Treatment Effects," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 506-522, October.
    2. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    3. Joseph Antonelli & Matthew Cefalu & Nathan Palmer & Denis Agniel, 2018. "Doubly robust matching estimators for high dimensional confounding adjustment," Biometrics, The International Biometric Society, vol. 74(4), pages 1171-1179, December.
    4. Häggström, Jenny & Persson, Emma & Waernbaum, Ingeborg & de Luna, Xavier, 2015. "CovSel: An R Package for Covariate Selection When Estimating Average Causal Effects," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i01).
    5. Matthew Cefalu & Francesca Dominici & Nils Arvold & Giovanni Parmigiani, 2017. "Model averaged double robust estimation," Biometrics, The International Biometric Society, vol. 73(2), pages 410-421, June.
    6. Pingel, Ronnie & Waernbaum, Ingeborg, 2015. "Correlation and efficiency of propensity score-based estimators for average causal effects," Working Paper Series 2015:3, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    7. Halbert White & Karim Chalak, 2013. "Identification and Identification Failure for Treatment Effects Using Structural Systems," Econometric Reviews, Taylor & Francis Journals, vol. 32(3), pages 273-317, November.
    8. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    9. Bryan Keller, 2020. "Variable Selection for Causal Effect Estimation: Nonparametric Conditional Independence Testing With Random Forests," Journal of Educational and Behavioral Statistics, vol. 45(2), pages 119-142, April.
    10. Tingting Zhou & Michael R. Elliott & Roderick J. A. Little, 2021. "Robust Causal Estimation from Observational Studies Using Penalized Spline of Propensity Score for Treatment Comparison," Stats, MDPI, vol. 4(2), pages 1-21, June.
    11. David Cheng & Abhishek Chakrabortty & Ashwin N. Ananthakrishnan & Tianxi Cai, 2020. "Estimating average treatment effects with a double‐index propensity score," Biometrics, The International Biometric Society, vol. 76(3), pages 767-777, September.
    12. Dehejia, Rajeev, 2015. "Experimental and Non-Experimental Methods in Development Economics: A Porous Dialectic," Journal of Globalization and Development, De Gruyter, vol. 6(1), pages 47-69, June.
    13. Dingke Tang & Dehan Kong & Wenliang Pan & Linbo Wang, 2023. "Ultra‐high dimensional variable selection for doubly robust causal inference," Biometrics, The International Biometric Society, vol. 79(2), pages 903-914, June.
    14. Huber, Martin, 2019. "An introduction to flexible methods for policy evaluation," FSES Working Papers 504, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    15. Edward H. Kennedy & Sivaraman Balakrishnan, 2018. "Discussion of “Data‐driven confounder selection via Markov and Bayesian networks” by Jenny Häggström," Biometrics, The International Biometric Society, vol. 74(2), pages 399-402, June.
    16. repec:wyi:journl:002112 is not listed on IDEAS
    17. Thomas S. Richardson & James M. Robins & Linbo Wang, 2018. "Discussion of “Data‐driven confounder selection via Markov and Bayesian networks” by Häggström," Biometrics, The International Biometric Society, vol. 74(2), pages 403-406, June.
    18. Simar, Leopold & Zelenyuk, Valentin, 2011. "To Smooth or Not to Smooth? The Case of Discrete Variables in Nonparametric Regressions," LIDAM Discussion Papers ISBA 2011042, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    19. Susan M. Shortreed & Ashkan Ertefaie, 2017. "Outcome‐adaptive lasso: Variable selection for causal inference," Biometrics, The International Biometric Society, vol. 73(4), pages 1111-1122, December.
    20. Huber, Martin & Lechner, Michael & Wunsch, Conny, 2013. "The performance of estimators based on the propensity score," Journal of Econometrics, Elsevier, vol. 175(1), pages 1-21.
    21. Lee, Ying-Ying, 2018. "Efficient propensity score regression estimators of multivalued treatment effects for the treated," Journal of Econometrics, Elsevier, vol. 204(2), pages 207-222.
