Printed from https://ideas.repec.org/a/bla/biomet/v77y2021i2p413-423.html

Robust and efficient semi‐supervised estimation of average treatment effects with application to electronic health records data

Author

Listed:
  • David Cheng
  • Ashwin N. Ananthakrishnan
  • Tianxi Cai

Abstract

We consider the problem of estimating the average treatment effect (ATE) in a semi‐supervised learning setting, where a very small proportion of the entire set of observations are labeled with the true outcome but features predictive of the outcome are available among all observations. This problem arises, for example, when estimating treatment effects in electronic health records (EHR) data because gold‐standard outcomes are often not directly observable from the records but are observed for a limited number of patients through small‐scale manual chart review. We develop an imputation‐based approach for estimating the ATE that is robust to misspecification of the imputation model. This effectively allows information from the predictive features to be safely leveraged to improve efficiency in estimating the ATE. The estimator is additionally doubly‐robust in that it is consistent under correct specification of either an initial propensity score model or a baseline outcome model. It is also locally semiparametric efficient under an ideal semi‐supervised model where the distribution of the unlabeled data is known. Simulations exhibit the efficiency and robustness of the proposed method compared to existing approaches in finite samples. We illustrate the method by comparing rates of treatment response to two biologic agents for treating inflammatory bowel disease using EHR data from Partners' Healthcare.
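
As a rough illustration of the double robustness property mentioned in the abstract, the sketch below implements the standard augmented inverse probability weighting (AIPW) estimator of the ATE for fully labeled data. It is not the semi-supervised, imputation-based estimator proposed in the article, whose details are not given here. The function name `aipw_ate`, its arguments, and the toy numbers are hypothetical and chosen only for this example; the propensity scores and outcome predictions are assumed to have been fitted elsewhere.

```python
from typing import Sequence


def aipw_ate(
    y: Sequence[float],    # observed outcomes
    a: Sequence[int],      # treatment indicators (1 = treated, 0 = control)
    ps: Sequence[float],   # estimated propensity scores P(A = 1 | X)
    mu1: Sequence[float],  # outcome-model predictions under treatment
    mu0: Sequence[float],  # outcome-model predictions under control
) -> float:
    """Standard AIPW estimate of the ATE, E[Y(1)] - E[Y(0)].

    Illustrative only: this is the textbook estimator for fully
    labeled data, not the estimator developed in the article.
    """
    n = len(y)
    if n == 0 or not (len(a) == len(ps) == len(mu1) == len(mu0) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ai, pi, m1, m0 in zip(y, a, ps, mu1, mu0):
        if not 0.0 < pi < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated_term = m1 + ai * (yi - m1) / pi
        control_term = m0 + (1 - ai) * (yi - m0) / (1 - pi)
        total += treated_term - control_term
    return total / n


# Toy data (made up for illustration).
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]

# Case 1: the outcome predictions match the observed outcomes exactly, so
# the residual terms vanish and arbitrary propensity scores do not matter.
est_outcome_ok = aipw_ate(
    y, a,
    ps=[0.9, 0.2, 0.3, 0.6],
    mu1=[3.0, 2.0, 4.0, 3.0],
    mu0=[2.0, 1.0, 3.0, 2.0],
)

# Case 2: the outcome model is uninformative (all zeros), so the estimate
# reduces to plain inverse probability weighting with the supplied scores.
est_ipw_only = aipw_ate(y, a, ps=[0.5] * 4, mu1=[0.0] * 4, mu0=[0.0] * 4)

print(est_outcome_ok, est_ipw_only)
```

In the first call the estimate is driven entirely by the outcome predictions, and in the second entirely by the propensity scores, which is the intuition behind consistency when either model is correctly specified.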

Suggested Citation

  • David Cheng & Ashwin N. Ananthakrishnan & Tianxi Cai, 2021. "Robust and efficient semi‐supervised estimation of average treatment effects with application to electronic health records data," Biometrics, The International Biometric Society, vol. 77(2), pages 413-423, June.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:2:p:413-423
    DOI: 10.1111/biom.13298

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13298
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Peng, Heng & Lu, Ying, 2012. "Model selection in linear mixed effect models," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 109-129.
    3. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    4. G. Aneiros & P. Vieu, 2016. "Sparse nonparametric model for regression with functional covariate," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(4), pages 839-859, October.
    5. Lam, Clifford, 2008. "Estimation of large precision matrices through block penalization," LSE Research Online Documents on Economics 31543, London School of Economics and Political Science, LSE Library.
    6. Zhang, Tao & Zhang, Qingzhao & Wang, Qihua, 2014. "Model detection for functional polynomial regression," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 183-197.
    7. Toshio Honda, 2021. "The de-biased group Lasso estimation for varying coefficient models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(1), pages 3-29, February.
    8. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    9. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    10. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    11. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    12. Zambom, Adriano Zanin & Akritas, Michael G., 2015. "Nonparametric significance testing and group variable selection," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 51-60.
    13. repec:kan:wpaper:202105 is not listed on IDEAS
    14. Chen, Bin & Maung, Kenwin, 2023. "Time-varying forecast combination for high-dimensional data," Journal of Econometrics, Elsevier, vol. 237(2).
    15. Antonelli Joseph & Cefalu Matthew, 2020. "Averaging causal estimators in high dimensions," Journal of Causal Inference, De Gruyter, vol. 8(1), pages 92-107, January.
    16. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    17. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    18. Qu, Lianqiang & Song, Xinyuan & Sun, Liuquan, 2018. "Identification of local sparsity and variable selection for varying coefficient additive hazards models," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 119-135.
    19. Iván Díaz & Elizabeth Colantuoni & Daniel F. Hanley & Michael Rosenblum, 2019. "Improved precision in the analysis of randomized trials with survival outcomes, without assuming proportional hazards," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 439-468, July.
    20. Huicong Yu & Jiaqi Wu & Weiping Zhang, 2024. "Simultaneous subgroup identification and variable selection for high dimensional data," Computational Statistics, Springer, vol. 39(6), pages 3181-3205, September.
    21. Abbas Khalili & Farhad Shokoohi & Masoud Asgharian & Shili Lin, 2023. "Sparse estimation in semiparametric finite mixture of varying coefficient regression models," Biometrics, The International Biometric Society, vol. 79(4), pages 3445-3457, December.
