
Selective machine learning of doubly robust functionals

Authors
  • Y Cui
  • E J Tchetgen Tchetgen

Abstract

While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce a new selection criterion aimed at bias reduction in estimating the functional of interest based on a novel definition of pseudo risk inspired by the double robustness property. Intuitively, the proposed criterion selects a pair of learners with the smallest pseudo risk, so that the estimated functional is least sensitive to perturbations of a nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new selection criterion, which states that our empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo risk for each pair of candidate learners. Finally, we apply the approach to model selection of a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learners used to account for confounding in an observational study, and we illustrate it in simulations and a data application.
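As a rough illustration of the idea in the abstract, the Python sketch below computes the standard augmented inverse probability weighting (AIPW) estimate of the average treatment effect, a well-known doubly robust estimator, for every pair of candidate outcome and propensity learners, and then picks the pair whose estimate moves least when one nuisance learner is swapped. The abstract does not give the exact pseudo risk, so the criterion used here is an assumption made for illustration, not the authors' definition. The function names `aipw_ate` and `select_pair` are hypothetical, and the nuisance predictions are assumed to be already computed out of fold; the paper's multi-fold cross-validation step is not reproduced.

```python
import numpy as np


def aipw_ate(y, a, mu0, mu1, pi):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y: outcomes; a: binary treatment indicators (0/1);
    mu0, mu1: predicted outcomes under control and treatment;
    pi: predicted propensity scores P(A = 1 | X), assumed to lie in (0, 1).
    """
    y, a = np.asarray(y, dtype=float), np.asarray(a, dtype=float)
    mu0, mu1, pi = (np.asarray(v, dtype=float) for v in (mu0, mu1, pi))
    # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
    scores = (mu1 - mu0
              + a * (y - mu1) / pi
              - (1 - a) * (y - mu0) / (1 - pi))
    return float(np.mean(scores))


def select_pair(y, a, outcome_preds, propensity_preds):
    """Choose an (outcome learner, propensity learner) pair.

    outcome_preds: list of (mu0, mu1) prediction pairs, one per outcome learner.
    propensity_preds: list of propensity-score arrays, one per propensity learner.
    ASSUMPTION (hypothetical pseudo risk, not the paper's exact definition):
    the risk of pair (k, j) is the largest squared change in the AIPW estimate
    when either nuisance learner is swapped while the other is held fixed.
    """
    K, J = len(outcome_preds), len(propensity_preds)
    est = np.empty((K, J))
    for k, (mu0, mu1) in enumerate(outcome_preds):
        for j, pi in enumerate(propensity_preds):
            est[k, j] = aipw_ate(y, a, mu0, mu1, pi)

    risk = np.empty((K, J))
    for k in range(K):
        for j in range(J):
            swap_outcome = np.max((est[:, j] - est[k, j]) ** 2)     # vary outcome learner
            swap_propensity = np.max((est[k, :] - est[k, j]) ** 2)  # vary propensity learner
            risk[k, j] = max(swap_outcome, swap_propensity)

    # Pair with the smallest (assumed) pseudo risk.
    k_best, j_best = np.unravel_index(np.argmin(risk), risk.shape)
    return int(k_best), int(j_best), est, risk


if __name__ == "__main__":
    # Simulated data with a true average treatment effect of 2.
    rng = np.random.default_rng(0)
    n = 20000
    x = rng.normal(size=n)
    pi_true = 1 / (1 + np.exp(-x))
    a = rng.binomial(1, pi_true)
    y = 1 + 2 * a + x + rng.normal(size=n)
    # Candidate 0 is correctly specified on each side; candidate 1 is not.
    outcome_preds = [(1 + x, 3 + x), (np.full(n, y.mean()), np.full(n, y.mean()))]
    propensity_preds = [pi_true, np.full(n, 0.5)]
    k, j, est, risk = select_pair(y, a, outcome_preds, propensity_preds)
    print("selected pair:", (k, j))
    print("estimates:\n", est.round(3))
```

With a correctly specified learner on each side, swapping either nuisance leaves the AIPW estimate nearly unchanged, which mirrors the double robustness intuition the abstract appeals to; pairs that include a misspecified learner can move substantially under a swap and so receive a larger (assumed) risk.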

Suggested Citation

  • Y Cui & E J Tchetgen Tchetgen, 2024. "Selective machine learning of doubly robust functionals," Biometrika, Biometrika Trust, vol. 111(2), pages 517-535.
  • Handle: RePEc:oup:biomet:v:111:y:2024:i:2:p:517-535.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asad055
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Marvin N. Wright & Andreas Ziegler, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Weihua Cao & Anastasios A. Tsiatis & Marie Davidian, 2009. "Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data," Biometrika, Biometrika Trust, vol. 96(3), pages 723-734.
    4. Peisong Han & Lu Wang, 2013. "Estimation with missing data: beyond double robustness," Biometrika, Biometrika Trust, vol. 100(2), pages 417-430.
    5. Jerome H. Friedman & Trevor Hastie & Rob Tibshirani, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    6. Xiaogang Duan & Guosheng Yin, 2017. "Ensemble Approaches to Estimating the Population Mean with Missing Response," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(4), pages 899-917, December.
    7. Eric J. Tchetgen Tchetgen & James M. Robins & Andrea Rotnitzky, 2010. "On doubly robust estimation in a semiparametric odds ratio model," Biometrika, Biometrika Trust, vol. 97(1), pages 171-180.
    8. Whitney K. Newey, 1990. "Semiparametric Efficiency Bounds," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 5(2), pages 99-135, April-June.
    9. Heejung Bang & James M. Robins, 2005. "Doubly Robust Estimation in Missing Data and Causal Inference Models," Biometrics, The International Biometric Society, vol. 61(4), pages 962-973, December.
    10. Karel Vermeulen & Stijn Vansteelandt, 2015. "Bias-Reduced Doubly Robust Estimation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1024-1036, September.
    11. Zhiqiang Tan, 2010. "Bounded, efficient and doubly robust estimation with inverse weighting," Biometrika, Biometrika Trust, vol. 97(3), pages 661-682.
    12. Zhiqiang Tan, 2006. "A Distributional Approach for Causal Inference Using Propensity Scores," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1619-1637, December.
    13. Mark J. van der Laan & Susan Gruber, 2010. "Collaborative Double Robust Targeted Maximum Likelihood Estimation," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-71, May.
    14. A Rotnitzky & E Smucler & J M Robins, 2021. "Characterization of parameters with a mixed bias property," Biometrika, Biometrika Trust, vol. 108(1), pages 231-238.
    15. Z Tan, 2020. "Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data," Biometrika, Biometrika Trust, vol. 107(1), pages 137-158.
    16. Peisong Han, 2014. "Multiply Robust Estimation in Regression Analysis With Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1159-1173, September.
    17. Andrea Rotnitzky & Quanhong Lei & Mariela Sued & James M. Robins, 2012. "Improved double-robust estimation in missing data and causal inference models," Biometrika, Biometrika Trust, vol. 99(2), pages 439-456.
    18. Sixia Chen & David Haziza, 2017. "Multiply robust imputation procedures for the treatment of item nonresponse in surveys," Biometrika, Biometrika Trust, vol. 104(2), pages 439-453.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shixiao Zhang & Peisong Han & Changbao Wu, 2023. "Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference," International Statistical Review, International Statistical Institute, vol. 91(2), pages 165-192, August.
    2. Qihua Wang & Miaomiao Su & Ruoyu Wang, 2021. "A beyond multiple robust approach for missing response problem," Computational Statistics & Data Analysis, Elsevier, vol. 155(C).
    3. Peisong Han & Linglong Kong & Jiwei Zhao & Xingcai Zhou, 2019. "A general framework for quantile estimation with incomplete data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 305-333, April.
    4. Xiaogang Duan & Guosheng Yin, 2017. "Ensemble Approaches to Estimating the Population Mean with Missing Response," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(4), pages 899-917, December.
    5. Karel Vermeulen & Stijn Vansteelandt, 2015. "Bias-Reduced Doubly Robust Estimation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1024-1036, September.
    6. Sixia Chen & David Haziza, 2018. "Jackknife empirical likelihood method for multiply robust estimation with missing data," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 258-268.
    7. Peisong Han & Peter X.-K. Song & Lu Wang, 2015. "Achieving semiparametric efficiency bound in longitudinal data analysis with dropouts," Journal of Multivariate Analysis, Elsevier, vol. 135(C), pages 59-70.
    8. AmirEmad Ghassami & Andrew Ying & Ilya Shpitser & Eric Tchetgen Tchetgen, 2021. "Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference," Papers 2104.02929, arXiv.org, revised Mar 2022.
    9. Jianxuan Liu & Yanyuan Ma & Lan Wang, 2018. "An alternative robust estimator of average treatment effect in causal inference," Biometrics, The International Biometric Society, vol. 74(3), pages 910-923, September.
    10. Peisong Han, 2014. "Multiply Robust Estimation in Regression Analysis With Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1159-1173, September.
    11. Lan Wen & Miguel A. Hernán & James M. Robins, 2022. "Multiply robust estimators of causal effects for survival outcomes," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 1304-1328, September.
    12. Iván Díaz & Elizabeth Colantuoni & Daniel F. Hanley & Michael Rosenblum, 2019. "Improved precision in the analysis of randomized trials with survival outcomes, without assuming proportional hazards," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 439-468, July.
    13. Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Papers 2206.08503, arXiv.org, revised Apr 2024.
    14. Tymon Słoczyński & Jeffrey M. Wooldridge, 2018. "A General Double Robustness Result For Estimating Average Treatment Effects," Econometric Theory, Cambridge University Press, vol. 34(1), pages 112-133, February.
    15. Peisong Han, 2016. "Combining Inverse Probability Weighting and Multiple Imputation to Improve Robustness of Estimation," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 246-260, March.
    16. Myoung-jae Lee & Sanghyeok Lee, 2019. "Double robustness without weighting," Statistics & Probability Letters, Elsevier, vol. 146(C), pages 175-180.
    17. Ao Yuan & Anqi Yin & Ming T. Tan, 2021. "Enhanced Doubly Robust Procedure for Causal Inference," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(3), pages 454-478, December.
    18. Matthew Cefalu & Francesca Dominici & Nils Arvold & Giovanni Parmigiani, 2017. "Model averaged double robust estimation," Biometrics, The International Biometric Society, vol. 73(2), pages 410-421, June.
    19. Pedro H.C. Sant’Anna & Jun Zhao, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    20. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:oup:biomet:v:111:y:2024:i:2:p:517-535. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by submitting a correction.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Oxford University Press. General contact details of provider: https://academic.oup.com/biomet.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.