
Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding

Authors

  • Ashesh Rambachan
  • Amanda Coston
  • Edward Kennedy

Abstract

Predictive algorithms inform consequential decisions in settings where the outcome is selectively observed given choices made by human decision makers. We propose a unified framework for the robust design and evaluation of predictive algorithms in selectively observed data. We impose general assumptions on how much the outcome may vary on average between unselected and selected units conditional on observed covariates and identified nuisance parameters, formalizing popular empirical strategies for imputing missing data such as proxy outcomes and instrumental variables. We develop debiased machine learning estimators for the bounds on a large class of predictive performance estimands, such as the conditional likelihood of the outcome, a predictive algorithm's mean square error, true/false positive rate, and many others, under these assumptions. In an administrative dataset from a large Australian financial institution, we illustrate how varying assumptions on unobserved confounding leads to meaningful changes in default risk predictions and evaluations of credit scores across sensitive groups.
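The bounding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' debiased machine learning estimator: it is a simple plug-in calculation that ignores covariates, and the function name `mean_outcome_bounds` and the parameters `delta_lower` and `delta_upper` are hypothetical. It assumes a binary outcome that is observed only for selected units, and assumes the mean outcome among unselected units differs from the mean among selected units by an amount between `delta_lower` and `delta_upper`.

```python
def mean_outcome_bounds(outcomes, selected, delta_lower, delta_upper):
    """Plug-in bounds on the overall mean of a binary outcome.

    outcomes:    outcome per unit; ignored wherever selected == 0
    selected:    1 if the outcome was observed for the unit, else 0
    delta_lower: assumed smallest gap (unselected mean minus selected mean)
    delta_upper: assumed largest gap (unselected mean minus selected mean)
    """
    if delta_lower > delta_upper:
        raise ValueError("delta_lower must not exceed delta_upper")

    observed = [y for y, d in zip(outcomes, selected) if d == 1]
    if not observed:
        raise ValueError("at least one selected unit is required")

    share_selected = len(observed) / len(selected)
    mean_selected = sum(observed) / len(observed)

    def clip(value):
        # A binary outcome has a mean between 0 and 1.
        return min(1.0, max(0.0, value))

    # Range of the unobserved mean implied by the assumed gap.
    unselected_low = clip(mean_selected + delta_lower)
    unselected_high = clip(mean_selected + delta_upper)

    # Law of total expectation: mix the selected and unselected means.
    lower = share_selected * mean_selected + (1 - share_selected) * unselected_low
    upper = share_selected * mean_selected + (1 - share_selected) * unselected_high
    return lower, upper


# Hypothetical data: four selected units with observed outcomes, four unselected.
outcomes = [1, 0, 0, 1, None, None, None, None]
selected = [1, 1, 1, 1, 0, 0, 0, 0]
lower, upper = mean_outcome_bounds(outcomes, selected, -0.1, 0.2)
print(lower, upper)
```

Setting both gap parameters to zero collapses the interval to the selected-sample mean, which corresponds to assuming no unobserved confounding. Setting them to -1 and 1 recovers the widest interval, in which nothing is assumed about unselected units.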

Suggested Citation

  • Ashesh Rambachan & Amanda Coston & Edward Kennedy, 2022. "Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding," Papers 2212.09844, arXiv.org, revised May 2024.
  • Handle: RePEc:arx:papers:2212.09844

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2212.09844
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jacob Dorn & Kevin Guo & Nathan Kallus, 2021. "Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding," Papers 2112.11449, arXiv.org, revised Jul 2022.
    2. Matthew J Tudball & Rachael A Hughes & Kate Tilling & Jack Bowden & Qingyuan Zhao, 2023. "Sample-constrained partial identification with application to selection bias," Biometrika, Biometrika Trust, vol. 110(2), pages 485-498.
    3. Lihua Lei & Roshni Sahoo & Stefan Wager, 2023. "Policy Learning under Biased Sample Selection," Papers 2304.11735, arXiv.org.
    4. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    5. Yu-Chang Chen & Haitian Xie, 2022. "Personalized Subsidy Rules," Papers 2202.13545, arXiv.org, revised Mar 2022.
    6. Vira Semenova, 2023. "Aggregated Intersection Bounds and Aggregated Minimax Values," Papers 2303.00982, arXiv.org, revised Jun 2024.
    7. Samuel Higbee, 2022. "Policy Learning with New Treatments," Papers 2210.04703, arXiv.org, revised Sep 2023.
    8. Nathan Kallus, 2022. "Treatment Effect Risk: Bounds and Inference," Papers 2201.05893, arXiv.org, revised Jul 2022.
    9. Zhou, Yunzhe & Qi, Zhengling & Shi, Chengchun & Li, Lexin, 2023. "Optimizing pessimism in dynamic treatment regimes: a Bayesian learning approach," LSE Research Online Documents on Economics 118233, London School of Economics and Political Science, LSE Library.
    10. Nathan Kallus & Angela Zhou, 2021. "Minimax-Optimal Policy Learning Under Unobserved Confounding," Management Science, INFORMS, vol. 67(5), pages 2870-2890, May.
    11. Cui, Yifan & Tchetgen Tchetgen, Eric, 2021. "On a necessary and sufficient identification condition of optimal treatment regimes with an instrumental variable," Statistics & Probability Letters, Elsevier, vol. 178(C).
    12. Evan T.R. Rosenman & Guillaume Basse & Art B. Owen & Mike Baiocchi, 2023. "Combining observational and experimental datasets using shrinkage estimators," Biometrics, The International Biometric Society, vol. 79(4), pages 2961-2973, December.
    13. Xinkun Nie & Guido Imbens & Stefan Wager, 2021. "Covariate Balancing Sensitivity Analysis for Extrapolating Randomized Trials across Locations," Papers 2112.04723, arXiv.org.
    14. Christopher Adjaho & Timothy Christensen, 2022. "Externally Valid Policy Choice," Papers 2205.05561, arXiv.org, revised Jul 2023.
    15. Qiu Hongxiang & Carone Marco & Luedtke Alex, 2022. "Individualized treatment rules under stochastic treatment cost constraints," Journal of Causal Inference, De Gruyter, vol. 10(1), pages 480-493, January.
    16. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    17. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    18. Yi Zhang & Kosuke Imai, 2023. "Individualized Policy Evaluation and Learning under Clustered Network Interference," Papers 2311.02467, arXiv.org, revised Feb 2024.
    19. Chen, Le-Yu & Lee, Sokbae, 2018. "Best subset binary prediction," Journal of Econometrics, Elsevier, vol. 206(1), pages 39-56.
    20. Manski, Charles F., 2023. "Probabilistic prediction for binary treatment choice: With focus on personalized medicine," Journal of Econometrics, Elsevier, vol. 234(2), pages 647-663.
