IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/118250.html

Conformal off-policy prediction

Author

Listed:
  • Zhang, Yingying
  • Shi, Chengchun
  • Luo, Shikai

Abstract

Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy’s return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy so that existing conformal prediction algorithms are applicable to prediction interval construction. Our methods are justified by theories, synthetic data and real data from short-video platforms.
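
The abstract's main idea (subsample the logged data so that it looks as if it came from the target policy, then apply a standard conformal method) can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes a single-stage problem with two actions, a known behavior policy, and rejection sampling as a stand-in for the "pseudo policy"; the abstract does not say how the paper constructs its pseudo policy. All function names, the simulated reward model and the constants below are hypothetical.

```python
import math
import numpy as np


def rejection_subsample(behavior_probs, target_probs, ratio_bound, rng):
    """Boolean mask keeping row i with probability
    target_probs[i] / (ratio_bound * behavior_probs[i]).
    ratio_bound must upper-bound target/behavior (an assumption of this sketch)."""
    accept_prob = target_probs / (ratio_bound * behavior_probs)
    return rng.random(len(accept_prob)) < accept_prob


def split_conformal_halfwidth(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(np.sort(scores)[k - 1])


def simulate(n, p_action1, rng):
    """Hypothetical data: state x, binary action a, return y = x + a + noise."""
    x = rng.normal(size=n)
    a = (rng.random(n) < p_action1).astype(float)
    y = x + a + rng.normal(scale=0.5, size=n)
    return x, a, y


def demo(seed=0, alpha=0.1):
    rng = np.random.default_rng(seed)
    p_behavior, p_target = 0.5, 0.9  # P(action = 1) under each policy

    # Logged data collected under the behavior policy.
    x, a, y = simulate(4000, p_behavior, rng)
    b_probs = np.where(a == 1, p_behavior, 1 - p_behavior)
    t_probs = np.where(a == 1, p_target, 1 - p_target)
    ratio_bound = max(p_target / p_behavior, (1 - p_target) / (1 - p_behavior))

    # Keep a subsample whose actions are distributed as under the target policy.
    keep = rejection_subsample(b_probs, t_probs, ratio_bound, rng)
    xs, acts, ys = x[keep], a[keep], y[keep]

    # Split conformal prediction on the subsample: fit on one half,
    # calibrate absolute residuals on the other half.
    half = len(xs) // 2
    slope, intercept = np.polyfit(xs[:half], ys[:half], 1)
    scores = np.abs(ys[half:] - (slope * xs[half:] + intercept))
    q = split_conformal_halfwidth(scores, alpha)

    # Check coverage on fresh returns generated under the target policy.
    x_new, _, y_new = simulate(5000, p_target, rng)
    covered = np.abs(y_new - (slope * x_new + intercept)) <= q
    return float(covered.mean()), float(acts.mean())


coverage, frac_action1 = demo()
print(f"empirical coverage: {coverage:.3f}, share of action 1 in subsample: {frac_action1:.3f}")
```

In this sketch the interval for an initial state x is the fitted prediction plus or minus `q`. The subsample is smaller than the logged data set (the acceptance rate is `1 / ratio_bound`), which is the price paid for making the retained rows exchangeable with draws from the target policy.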

Suggested Citation

  • Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:118250

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/118250/
    File Function: Open access version.
    Download Restriction: no

    References listed on IDEAS

    1. Meinshausen, Nicolai & Meier, Lukas & Bühlmann, Peter, 2009. "p-Values for High-Dimensional Regression," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1671-1681.
    2. Baqun Zhang & Anastasios A. Tsiatis & Eric B. Laber & Marie Davidian, 2012. "A Robust Method for Estimating Optimal Treatment Regimes," Biometrics, The International Biometric Society, vol. 68(4), pages 1010-1018, December.
    3. Daniel J. Luckett & Eric B. Laber & Anna R. Kahkoska & David M. Maahs & Elizabeth Mayer-Davis & Michael R. Kosorok, 2020. "Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(530), pages 692-706, April.
    4. Jing Lei & Max G’Sell & Alessandro Rinaldo & Ryan J. Tibshirani & Larry Wasserman, 2018. "Distribution-Free Predictive Inference for Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1094-1111, July.
    5. S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.
    6. Lan Wang & Yu Zhou & Rui Song & Ben Sherwood, 2018. "Quantile-Optimal Treatment Regimes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1243-1254, July.
    7. Jing Lei & Larry Wasserman, 2014. "Distribution-free prediction bands for non-parametric regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 71-96, January.
    8. Donald B. Rubin, 2005. "Causal Inference Using Potential Outcomes: Design, Modeling, Decisions," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 322-331, March.
    9. Peng Liao & Predrag Klasnja & Susan Murphy, 2021. "Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(533), pages 382-391, March.
    10. Xiaohong Chen & Zhengling Qi, 2022. "On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation," Papers 2201.06169, arXiv.org, revised Jun 2022.
    11. Solari, Aldo & Djordjilović, Vera, 2022. "Multi split conformal prediction," Statistics & Probability Letters, Elsevier, vol. 184(C).
    12. Lihua Lei & Emmanuel J. Candès, 2021. "Conformal inference of counterfactuals and individual treatment effects," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 911-938, November.
    13. Shi, Chengchun & Zhang, Shengxing & Lu, Wenbin & Song, Rui, 2022. "Statistical inference of the value function for reinforcement learning in infinite-horizon settings," LSE Research Online Documents on Economics 110882, London School of Economics and Political Science, LSE Library.
    14. Chengchun Shi & Sheng Zhang & Wenbin Lu & Rui Song, 2022. "Statistical inference of the value function for reinforcement learning in infinite‐horizon settings," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 765-793, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
    2. Shi, Chengchun & Luo, Shikai & Le, Yuan & Zhu, Hongtu & Song, Rui, 2022. "Statistically efficient advantage learning for offline reinforcement learning in infinite horizons," LSE Research Online Documents on Economics 115598, London School of Economics and Political Science, LSE Library.
    3. Shi, Chengchun & Wan, Runzhe & Song, Ge & Luo, Shikai & Zhu, Hongtu & Song, Rui, 2023. "A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets," LSE Research Online Documents on Economics 117174, London School of Economics and Political Science, LSE Library.
    4. Zhen Li & Jie Chen & Eric Laber & Fang Liu & Richard Baumgartner, 2023. "Optimal Treatment Regimes: A Review and Empirical Comparison," International Statistical Review, International Statistical Institute, vol. 91(3), pages 427-463, December.
    5. Shuai Chen & Lu Tian & Tianxi Cai & Menggang Yu, 2017. "A general statistical framework for subgroup identification and comparative treatment scoring," Biometrics, The International Biometric Society, vol. 73(4), pages 1199-1209, December.
    6. Muxuan Liang & Menggang Yu, 2023. "Relative contrast estimation and inference for treatment recommendation," Biometrics, The International Biometric Society, vol. 79(4), pages 2920-2932, December.
    7. Giorgos Bakoyannis, 2023. "Estimating optimal individualized treatment rules with multistate processes," Biometrics, The International Biometric Society, vol. 79(4), pages 2830-2842, December.
    8. Cai, Hengrui & Shi, Chengchun & Song, Rui & Lu, Wenbin, 2023. "Jump interval-learning for individualized decision making with continuous treatments," LSE Research Online Documents on Economics 118231, London School of Economics and Political Science, LSE Library.
    9. Yunan Wu & Lan Wang, 2021. "Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes," Biometrics, The International Biometric Society, vol. 77(2), pages 465-476, June.
    10. Li, Ting & Shi, Chengchun & Lu, Zhaohua & Li, Yi & Zhu, Hongtu, 2024. "Evaluating dynamic conditional quantile treatment effects with applications in ridesharing," LSE Research Online Documents on Economics 122488, London School of Economics and Political Science, LSE Library.
    11. Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
    12. Shonosuke Sugasawa & Hisashi Noma, 2021. "Efficient screening of predictive biomarkers for individual treatment selection," Biometrics, The International Biometric Society, vol. 77(1), pages 249-257, March.
    13. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    14. Xin Qiu & Donglin Zeng & Yuanjia Wang, 2018. "Estimation and evaluation of linear individualized treatment rules to guarantee performance," Biometrics, The International Biometric Society, vol. 74(2), pages 517-528, June.
    15. Ruoqing Zhu & Ying-Qi Zhao & Guanhua Chen & Shuangge Ma & Hongyu Zhao, 2017. "Greedy outcome weighted tree learning of optimal personalized treatment rules," Biometrics, The International Biometric Society, vol. 73(2), pages 391-400, June.
    16. Hao, Meiling & Su, Pingfan & Hu, Liyuan & Szabo, Zoltan & Zhao, Qianyu & Shi, Chengchun, 2024. "Forward and backward state abstractions for off-policy evaluation," LSE Research Online Documents on Economics 124074, London School of Economics and Political Science, LSE Library.
    17. Victor Chernozhukov & Kaspar Wuthrich & Yinchu Zhu, 2019. "Distributional conformal prediction," Papers 1909.07889, arXiv.org, revised Aug 2021.
    18. Leying Guan, 2023. "Localized conformal prediction: a generalized inference framework for conformal prediction," Biometrika, Biometrika Trust, vol. 110(1), pages 33-50.
    19. Weibin Mo & Yufeng Liu, 2022. "Efficient learning of optimal individualized treatment rules for heteroscedastic or misspecified treatment‐free effect models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 440-472, April.
    20. Yizhe Xu & Tom H. Greene & Adam P. Bress & Brian C. Sauer & Brandon K. Bellows & Yue Zhang & William S. Weintraub & Andrew E. Moran & Jincheng Shen, 2022. "Estimating the optimal individualized treatment rule from a cost‐effectiveness perspective," Biometrics, The International Biometric Society, vol. 78(1), pages 337-351, March.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
