
Forward and backward state abstractions for off-policy evaluation

Author

Listed:
  • Hao, Meiling
  • Su, Pingfan
  • Hu, Liyuan
  • Szabo, Zoltan
  • Zhao, Qianyu
  • Shi, Chengchun

Abstract

Off-policy evaluation (OPE) is crucial for evaluating a target policy’s impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions – originally designed for policy learning – in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE. (ii) We derive sufficient conditions for achieving irrelevance in Q-functions and marginalized importance sampling ratios, the latter obtained by constructing a time-reversed Markov decision process (MDP) based on the observed MDP. (iii) We propose a novel two-step procedure that sequentially projects the original state space into a smaller space, which substantially simplifies the sample complexity of OPE arising from high cardinality.
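
As a rough illustration of what irrelevance in the Q-function can mean, the sketch below builds a toy tabular MDP whose state is a pair (x, z), where only x affects rewards, transitions and the target policy. It is not the paper's two-step procedure and does not cover the backward (importance sampling ratio) abstraction; every name and number in it (`P_x`, `R_x`, `pi_x`, `P_z`, `policy_value`) is a hypothetical choice made for this example. Under these assumptions, evaluating the target policy on the abstract state x alone gives the same value as evaluating it on the full state.

```python
import numpy as np

GAMMA = 0.9
N_X, N_Z, N_A = 2, 3, 2  # relevant states, noise states, actions (all hypothetical)

# Toy dynamics on the relevant component x: P_x[a, x, x'].
P_x = np.array([
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.1, 0.9], [0.6, 0.4]],
])
R_x = np.array([[0.0, 1.0], [0.5, 2.0]])    # expected reward R_x[x, a]
pi_x = np.array([[0.7, 0.3], [0.2, 0.8]])   # target policy pi_x[x, a]
P_z = np.full((N_Z, N_Z), 1.0 / N_Z)        # noise z' is drawn uniformly, ignoring (x, a)


def policy_value(P, R, pi, init):
    """Exact policy value: solve (I - gamma * P_pi) v = r_pi, then average v over init."""
    n_states = R.shape[0]
    P_pi = np.einsum("sa,ast->st", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward under pi
    v = np.linalg.solve(np.eye(n_states) - GAMMA * P_pi, r_pi)
    return float(init @ v)


# Full MDP on s = (x, z), flattened as s = x * N_Z + z.
P_full = np.stack([np.kron(P_x[a], P_z) for a in range(N_A)])
R_full = np.repeat(R_x, N_Z, axis=0)        # reward ignores z
pi_full = np.repeat(pi_x, N_Z, axis=0)      # target policy ignores z

init_x = np.array([0.5, 0.5])
init_z = np.array([0.2, 0.3, 0.5])
init_full = np.kron(init_x, init_z)

value_full = policy_value(P_full, R_full, pi_full, init_full)
value_abs = policy_value(P_x, R_x, pi_x, init_x)  # abstraction phi(x, z) = x

print(value_full, value_abs)
```

Here the abstraction is known in advance and the values are computed exactly from the model; in an OPE setting the abstraction would have to be learned from logged data, which is the problem the abstract addresses.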

Suggested Citation

  • Hao, Meiling & Su, Pingfan & Hu, Liyuan & Szabo, Zoltan & Zhao, Qianyu & Shi, Chengchun, 2024. "Forward and backward state abstractions for off-policy evaluation," LSE Research Online Documents on Economics 124074, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:124074

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/124074/
    File Function: Open access version.
    Download Restriction: no

    References listed on IDEAS

    1. Persson, Emma & Häggström, Jenny & Waernbaum, Ingeborg & de Luna, Xavier, 2017. "Data-driven algorithms for dimension reduction in causal inference," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 280-292.
    2. Baqun Zhang & Anastasios A. Tsiatis & Eric B. Laber & Marie Davidian, 2013. "Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions," Biometrika, Biometrika Trust, vol. 100(3), pages 681-694.
    3. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controls," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 81(2), pages 608-650.
    4. Susan M. Shortreed & Ashkan Ertefaie, 2017. "Outcome‐adaptive lasso: Variable selection for causal inference," Biometrics, The International Biometric Society, vol. 73(4), pages 1111-1122, December.
    5. Peng Liao & Predrag Klasnja & Susan Murphy, 2021. "Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(533), pages 382-391, March.
    6. Tyler J. VanderWeele & Ilya Shpitser, 2011. "A New Criterion for Confounder Selection," Biometrics, The International Biometric Society, vol. 67(4), pages 1406-1413, December.
    7. Xavier De Luna & Ingeborg Waernbaum & Thomas S. Richardson, 2011. "Covariate selection for the nonparametric estimation of an average treatment effect," Biometrika, Biometrika Trust, vol. 98(4), pages 861-875.
    8. Chengchun Shi & Sheng Zhang & Wenbin Lu & Rui Song, 2022. "Statistical inference of the value function for reinforcement learning in infinite‐horizon settings," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 765-793, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    2. Uehleke, Reinhard & Petrick, Martin & Hüttel, Silke, 2022. "Evaluations of agri-environmental schemes based on observational farm data: The importance of covariate selection," Land Use Policy, Elsevier, vol. 114(C).
    3. Yongnam Kim, 2019. "The Causal Structure of Suppressor Variables," Journal of Educational and Behavioral Statistics, vol. 44(4), pages 367-389, August.
    4. Thomas S. Richardson & James M. Robins & Linbo Wang, 2018. "Discussion of “Data‐driven confounder selection via Markov and Bayesian networks” by Häggström," Biometrics, The International Biometric Society, vol. 74(2), pages 403-406, June.
    5. Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
    6. Tingting Zhou & Michael R. Elliott & Roderick J. A. Little, 2021. "Robust Causal Estimation from Observational Studies Using Penalized Spline of Propensity Score for Treatment Comparison," Stats, MDPI, vol. 4(2), pages 1-21, June.
    7. Jenny Häggström, 2018. "Data‐driven confounder selection via Markov and Bayesian networks," Biometrics, The International Biometric Society, vol. 74(2), pages 389-398, June.
    8. Xu Qin & Jonah Deutsch & Guanglei Hong, 2021. "Unpacking Complex Mediation Mechanisms And Their Heterogeneity Between Sites In A Job Corps Evaluation," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 40(1), pages 158-190, January.
    9. Leonard Henckel & Emilija Perković & Marloes H. Maathuis, 2022. "Graphical criteria for efficient total effect estimation via adjustment in causal linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 579-599, April.
    10. David Cheng & Abhishek Chakrabortty & Ashwin N. Ananthakrishnan & Tianxi Cai, 2020. "Estimating average treatment effects with a double‐index propensity score," Biometrics, The International Biometric Society, vol. 76(3), pages 767-777, September.
    11. Antonelli Joseph & Cefalu Matthew, 2020. "Averaging causal estimators in high dimensions," Journal of Causal Inference, De Gruyter, vol. 8(1), pages 92-107, January.
    12. Ertefaie Ashkan & Asgharian Masoud & Stephens David A., 2018. "Variable Selection in Causal Inference using a Simultaneous Penalization Method," Journal of Causal Inference, De Gruyter, vol. 6(1), pages 1-16, March.
    13. Joseph Antonelli & Matthew Cefalu & Nathan Palmer & Denis Agniel, 2018. "Doubly robust matching estimators for high dimensional confounding adjustment," Biometrics, The International Biometric Society, vol. 74(4), pages 1171-1179, December.
    14. Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
    15. Brandon Koch & David M. Vock & Julian Wolfson, 2018. "Covariate selection with group lasso and doubly robust estimation of causal effects," Biometrics, The International Biometric Society, vol. 74(1), pages 8-17, March.
    16. Li, Ting & Shi, Chengchun & Lu, Zhaohua & Li, Yi & Zhu, Hongtu, 2024. "Evaluating dynamic conditional quantile treatment effects with applications in ridesharing," LSE Research Online Documents on Economics 122488, London School of Economics and Political Science, LSE Library.
    17. Persson, Emma & Häggström, Jenny & Waernbaum, Ingeborg & de Luna, Xavier, 2017. "Data-driven algorithms for dimension reduction in causal inference," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 280-292.
    18. Xun Lu, 2015. "A Covariate Selection Criterion for Estimation of Treatment Effects," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 506-522, October.
    19. Edward H. Kennedy & Sivaraman Balakrishnan, 2018. "Discussion of “Data‐driven confounder selection via Markov and Bayesian networks” by Jenny Häggström," Biometrics, The International Biometric Society, vol. 74(2), pages 399-402, June.
    20. Davide Viviano & Jelena Bradic, 2021. "Dynamic covariate balancing: estimating treatment effects over time with potential local projections," Papers 2103.01280, arXiv.org, revised Jan 2024.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
