Statistically efficient advantage learning for offline reinforcement learning in infinite horizons
Author
Abstract
Suggested Citation
Download full text from publisher
References listed on IDEAS
- Guanhua Chen & Donglin Zeng & Michael R. Kosorok, 2016. "Personalized Dose Finding Using Outcome Weighted Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1509-1521, October.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018.
"Double/debiased machine learning for treatment and structural parameters,"
Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2017. "Double/Debiased Machine Learning for Treatment and Structural Parameters," NBER Working Papers 23564, National Bureau of Economic Research, Inc.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey & James Robins, 2017. "Double/debiased machine learning for treatment and structural parameters," CeMMAP working papers CWP28/17, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey & James Robins, 2017. "Double/debiased machine learning for treatment and structural parameters," CeMMAP working papers 28/17, Institute for Fiscal Studies.
- Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2021.
"Deep Neural Networks for Estimation and Inference,"
Econometrica, Econometric Society, vol. 89(1), pages 181-213, January.
- Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2018. "Deep Neural Networks for Estimation and Inference," Papers 1809.09953, arXiv.org, revised Sep 2019.
- Chen, Xiaohong & Christensen, Timothy M., 2015.
"Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions,"
Journal of Econometrics, Elsevier, vol. 188(2), pages 447-465.
- Xiaohong Chen & Timothy M. Christensen, 2014. "Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions," CeMMAP working papers CWP46/14, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Xiaohong Chen & Timothy M. Christensen, 2014. "Optimal Uniform Convergence Rates and Asymptotic Normality for Series Estimators under Weak Dependence and Weak Conditions," Cowles Foundation Discussion Papers 1976, Cowles Foundation for Research in Economics, Yale University.
- Chengchun Shi & Rui Song & Wenbin Lu & Bo Fu, 2018. "Maximin projection learning for optimal treatment decision with heterogeneous individualized treatment effects," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 681-702, September.
- Heejung Bang & James M. Robins, 2005. "Doubly Robust Estimation in Missing Data and Causal Inference Models," Biometrics, The International Biometric Society, vol. 61(4), pages 962-973, December.
- Lu Tian & Ash A. Alizadeh & Andrew J. Gentles & Robert Tibshirani, 2014. "A Simple Method for Estimating Interactions Between a Treatment and a Large Number of Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1517-1532, December.
- Shi, Chengchun & Fan, Ailin & Song, Rui & Lu, Wenbin, 2018. "High-dimensional A-learning for optimal dynamic treatment regimes," LSE Research Online Documents on Economics 102113, London School of Economics and Political Science, LSE Library.
- Michael P. Wallace & Erica E. M. Moodie, 2015. "Doubly‐robust dynamic treatment regimen estimation via weighted least squares," Biometrics, The International Biometric Society, vol. 71(3), pages 636-644, September.
- Shi, Chengchun & Song, Rui & Lu, Wenbin & Fu, Bo, 2018. "Maximin projection learning for optimal treatment decision with heterogeneous individualized treatment effects," LSE Research Online Documents on Economics 102112, London School of Economics and Political Science, LSE Library.
- Ruoqing Zhu & Ying-Qi Zhao & Guanhua Chen & Shuangge Ma & Hongyu Zhao, 2017. "Greedy outcome weighted tree learning of optimal personalized treatment rules," Biometrics, The International Biometric Society, vol. 73(2), pages 391-400, June.
- Ashkan Ertefaie & Robert L Strawderman, 2018. "Constructing dynamic treatment regimes over indefinite time horizons," Biometrika, Biometrika Trust, vol. 105(4), pages 963-977.
- Daniel J. Luckett & Eric B. Laber & Anna R. Kahkoska & David M. Maahs & Elizabeth Mayer-Davis & Michael R. Kosorok, 2020. "Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(530), pages 692-706, April.
- Zhengling Qi & Dacheng Liu & Haoda Fu & Yufeng Liu, 2020. "Multi-Armed Angle-Based Direct Learning for Estimating Optimal Individualized Treatment Rules With Various Outcomes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(530), pages 678-691, April.
- S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.
- Baqun Zhang & Anastasios A. Tsiatis & Eric B. Laber & Marie Davidian, 2013. "Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions," Biometrika, Biometrika Trust, vol. 100(3), pages 681-694.
- Xinkun Nie & Emma Brunskill & Stefan Wager, 2020. "Learning When-to-Treat Policies," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(533), pages 392-409, November.
- Lan Wang & Yu Zhou & Rui Song & Ben Sherwood, 2018. "Quantile-Optimal Treatment Regimes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1243-1254, July.
- Ying-Qi Zhao & Donglin Zeng & Eric B. Laber & Michael R. Kosorok, 2015. "New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 583-598, June.
- Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
- Xiaohong Chen & Zhengling Qi, 2022. "On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation," Papers 2201.06169, arXiv.org, revised Jun 2022.
- David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.- Zhen Li & Jie Chen & Eric Laber & Fang Liu & Richard Baumgartner, 2023. "Optimal Treatment Regimes: A Review and Empirical Comparison," International Statistical Review, International Statistical Institute, vol. 91(3), pages 427-463, December.
- Shi, Chengchun & Zhang, Shengxing & Lu, Wenbin & Song, Rui, 2022. "Statistical inference of the value function for reinforcement learning in infinite-horizon settings," LSE Research Online Documents on Economics 110882, London School of Economics and Political Science, LSE Library.
- Chengchun Shi & Sheng Zhang & Wenbin Lu & Rui Song, 2022. "Statistical inference of the value function for reinforcement learning in infinite‐horizon settings," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 765-793, July.
- Zhou, Yunzhe & Qi, Zhengling & Shi, Chengchun & Li, Lexin, 2023. "Optimizing pessimism in dynamic treatment regimes: a Bayesian learning approach," LSE Research Online Documents on Economics 118233, London School of Economics and Political Science, LSE Library.
- Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
- Cai, Hengrui & Shi, Chengchun & Song, Rui & Lu, Wenbin, 2023. "Jump interval-learning for individualized decision making with continuous treatments," LSE Research Online Documents on Economics 118231, London School of Economics and Political Science, LSE Library.
- Shi, Chengchun & Wan, Runzhe & Song, Ge & Luo, Shikai & Zhu, Hongtu & Song, Rui, 2023. "A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets," LSE Research Online Documents on Economics 117174, London School of Economics and Political Science, LSE Library.
- Baqun Zhang & Min Zhang, 2018. "C‐learning: A new classification framework to estimate optimal dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 74(3), pages 891-899, September.
- Yuqian Zhang & Weijie Ji & Jelena Bradic, 2021. "Dynamic treatment effects: high-dimensional inference under model misspecification," Papers 2111.06818, arXiv.org, revised Jun 2023.
- Jelena Bradic & Weijie Ji & Yuqian Zhang, 2021. "High-dimensional Inference for Dynamic Treatment Effects," Papers 2110.04924, arXiv.org, revised May 2023.
- Li, Ting & Shi, Chengchun & Wen, Qianglin & Sui, Yang & Qin, Yongli & Lai, Chunbo & Zhu, Hongtu, 2024. "Combining experimental and historical data for policy evaluation," LSE Research Online Documents on Economics 125588, London School of Economics and Political Science, LSE Library.
- Ruoqing Zhu & Ying-Qi Zhao & Guanhua Chen & Shuangge Ma & Hongyu Zhao, 2017. "Greedy outcome weighted tree learning of optimal personalized treatment rules," Biometrics, The International Biometric Society, vol. 73(2), pages 391-400, June.
- Shosei Sakaguchi, 2024. "Policy Learning for Optimal Dynamic Treatment Regimes with Observational Data," Papers 2404.00221, arXiv.org, revised Dec 2024.
- Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
- Pan Zhao & Yifan Cui, 2023. "A Semiparametric Instrumented Difference-in-Differences Approach to Policy Learning," Papers 2310.09545, arXiv.org.
- Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
- Yunan Wu & Lan Wang, 2021. "Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes," Biometrics, The International Biometric Society, vol. 77(2), pages 465-476, June.
- Xiaohong Chen & Yuan Liao & Weichen Wang, 2022. "Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves," Papers 2301.00092, arXiv.org, revised Jan 2023.
- Xin Qiu & Donglin Zeng & Yuanjia Wang, 2018. "Estimation and evaluation of linear individualized treatment rules to guarantee performance," Biometrics, The International Biometric Society, vol. 74(2), pages 517-528, June.
- Weibin Mo & Yufeng Liu, 2022. "Efficient learning of optimal individualized treatment rules for heteroscedastic or misspecified treatment‐free effect models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 440-472, April.
More about this item
Keywords
reinforcement learning; advantage learning; infinite horizons; rate of convergence; mobile health applications;All these keywords.
JEL classification:
- C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
NEP fields
This paper has been announced in the following NEP Reports:- NEP-CMP-2023-09-04 (Computational Economics)
- NEP-ECM-2023-09-04 (Econometrics)
Statistics
Access and download statisticsCorrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:ehl:lserod:115598. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: LSERO Manager (email available below). General contact details of provider: https://edirc.repec.org/data/lsepsuk.html .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.