Statistical inference of the value function for reinforcement learning in infinite‐horizon settings
DOI: 10.1111/rssb.12465
References listed on IDEAS
- S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.
- Baqun Zhang & Anastasios A. Tsiatis & Eric B. Laber & Marie Davidian, 2013. "Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions," Biometrika, Biometrika Trust, vol. 100(3), pages 681-694.
- Chen, Xiaohong & Christensen, Timothy M., 2015. "Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions," Journal of Econometrics, Elsevier, vol. 188(2), pages 447-465.
- Xiaohong Chen & Timothy M. Christensen, 2014. "Optimal Uniform Convergence Rates and Asymptotic Normality for Series Estimators under Weak Dependence and Weak Conditions," Cowles Foundation Discussion Papers 1976, Cowles Foundation for Research in Economics, Yale University.
- Xiaohong Chen & Timothy M. Christensen, 2014. "Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions," CeMMAP working papers CWP46/14, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Chengchun Shi & Rui Song & Wenbin Lu & Bo Fu, 2018. "Maximin projection learning for optimal treatment decision with heterogeneous individualized treatment effects," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 681-702, September.
- Ying-Qi Zhao & Donglin Zeng & Eric B. Laber & Michael R. Kosorok, 2015. "New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 583-598, June.
- Jingshen Wang & Xuming He & Gongjun Xu, 2020. "Debiased Inference on Treatment Effect in a High-Dimensional Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(529), pages 442-454, January.
- Shi, Chengchun & Song, Rui & Lu, Wenbin & Fu, Bo, 2018. "Maximin projection learning for optimal treatment decision with heterogeneous individualized treatment effects," LSE Research Online Documents on Economics 102112, London School of Economics and Political Science, LSE Library.
- David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
- Ashkan Ertefaie & Robert L Strawderman, 2018. "Constructing dynamic treatment regimes over indefinite time horizons," Biometrika, Biometrika Trust, vol. 105(4), pages 963-977.
- Saikkonen, Pentti, 2001. "Stability results for nonlinear vector autoregressions with an application to a nonlinear error correction model," SFB 373 Discussion Papers 2001,93, Humboldt University of Berlin, Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes.
Citations
Cited by:
- Shi, Chengchun & Wan, Runzhe & Song, Ge & Luo, Shikai & Zhu, Hongtu & Song, Rui, 2023. "A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets," LSE Research Online Documents on Economics 117174, London School of Economics and Political Science, LSE Library.
- Hao, Meiling & Su, Pingfan & Hu, Liyuan & Szabo, Zoltan & Zhao, Qianyu & Shi, Chengchun, 2024. "Forward and backward state abstractions for off-policy evaluation," LSE Research Online Documents on Economics 124074, London School of Economics and Political Science, LSE Library.
- Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
- Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Shi, Chengchun & Zhang, Shengxing & Lu, Wenbin & Song, Rui, 2022. "Statistical inference of the value function for reinforcement learning in infinite-horizon settings," LSE Research Online Documents on Economics 110882, London School of Economics and Political Science, LSE Library.
- Shi, Chengchun & Luo, Shikai & Le, Yuan & Zhu, Hongtu & Song, Rui, 2022. "Statistically efficient advantage learning for offline reinforcement learning in infinite horizons," LSE Research Online Documents on Economics 115598, London School of Economics and Political Science, LSE Library.
- Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
- Zhou, Yunzhe & Qi, Zhengling & Shi, Chengchun & Li, Lexin, 2023. "Optimizing pessimism in dynamic treatment regimes: a Bayesian learning approach," LSE Research Online Documents on Economics 118233, London School of Economics and Political Science, LSE Library.
- Shi, Chengchun & Wan, Runzhe & Song, Ge & Luo, Shikai & Zhu, Hongtu & Song, Rui, 2023. "A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets," LSE Research Online Documents on Economics 117174, London School of Economics and Political Science, LSE Library.
- Zhen Li & Jie Chen & Eric Laber & Fang Liu & Richard Baumgartner, 2023. "Optimal Treatment Regimes: A Review and Empirical Comparison," International Statistical Review, International Statistical Institute, vol. 91(3), pages 427-463, December.
- Jingxiang Chen & Yufeng Liu & Donglin Zeng & Rui Song & Yingqi Zhao & Michael R. Kosorok, 2016. "Comment," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 942-947, July.
- Xin Qiu & Donglin Zeng & Yuanjia Wang, 2018. "Estimation and evaluation of linear individualized treatment rules to guarantee performance," Biometrics, The International Biometric Society, vol. 74(2), pages 517-528, June.
- Baqun Zhang & Min Zhang, 2018. "C‐learning: A new classification framework to estimate optimal dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 74(3), pages 891-899, September.
- Kristin A. Linn & Eric B. Laber & Leonard A. Stefanski, 2017. "Interactive Q-Learning for Quantiles," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 638-649, April.
- Li, Ting & Shi, Chengchun & Wen, Qianglin & Sui, Yang & Qin, Yongli & Lai, Chunbo & Zhu, Hongtu, 2024. "Combining experimental and historical data for policy evaluation," LSE Research Online Documents on Economics 125588, London School of Economics and Political Science, LSE Library.
- Rebecca Hager & Anastasios A. Tsiatis & Marie Davidian, 2018. "Optimal two‐stage dynamic treatment regimes from a classification perspective with censored survival data," Biometrics, The International Biometric Society, vol. 74(4), pages 1180-1192, December.
- Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
- Jin Wang & Donglin Zeng & D. Y. Lin, 2022. "Semiparametric single-index models for optimal treatment regimens with censored outcomes," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(4), pages 744-763, October.
- Ruoqing Zhu & Ying-Qi Zhao & Guanhua Chen & Shuangge Ma & Hongyu Zhao, 2017. "Greedy outcome weighted tree learning of optimal personalized treatment rules," Biometrics, The International Biometric Society, vol. 73(2), pages 391-400, June.
- Shosei Sakaguchi, 2021. "Estimation of Optimal Dynamic Treatment Assignment Rules under Policy Constraints," Papers 2106.05031, arXiv.org, revised Aug 2024.
- Shosei Sakaguchi, 2024. "Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data," Papers 2404.00221, arXiv.org.
- Qingxia Chen & Fan Zhang & Ming-Hui Chen & Xiuyu Julie Cong, 2020. "Estimation of treatment effects and model diagnostics with two-way time-varying treatment switching: an application to a head and neck study," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 26(4), pages 685-707, October.
- Wei Liu & Zhiwei Zhang & Lei Nie & Guoxing Soon, 2017. "A Case Study in Personalized Medicine: Rilpivirine Versus Efavirenz for Treatment-Naive HIV Patients," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(520), pages 1381-1392, October.
- Kara E. Rudolph & Iván Díaz, 2022. "When the ends do not justify the means: Learning who is predicted to have harmful indirect effects," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(S2), pages 573-589, December.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jorssb:v:84:y:2022:i:3:p:765-793. See general information about how to correct material in RePEc.