Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
References listed on IDEAS
- R. H. Strotz, 1955. "Myopia and Inconsistency in Dynamic Utility Maximization," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 23(3), pages 165-180.
- Duan Li & Wan‐Lung Ng, 2000. "Optimal Dynamic Portfolio Selection: Multiperiod Mean‐Variance Formulation," Mathematical Finance, Wiley Blackwell, vol. 10(3), pages 387-406, July.
- Min Dai & Hanqing Jin & Steven Kou & Yuhong Xu, 2021. "A Dynamic Mean-Variance Analysis for Log Returns," Management Science, INFORMS, vol. 67(2), pages 1093-1108, February.
- Suleyman Basak & Georgy Chabakauri, 2010. "Dynamic Mean-Variance Asset Allocation," The Review of Financial Studies, Society for Financial Studies, vol. 23(8), pages 2970-3016, August.
- Suleyman Basak & Georgy Chabakauri, 2009. "Dynamic Mean-Variance Asset Allocation," CEPR Discussion Papers 7256, C.E.P.R. Discussion Papers.
- David Silver & Julian Schrittwieser & Karen Simonyan & Ioannis Antonoglou & Aja Huang & Arthur Guez & Thomas Hubert & Lucas Baker & Matthew Lai & Adrian Bolton & Yutian Chen & Timothy Lillicrap & Fan , 2017. "Mastering the game of Go without human knowledge," Nature, Nature, vol. 550(7676), pages 354-359, October.
- Yanwei Jia & Xun Yu Zhou, 2021. "Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach," Papers 2108.06655, arXiv.org, revised Feb 2022.
Citations
Cited by:
- Zhou Fang, 2023. "Continuous-Time Path-Dependent Exploratory Mean-Variance Portfolio Construction," Papers 2303.02298, arXiv.org.
- Bo Wu & Lingfei Li, 2024. "Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market," Journal of Economic Dynamics and Control, Elsevier, vol. 158(C).
- Zhou Fang & Haiqing Xu, 2023. "Market Making of Options via Reinforcement Learning," Papers 2307.01814, arXiv.org.
- Zhou Fang & Haiqing Xu, 2023. "Over-the-Counter Market Making via Reinforcement Learning," Papers 2307.01816, arXiv.org.
- Yanwei Jia, 2024. "Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty," Papers 2404.12598, arXiv.org.
- Jodi Dianetti & Giorgio Ferrari & Renyuan Xu, 2024. "Exploratory Optimal Stopping: A Singular Control Formulation," Papers 2408.09335, arXiv.org, revised Oct 2024.
- Xiangyu Cui & Xun Li & Yun Shi & Si Zhao, 2023. "Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning," Papers 2312.15385, arXiv.org.
- Min Dai & Yu Sun & Zuo Quan Xu & Xun Yu Zhou, 2024. "Learning to Optimally Stop Diffusion Processes, with Financial Applications," Papers 2408.09242, arXiv.org, revised Sep 2024.
- Yanwei Jia & Xun Yu Zhou, 2022. "q-Learning in Continuous Time," Papers 2207.00713, arXiv.org, revised Apr 2023.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Luca De Gennaro Aquino & Didier Sornette & Moris S. Strub, 2023. "Portfolio selection with exploration of new investment assets," European Journal of Operational Research, Elsevier, vol. 310(2), pages 773-792.
- Xiang Meng, 2019. "Dynamic Mean-Variance Portfolio Optimisation," Papers 1907.03093, arXiv.org.
- Xiangyu Cui & Xun Li & Duan Li & Yun Shi, 2014. "Time Consistent Behavior Portfolio Policy for Dynamic Mean-Variance Formulation," Papers 1408.6070, arXiv.org, revised Aug 2015.
- Ben Hambly & Renyuan Xu & Huining Yang, 2021. "Recent Advances in Reinforcement Learning in Finance," Papers 2112.04553, arXiv.org, revised Feb 2023.
- Yongwu Li & Zhongfei Li, 2013. "Optimal time-consistent investment and reinsurance strategies for mean–variance insurers with state dependent risk aversion," Insurance: Mathematics and Economics, Elsevier, vol. 53(1), pages 86-97.
- Felix Fießinger & Mitja Stadje, 2023. "Time-Consistent Asset Allocation for Risk Measures in a Lévy Market," Papers 2305.09471, arXiv.org, revised Oct 2024.
- Xue Dong He & Xun Yu Zhou, 2021. "Who Are I: Time Inconsistency and Intrapersonal Conflict and Reconciliation," Papers 2105.01829, arXiv.org.
- Agostino Capponi & Sveinn Ólafsson & Thaleia Zariphopoulou, 2022. "Personalized Robo-Advising: Enhancing Investment Through Client Interaction," Management Science, INFORMS, vol. 68(4), pages 2485-2512, April.
- Shuai Ma & Xiaoteng Ma & Li Xia, 2023. "A unified algorithm framework for mean-variance optimization in discounted Markov decision processes," European Journal of Operational Research, Elsevier, vol. 311(3), pages 1057-1067.
- Dong-Mei Zhu & Jia-Wen Gu & Feng-Hui Yu & Tak-Kuen Siu & Wai-Ki Ching, 2021. "Optimal pairs trading with dynamic mean-variance objective," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 94(1), pages 145-168, August.
- Tomas Björk & Agatha Murgoci & Xun Yu Zhou, 2014. "Mean–Variance Portfolio Optimization With State-Dependent Risk Aversion," Mathematical Finance, Wiley Blackwell, vol. 24(1), pages 1-24, January.
- Henk Keffert, 2024. "Robo-advising: Optimal investment with mismeasured and unstable risk preferences," European Journal of Operational Research, Elsevier, vol. 315(1), pages 378-392.
- Jiaqin Wei & Tianxiao Wang, 2017. "Time-consistent mean–variance asset–liability management with random coefficients," Insurance: Mathematics and Economics, Elsevier, vol. 77(C), pages 84-96.
- Yuchen Li & Zongxia Liang & Shunzhi Pang, 2022. "Continuous-Time Monotone Mean-Variance Portfolio Selection in Jump-Diffusion Model," Papers 2211.12168, arXiv.org, revised May 2024.
- Chi Kin Lam & Yuhong Xu & Guosheng Yin, 2016. "Dynamic portfolio selection without risk-free assets," Papers 1602.04975, arXiv.org.
- Jingong Zhang & Ken Seng Tan & Chengguo Weng, 2017. "Optimal hedging with basis risk under mean–variance criterion," Insurance: Mathematics and Economics, Elsevier, vol. 75(C), pages 1-15.
- Zhou Fang, 2023. "Continuous-Time Path-Dependent Exploratory Mean-Variance Portfolio Construction," Papers 2303.02298, arXiv.org.
- Luca De Gennaro Aquino & Sascha Desmettre & Yevhen Havrylenko & Mogens Steffensen, 2024. "Equilibrium control theory for Kihlstrom-Mirman preferences in continuous time," Papers 2407.16525, arXiv.org, revised Oct 2024.
- Matthias Albrecht Fahrenwaldt & Ninna Reitzel Jensen & Mogens Steffensen, 2020. "Nonrecursive separation of risk and time preferences," Journal of Mathematical Economics, Elsevier, vol. 90(C), pages 95-108.
- Liyuan Wang & Zhiping Chen, 2019. "Stochastic Game Theoretic Formulation for a Multi-Period DC Pension Plan with State-Dependent Risk Aversion," Mathematics, MDPI, vol. 7(1), pages 1-16, January.
Corrections
All material on this site has been provided by the respective publishers and authors. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2111.11232.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/ .