Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

Author

Yanwei Jia

Abstract

This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either from the agent's risk attitude or as a distributionally robust approach against model uncertainty. Building on the martingale perspective of Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, which captures the variability of the value-to-go along the trajectory. This characterization allows existing RL algorithms developed for non-risk-sensitive settings to be adapted straightforwardly to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems because of the nonlinear nature of quadratic variation; q-learning, however, offers a solution and extends to infinite-horizon settings. Finally, I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure. I also conduct simulation experiments to demonstrate how risk-sensitive RL improves finite-sample performance in the linear-quadratic control problem.
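The core idea above, adding the realized variance of the value process to a risk-neutral martingale condition, can be illustrated on a discretized trajectory. The sketch below is not the paper's algorithm. It assumes the exponential-form objective is written as (1/ε) log E[exp(ε × cumulative reward)]; under that convention, Itô's formula applied to exp(ε(J(t, X_t) + ∫ r ds)) suggests that J(t, X_t) + ∫ r ds + (ε/2)⟨J⟩_t should be a martingale when evaluating a fixed policy. The q-function, entropy regularizer and discounting are omitted for brevity, and the function names are hypothetical.

```python
def realized_quadratic_variation(values):
    """Sum of squared increments of a sampled value path J(t_0), ..., J(t_n).

    This is the discrete-time estimate of the quadratic variation <J>_T.
    """
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def penalized_increments(values, rewards, dt, eps):
    """Discretized increments of J(t, X_t) + int r ds + (eps / 2) <J>_t.

    Hypothetical helper, assuming the objective (1/eps) log E[exp(eps * R)].
    values:  J evaluated on a uniform grid t_0 < ... < t_n (length n + 1)
    rewards: running reward r on each interval [t_i, t_{i+1}) (length n)
    dt:      grid spacing
    eps:     risk-sensitivity parameter; eps = 0 gives the risk-neutral case
    """
    if len(rewards) != len(values) - 1:
        raise ValueError("need exactly one reward per time step")
    increments = []
    for a, b, r in zip(values, values[1:], rewards):
        dj = b - a
        # risk-neutral TD-style increment plus the quadratic-variation penalty
        increments.append(dj + r * dt + 0.5 * eps * dj ** 2)
    return increments


if __name__ == "__main__":
    path = [0.0, 0.1, 0.3, 0.2]
    print(realized_quadratic_variation(path))
    print(penalized_increments(path, [1.0, 1.0, 1.0], dt=0.1, eps=-2.0))
```

With eps = 0 the penalty term vanishes and the increments reduce to the usual risk-neutral ones, so a learning rule that drives such increments toward a martingale (for example, via martingale orthogonality conditions) needs only the extra squared-increment term to become risk-sensitive under this assumed convention. A negative eps, which corresponds to risk aversion here, lowers each increment in proportion to the squared move of J.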

Suggested Citation

Yanwei Jia, 2024. "Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty," Papers 2404.12598, arXiv.org.

Handle: RePEc:arx:papers:2404.12598

References listed on IDEAS

LeRoy, Stephen F & Singell, Larry D, Jr, 1987. "Knight on Risk and Uncertainty," Journal of Political Economy, University of Chicago Press, vol. 95(2), pages 394-406, April.
Lars Peter Hansen & Thomas J Sargent, 2014. "Robust Control and Model Uncertainty," World Scientific Book Chapters, in: Uncertainty within Economic Models, chapter 5, pages 145-154, World Scientific Publishing Co. Pte. Ltd.
- Thomas J. Sargent & Lars Peter Hansen, 2001. "Robust Control and Model Uncertainty," American Economic Review, American Economic Association, vol. 91(2), pages 60-66, May.
Paul Glasserman & Xingbo Xu, 2013. "Robust Portfolio Control with Stochastic Factor Dynamics," Operations Research, INFORMS, vol. 61(4), pages 874-893, August.
Pascal J. Maenhout, 2004. "Robust Portfolio Rules and Asset Pricing," The Review of Financial Studies, Society for Financial Studies, vol. 17(4), pages 951-983.
Hansen, Lars Peter & Sargent, Thomas J., 2011. "Robustness and ambiguity in continuous time," Journal of Economic Theory, Elsevier, vol. 146(3), pages 1195-1223, May.
Min Dai & Hanqing Jin & Steven Kou & Yuhong Xu, 2021. "A Dynamic Mean-Variance Analysis for Log Returns," Management Science, INFORMS, vol. 67(2), pages 1093-1108, February.
V. S. Borkar, 2002. "Q-Learning for Risk-Sensitive Control," Mathematics of Operations Research, INFORMS, vol. 27(2), pages 294-311, May.
R. Jiang & D. Saunders & C. Weng, 2022. "The reinforcement learning Kelly strategy," Quantitative Finance, Taylor & Francis Journals, vol. 22(8), pages 1445-1464, August.
Merton, Robert C, 1969. "Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case," The Review of Economics and Statistics, MIT Press, vol. 51(3), pages 247-257, August.
Jose Blanchet & Lin Chen & Xun Yu Zhou, 2022. "Distributionally Robust Mean-Variance Portfolio Selection with Wasserstein Distances," Management Science, INFORMS, vol. 68(9), pages 6382-6410, September.
Gilboa, Itzhak & Schmeidler, David, 1989. "Maxmin expected utility with non-unique prior," Journal of Mathematical Economics, Elsevier, vol. 18(2), pages 141-153, April.
- Gilboa, Itzhak & Schmeidler, David, 1986. "Maxmin Expected Utility with a Non-Unique Prior," Foerder Institute for Economic Research Working Papers 275405, Tel-Aviv University, Foerder Institute for Economic Research.
- Itzhak Gilboa & David Schmeidler, 1989. "Maxmin Expected Utility with Non-Unique Prior," Post-Print hal-00753237, HAL.
Mark Broadie & Deniz Cicek & Assaf Zeevi, 2011. "General Bounds and Finite-Time Improvement for the Kiefer-Wolfowitz Stochastic Approximation Algorithm," Operations Research, INFORMS, vol. 59(5), pages 1211-1224, October.
Duffie, Darrell & Epstein, Larry G, 1992. "Stochastic Differential Utility," Econometrica, Econometric Society, vol. 60(2), pages 353-394, March.
Yanwei Jia & Xun Yu Zhou, 2022. "q-Learning in Continuous Time," Papers 2207.00713, arXiv.org, revised Apr 2023.
Sigrún Andradóttir, 1995. "A Stochastic Approximation Algorithm with Varying Bounds," Operations Research, INFORMS, vol. 43(6), pages 1037-1048, December.
Yanwei Jia & Xun Yu Zhou, 2021. "Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms," Papers 2111.11232, arXiv.org, revised Jul 2022.
Duffie, Darrell & Epstein, Larry G, 1992. "Asset Pricing with Stochastic Differential Utility," The Review of Financial Studies, Society for Financial Studies, vol. 5(3), pages 411-436.
Yanwei Jia & Xun Yu Zhou, 2021. "Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach," Papers 2108.06655, arXiv.org, revised Feb 2022.
Sun, Yeneng, 2006. "The exact law of large numbers via Fubini extension and characterization of insurable risks," Journal of Economic Theory, Elsevier, vol. 126(1), pages 31-69, January.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Aït-Sahalia, Yacine & Matthys, Felix, 2019. "Robust consumption and portfolio policies when asset prices can jump," Journal of Economic Theory, Elsevier, vol. 179(C), pages 1-56.
Yilie Huang & Yanwei Jia & Xun Yu Zhou, 2024. "Mean--Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study," Papers 2412.16175, arXiv.org.
Min Dai & Yuchao Dong & Yanwei Jia & Xun Yu Zhou, 2023. "Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration," Papers 2312.11797, arXiv.org.
Maenhout, Pascal J. & Vedolin, Andrea & Xing, Hao, 2025. "Robustness and dynamic sentiment," Journal of Financial Economics, Elsevier, vol. 163(C).
Shi, Zhan, 2019. "Time-varying ambiguity, credit spreads, and the levered equity premium," Journal of Financial Economics, Elsevier, vol. 134(3), pages 617-646.
Chen Ziyi & Gu Jia-wen, 2025. "Exploratory Utility Maximization Problem with Tsallis Entropy," Papers 2502.01269, arXiv.org.
Massimo Guidolin & Francesca Rinaldi, 2013. "Ambiguity in asset pricing and portfolio choice: a review of the literature," Theory and Decision, Springer, vol. 74(2), pages 183-217, February.
- Massimo Guidolin & Francesca Rinaldi, 2010. "Ambiguity in asset pricing and portfolio choice: a review of the literature," Working Papers 2010-028, Federal Reserve Bank of St. Louis.
- Massimo Guidolin & Francesca Rinaldi, 2011. "Ambiguity in Asset Pricing and Portfolio Choice: A Review of the Literature," Working Papers 417, IGIER (Innocenzo Gasparini Institute for Economic Research), Bocconi University.
Zhang, Jinqing & Jin, Zeyu & An, Yunbi, 2017. "Dynamic portfolio optimization with ambiguity aversion," Journal of Banking & Finance, Elsevier, vol. 79(C), pages 95-109.
Jang, Bong-Gyu & Lee, Seungkyu & Lim, Byung Hwa, 2016. "Robust consumption and portfolio rules with time-varying model confidence," Finance Research Letters, Elsevier, vol. 18(C), pages 342-352.
Berend Roorda & J. M. Schumacher & Jacob Engwerda, 2005. "Coherent Acceptability Measures In Multiperiod Models," Mathematical Finance, Wiley Blackwell, vol. 15(4), pages 589-612, October.
Roger J. A. Laeven & Mitja Stadje, 2014. "Robust Portfolio Choice and Indifference Valuation," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1109-1141, November.
Wei, Pengyu & Yang, Charles & Zhuang, Yi, 2023. "Robust consumption and portfolio choice with derivatives trading," European Journal of Operational Research, Elsevier, vol. 304(2), pages 832-850.
Dejian Tian & Weidong Tian, 2016. "Comparative statics under κ-ambiguity for log-Brownian asset prices," International Journal of Economic Theory, The International Society for Economic Theory, vol. 12(4), pages 361-378, December.
Maenhout, Pascal J., 2006. "Robust portfolio rules and detection-error probabilities for a mean-reverting risk premium," Journal of Economic Theory, Elsevier, vol. 128(1), pages 136-163, May.
Pascal J. Maenhout & Andrea Vedolin & Hao Xing, 2020. "Generalized Robustness and Dynamic Pessimism," NBER Working Papers 26970, National Bureau of Economic Research, Inc.
Isaac Kleshchelski & Nicolas Vincent, 2007. "Robust Equilibrium Yield Curves," Cahiers de recherche 08-02, HEC Montréal, Institut d'économie appliquée.
- Isaac Kleshchelski & Nicolas Vincent, 2009. "Robust Equilibrium Yield Curves," Cahiers de recherche 0907, CIRPEE.
- Nicolas Vincent & Isaac Kleshchelski, 2008. "Robust Equilibrium Yield Curves," 2008 Meeting Papers 486, Society for Economic Dynamics.
Hui Chen & Nengjiu Ju & Jianjun Miao, 2014. "Dynamic Asset Allocation with Ambiguous Return Predictability," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 17(4), pages 799-823, October.
- Hui Chen & Nengjiu Ju & Jianjun Miao, "undated". "Dynamic Asset Allocation with Ambiguous Return Predictability," Boston University - Department of Economics - Working Papers Series wp2009-015, Boston University - Department of Economics.
- Hui Chen & Nengjiu Ju & Jianjun Miao, 2008. "Dynamic Asset Allocation with Ambiguous Return Predictability," Boston University - Department of Economics - The Institute for Economic Development Working Papers Series dp-179, Boston University - Department of Economics, revised Feb 2009.
Yacine Aït-Sahalia & Felix Matthys & Emilio Osambela & Ronnie Sircar, 2021. "When Uncertainty and Volatility Are Disconnected: Implications for Asset Pricing and Portfolio Performance," NBER Working Papers 29195, National Bureau of Economic Research, Inc.
- Yacine Aït-Sahalia & Felix Matthys & Emilio Osambela & Ronnie Sircar, 2021. "When Uncertainty and Volatility Are Disconnected: Implications for Asset Pricing and Portfolio Performance," Finance and Economics Discussion Series 2021-063, Board of Governors of the Federal Reserve System (U.S.).
Huy Chau & Duy Nguyen & Thai Nguyen, 2024. "Continuous-time optimal investment with portfolio constraints: a reinforcement learning approach," Papers 2412.10692, arXiv.org.
Shigeta, Yuki, 2020. "Gain/loss asymmetric stochastic differential utility," Journal of Economic Dynamics and Control, Elsevier, vol. 118(C).
- Yuki SHIGETA, 2019. "Gain/Loss Asymmetric Stochastic Differential Utility," Discussion papers e-19-004, Graduate School of Economics, Kyoto University.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-CMP-2024-05-20 (Computational Economics)
NEP-RMG-2024-05-20 (Risk Management)