
A Natural Actor-Critic Algorithm with Downside Risk Constraints

Authors

  • Thomas Spooner
  • Rahul Savani

Abstract

Existing work on risk-sensitive reinforcement learning, for both symmetric and downside risk measures, has typically relied on direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it suffers from high variance and lower sample efficiency than temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk, which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that the Bellman operator defining this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This enables sample-efficient, online estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
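As a hedged illustration of the quantity being controlled, the sketch below computes the direct Monte-Carlo estimate of a lower partial moment from sampled episode returns, i.e. the baseline the abstract contrasts with temporal-difference estimation. It assumes the standard definition LPM_n(tau) = E[max(tau - G, 0)^n] for a return G, a target tau, and an order n; the function names, the default target of 0 and the default order of 2 are illustrative assumptions, not the paper's notation, and the paper's Bellman-equation proxy is not reproduced here.

```python
from statistics import fmean


def discounted_return(rewards, gamma=0.99):
    """Return G = sum_t gamma**t * r_t for one episode (illustrative helper)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # backward accumulation of the discounted sum
    return g


def lower_partial_moment(returns, target=0.0, order=2):
    """Monte-Carlo estimate of E[max(target - G, 0) ** order].

    Assumed standard definition: only returns below `target` contribute,
    so upside deviations are ignored, unlike the (symmetric) variance.
    """
    shortfalls = [max(target - g, 0.0) for g in returns]
    return fmean(s ** order for s in shortfalls)


if __name__ == "__main__":
    episodes = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
    returns = [discounted_return(ep, gamma=0.9) for ep in episodes]
    print(lower_partial_moment(returns, target=0.0, order=2))
```

Here the sampled returns are 1.0, -1.0, 0.5 and -2.0, so the shortfalls below the target 0 are 0, 1, 0 and 2, and the second-order estimate is (0 + 1 + 0 + 4) / 4 = 1.25. Because such an estimate needs complete episodes, it shares the high variance and poor sample efficiency that the abstract attributes to Monte-Carlo approaches.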

Suggested Citation

  • Thomas Spooner & Rahul Savani, 2020. "A Natural Actor-Critic Algorithm with Downside Risk Constraints," Papers 2007.04203, arXiv.org.
  • Handle: RePEc:arx:papers:2007.04203

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2007.04203
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Farinelli, Simone & Tibiletti, Luisa, 2008. "Sharpe thinking in asset ranking with one-sided measures," European Journal of Operational Research, Elsevier, vol. 185(3), pages 1542-1547, March.
    2. Thomas Spooner & Rahul Savani, 2020. "Robust Market Making via Adversarial Reinforcement Learning," Papers 2003.01820, arXiv.org, revised Jul 2020.
    3. Fishburn, Peter C, 1977. "Mean-Risk Analysis with Risk Associated with Below-Target Returns," American Economic Review, American Economic Association, vol. 67(2), pages 116-126, March.
    4. Danielsson, Jon & Zigrand, Jean-Pierre & Jorgensen, Bjørn N. & Sarma, Mandira & de Vries, C. G., 2006. "Consistent measures of risk," LSE Research Online Documents on Economics 24517, London School of Economics and Political Science, LSE Library.
    5. Daniel Kahneman & Amos Tversky, 2013. "Prospect Theory: An Analysis of Decision Under Risk," World Scientific Book Chapters, in: Leonard C MacLean & William T Ziemba (ed.), Handbook of the Fundamentals of Financial Decision Making, Part I, chapter 6, pages 99-127, World Scientific Publishing Co. Pte. Ltd.
    6. Mannor, Shie & Tsitsiklis, John N., 2013. "Algorithmic aspects of mean–variance optimization in Markov decision processes," European Journal of Operational Research, Elsevier, vol. 231(3), pages 645-653.
    7. Merton, Robert C, 1969. "Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case," The Review of Economics and Statistics, MIT Press, vol. 51(3), pages 247-257, August.
    8. Shalabh Bhatnagar & K. Lakshmanan, 2012. "An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes," Journal of Optimization Theory and Applications, Springer, vol. 153(3), pages 688-708, June.
    9. S. M. Sunoj & N. Vipin, 2019. "Some properties of conditional partial moments in the context of stochastic modelling," Statistical Papers, Springer, vol. 60(6), pages 1971-1999, December.
    10. Markowitz, Harry M, 1991. "Foundations of Portfolio Theory," Journal of Finance, American Finance Association, vol. 46(2), pages 469-477, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Caporin, Massimiliano & Costola, Michele & Jannin, Gregory & Maillet, Bertrand, 2018. "On the (Ab)use of Omega?," Journal of Empirical Finance, Elsevier, vol. 46(C), pages 11-33.
    2. Elie Matta & Jean McGuire, 2008. "Too Risky to Hold? The Effect of Downside Risk, Accumulated Equity Wealth, and Firm Performance on CEO Equity Reduction," Organization Science, INFORMS, vol. 19(4), pages 567-580, August.
    3. Veld, Chris & Veld-Merkoulova, Yulia V., 2008. "The risk perceptions of individual investors," Journal of Economic Psychology, Elsevier, vol. 29(2), pages 226-252, April.
    4. Briec, Walter & Kerstens, Kristiaan, 2010. "Portfolio selection in multidimensional general and partial moment space," Journal of Economic Dynamics and Control, Elsevier, vol. 34(4), pages 636-656, April.
    5. León, Angel & Moreno, Manuel, 2017. "One-sided performance measures under Gram-Charlier distributions," Journal of Banking & Finance, Elsevier, vol. 74(C), pages 38-50.
    6. Farinelli, Simone & Ferreira, Manuel & Rossello, Damiano & Thoeny, Markus & Tibiletti, Luisa, 2008. "Beyond Sharpe ratio: Optimal asset allocation using different performance ratios," Journal of Banking & Finance, Elsevier, vol. 32(10), pages 2057-2063, October.
    7. Dong, Yinghui & Zheng, Harry, 2019. "Optimal investment of DC pension plan under short-selling constraints and portfolio insurance," Insurance: Mathematics and Economics, Elsevier, vol. 85(C), pages 47-59.
    8. John Heaton & Deborah Lucas, 2000. "Stock prices and fundamentals," Proceedings, Federal Reserve Bank of San Francisco, issue Apr.
    9. Luo, Yan & Wang, Xiaohuan & Zhang, Chenyang & Huang, Wei, 2021. "Accounting-based downside risk and expected stock returns: Evidence from China," International Review of Financial Analysis, Elsevier, vol. 78(C).
    10. Massimiliano Caporin & Grégory M. Jannin & Francesco Lisi & Bertrand B. Maillet, 2014. "A Survey On The Four Families Of Performance Measures," Journal of Economic Surveys, Wiley Blackwell, vol. 28(5), pages 917-942, December.
    11. Di Giacinto, Marina & Federico, Salvatore & Gozzi, Fausto & Vigna, Elena, 2014. "Income drawdown option with minimum guarantee," European Journal of Operational Research, Elsevier, vol. 234(3), pages 610-624.
    12. Jakusch, Sven Thorsten, 2017. "On the applicability of maximum likelihood methods: From experimental to financial data," SAFE Working Paper Series 148, Leibniz Institute for Financial Research SAFE, revised 2017.
    13. LiCalzi, Marco & Sorato, Annamaria, 2006. "The Pearson system of utility functions," European Journal of Operational Research, Elsevier, vol. 172(2), pages 560-573, July.
    14. Penikas, Henry, 2010. "Copula-Models in Foreign Exchange Risk-Management of a Bank," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 17(1), pages 62-87.
    15. Noriyuki Okuyama & Gavin Francis, 2010. "Revealing the Information Content of Investment Decisions," Chapters, in: Brian Bruce (ed.), Handbook of Behavioral Finance, chapter 3, Edward Elgar Publishing.
    16. Bi, Hongwei & Huang, Rachel J. & Tzeng, Larry Y. & Zhu, Wei, 2019. "Higher-order Omega: A performance index with a decision-theoretic foundation," Journal of Banking & Finance, Elsevier, vol. 100(C), pages 43-57.
    17. Hlouskova, Jaroslava & Fortin, Ines & Tsigaris, Panagiotis, 2019. "The consumption–investment decision of a prospect theory household: A two-period model with an endogenous second period reference level," Journal of Mathematical Economics, Elsevier, vol. 85(C), pages 93-108.
    18. Dacey, Raymond & Gallant, Kenneth S., 1997. "Crime control and harassment of the innocent," Journal of Criminal Justice, Elsevier, vol. 25(4), pages 325-334.
    19. Basu, Anup K. & Drew, Michael E., 2010. "The appropriateness of default investment options in defined contribution plans: Australian evidence," Pacific-Basin Finance Journal, Elsevier, vol. 18(3), pages 290-305, June.
    20. Kuhberger, Anton, 1998. "The Influence of Framing on Risky Decisions: A Meta-analysis," Organizational Behavior and Human Decision Processes, Elsevier, vol. 75(1), pages 23-55, July.

