
Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus $Q$-learning

Authors
  • Loris Cannelli
  • Giuseppe Nuti
  • Marzio Sala
  • Oleg Szehr

Abstract

The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete-market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the $Q$-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual $k$-armed bandit problem, a choice motivated by the simplicity and sample-efficiency of this architecture, which allows for realistic online model updates from real-world data. We find that the $k$-armed bandit model naturally fits the Profit-and-Loss formulation of hedging, provides a more accurate and sample-efficient approach than $Q$-learning, and reduces to the Black-Scholes model in the absence of transaction costs and risks.
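
The bandit formulation described above lends itself to a compact illustration. Below is a minimal sketch, not the authors' implementation: it assumes the hedge ratio is discretized into k arms, a hypothetical (moneyness, time-to-maturity) context bucketing, an epsilon-greedy policy, and a simple mean-variance risk penalty on the one-step hedged P&L of a short call. All names, parameters, and the simulated price step are illustrative stand-ins for the real-world fills that an online agent would learn from.

    # Minimal sketch (hypothetical, not the paper's implementation):
    # one rebalancing step of option hedging framed as a risk-averse
    # contextual k-armed bandit. Arms = discrete hedge ratios; context =
    # (moneyness, time-to-maturity) bucket; reward = one-step hedged P&L
    # with a proportional transaction cost and a variance-style penalty.
    import numpy as np

    rng = np.random.default_rng(0)

    K = 11                                  # arms: hedge ratios 0.0, 0.1, ..., 1.0
    ARMS = np.linspace(0.0, 1.0, K)
    EPS, LAM, COST = 0.1, 1.0, 0.002        # exploration rate, risk aversion, cost

    def context_bucket(spot, strike, tau):
        """Coarse discretization of the context (illustrative scheme)."""
        m = int(np.clip((spot / strike - 0.8) / 0.05, 0, 7))  # moneyness bucket
        t = int(np.clip(tau * 4, 0, 3))                       # maturity bucket
        return m, t

    # Running average of the risk-adjusted reward per (context, arm).
    values = np.zeros((8, 4, K))
    counts = np.zeros((8, 4, K))

    def choose_arm(ctx):
        """Epsilon-greedy over the estimated risk-adjusted rewards."""
        return rng.integers(K) if rng.random() < EPS else int(np.argmax(values[ctx]))

    def one_step(spot, strike, tau, prev_hedge, sigma=0.2, dt=1 / 52):
        """Simulated environment step; in the online setting the reward
        would instead come from realized market P&L."""
        ctx = context_bucket(spot, strike, tau)
        arm = choose_arm(ctx)
        hedge = ARMS[arm]
        new_spot = spot * np.exp(-0.5 * sigma**2 * dt
                                 + sigma * np.sqrt(dt) * rng.normal())
        # One-step P&L of a short call hedged with `hedge` units of stock
        # (intrinsic value used as a crude proxy for the option price).
        d_option = max(new_spot - strike, 0.0) - max(spot - strike, 0.0)
        pnl = hedge * (new_spot - spot) - d_option
        pnl -= COST * abs(hedge - prev_hedge) * spot      # transaction cost
        reward = pnl - LAM * pnl**2                       # risk-averse utility
        counts[ctx][arm] += 1
        values[ctx][arm] += (reward - values[ctx][arm]) / counts[ctx][arm]
        return new_spot, hedge

    spot, hedge = 100.0, 0.0
    for _ in range(5000):
        spot, hedge = one_step(spot, strike=100.0, tau=0.25, prev_hedge=hedge)

With LAM = 0 and COST = 0 the agent simply maximizes expected one-step P&L, mirroring the abstract's observation that the approach degenerates to the frictionless, risk-free case covered by the Black-Scholes model.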

Suggested Citation

  • Loris Cannelli & Giuseppe Nuti & Marzio Sala & Oleg Szehr, 2020. "Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus $Q$-learning," Papers 2007.01623, arXiv.org, revised Feb 2022.
  • Handle: RePEc:arx:papers:2007.01623

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2007.01623
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Merton, Robert C, 1973. "An Intertemporal Capital Asset Pricing Model," Econometrica, Econometric Society, vol. 41(5), pages 867-887, September.
    2. Leland, Hayne E, 1985. "Option Pricing and Replication with Transactions Costs," Journal of Finance, American Finance Association, vol. 40(5), pages 1283-1301, December.
    3. Hans Buehler & Lukas Gonon & Josef Teichmann & Ben Wood & Baranidharan Mohan & Jonathan Kochems, 2019. "Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning," Swiss Finance Institute Research Paper Series 19-80, Swiss Finance Institute.
    4. David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
    5. Michael N. Katehakis & Arthur F. Veinott, 1987. "The Multi-Armed Bandit Problem: Decomposition and Computation," Mathematics of Operations Research, INFORMS, vol. 12(2), pages 262-268, May.
    6. David Silver & Julian Schrittwieser & Karen Simonyan & Ioannis Antonoglou & Aja Huang & Arthur Guez & Thomas Hubert & Lucas Baker & Matthew Lai & Adrian Bolton & Yutian Chen & Timothy Lillicrap & Fan Hui, 2017. "Mastering the game of Go without human knowledge," Nature, Nature, vol. 550(7676), pages 354-359, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cui, Tianxiang & Du, Nanjiang & Yang, Xiaoying & Ding, Shusheng, 2024. "Multi-period portfolio optimization using a deep reinforcement learning hyper-heuristic approach," Technological Forecasting and Social Change, Elsevier, vol. 198(C).
    2. Oleg Szehr, 2021. "Hedging of Financial Derivative Contracts via Monte Carlo Tree Search," Papers 2102.06274, arXiv.org, revised Apr 2021.
    3. Yuchen Zhang & Wei Yang, 2022. "Breakthrough invention and problem complexity: Evidence from a quasi‐experiment," Strategic Management Journal, Wiley Blackwell, vol. 43(12), pages 2510-2544, December.
    4. Omar Al-Ani & Sanjoy Das, 2022. "Reinforcement Learning: Theory and Applications in HEMS," Energies, MDPI, vol. 15(17), pages 1-37, September.
    5. Suresh M. Sundaresan, 2000. "Continuous‐Time Methods in Finance: A Review and an Assessment," Journal of Finance, American Finance Association, vol. 55(4), pages 1569-1622, August.
    6. Zhang, Yihao & Chai, Zhaojie & Lykotrafitis, George, 2021. "Deep reinforcement learning with a particle dynamics environment applied to emergency evacuation of a room with obstacles," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 571(C).
    7. Jun Li & Wei Zhu & Jun Wang & Wenfei Li & Sheng Gong & Jian Zhang & Wei Wang, 2018. "RNA3DCNN: Local and global quality assessments of RNA 3D structures using 3D deep convolutional neural networks," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-18, November.
    8. Keller, Alexander & Dahm, Ken, 2019. "Integral equations and machine learning," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 161(C), pages 2-12.
    9. Haoran Wang & Shi Yu, 2021. "Robo-Advising: Enhancing Investment with Inverse Optimization and Deep Reinforcement Learning," Papers 2105.09264, arXiv.org.
    10. Weifan Long & Taixian Hou & Xiaoyi Wei & Shichao Yan & Peng Zhai & Lihua Zhang, 2023. "A Survey on Population-Based Deep Reinforcement Learning," Mathematics, MDPI, vol. 11(10), pages 1-17, May.
    11. Yifeng Guo & Xingyu Fu & Yuyan Shi & Mingwen Liu, 2018. "Robust Log-Optimal Strategy with Reinforcement Learning," Papers 1805.00205, arXiv.org.
    12. Xueqing Yan & Yongming Li, 2023. "A Novel Discrete Differential Evolution with Varying Variables for the Deficiency Number of Mahjong Hand," Mathematics, MDPI, vol. 11(9), pages 1-21, May.
    13. Pujin Wang & Jianzhuang Xiao & Ken’ichi Kawaguchi & Lichen Wang, 2022. "Automatic Ceiling Damage Detection in Large-Span Structures Based on Computer Vision and Deep Learning," Sustainability, MDPI, vol. 14(6), pages 1-24, March.
    14. Jianjun Chen & Weihao Hu & Di Cao & Bin Zhang & Qi Huang & Zhe Chen & Frede Blaabjerg, 2019. "An Imbalance Fault Detection Algorithm for Variable-Speed Wind Turbines: A Deep Learning Approach," Energies, MDPI, vol. 12(14), pages 1-15, July.
    15. Lu Wang & Wenqing Ai & Tianhu Deng & Zuo‐Jun M. Shen & Changjing Hong, 2020. "Optimal production ramp‐up in the smartphone manufacturing industry," Naval Research Logistics (NRL), John Wiley & Sons, vol. 67(8), pages 685-704, December.
    16. Yuchao Dong, 2022. "Randomized Optimal Stopping Problem in Continuous time and Reinforcement Learning Algorithm," Papers 2208.02409, arXiv.org, revised Sep 2023.
    17. Shijun Wang & Baocheng Zhu & Chen Li & Mingzhe Wu & James Zhang & Wei Chu & Yuan Qi, 2020. "Riemannian Proximal Policy Optimization," Computer and Information Science, Canadian Center of Science and Education, vol. 13(3), pages 1-93, August.
    18. Zhenchong Mo & Lin Gong & Mingren Zhu & Junde Lan, 2024. "The Generative Generic-Field Design Method Based on Design Cognition and Knowledge Reasoning," Sustainability, MDPI, vol. 16(22), pages 1-34, November.
    19. Morato, P.G. & Andriotis, C.P. & Papakonstantinou, K.G. & Rigo, P., 2023. "Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning," Reliability Engineering and System Safety, Elsevier, vol. 235(C).
    20. Dimitris Bertsimas & Leonid Kogan & Andrew W. Lo, 2001. "Hedging Derivative Securities and Incomplete Markets: An (epsilon)-Arbitrage Approach," Operations Research, INFORMS, vol. 49(3), pages 372-397, June.

