Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i24p4020-d1549683.html

A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization

Authors
  • Yuling Huang

    (School of Computer Science and Software, Zhaoqing University, Zhaoqing 526060, China)

  • Chujin Zhou

    (School of Computer Science and Engineering, Macau University of Science and Technology, Taipa 999078, Macao, China)

  • Lin Zhang

    (School of Accounting and Finance, Beijing Institute of Technology, Beijing 100811, China)

  • Xiaoping Lu

    (School of Computer Science and Engineering, Macau University of Science and Technology, Taipa 999078, Macao, China)

Abstract

Reinforcement Learning (RL) is increasingly being applied to complex decision-making tasks such as financial trading. However, designing effective reward functions remains a significant challenge. Traditional static reward functions often fail to adapt to dynamic environments, leading to inefficiencies in learning. This paper presents a novel approach, called Self-Rewarding Deep Reinforcement Learning (SRDRL), which integrates a self-rewarding network within the RL framework. The SRDRL mechanism operates in two primary phases: First, supervised learning techniques are used to learn from expert knowledge by employing advanced time-series feature extraction models, including TimesNet and WFTNet. This step refines the self-rewarding network parameters by comparing predicted rewards with expert-labeled rewards, which are based on metrics such as Min-Max, Sharpe Ratio, and Return. In the second phase, the model selects the higher value between the expert-labeled and predicted rewards as the RL reward, storing it in the replay buffer. This combination of expert knowledge and predicted rewards enhances the performance of trading strategies. The proposed implementation, called Self-Rewarding Double DQN (SRDDQN), demonstrates that the self-rewarding mechanism improves learning and optimizes trading decisions. Experiments conducted on datasets including DJI, IXIC, and SP500 show that SRDDQN achieves a cumulative return of 1124.23% on the IXIC dataset, significantly outperforming the next best method, Fire (DQN-HER), which achieved 51.87%. SRDDQN also enhances the stability and efficiency of trading strategies, providing notable improvements over traditional RL methods. The integration of a self-rewarding mechanism within RL addresses a critical limitation in reward function design and offers a scalable, adaptable solution for complex, dynamic trading environments.
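The second phase described in the abstract (take the higher of the expert-labeled and predicted rewards, then store it in the replay buffer) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the three-action encoding, the return-based expert label, and the constant `toy_reward_model` standing in for the trained self-rewarding network are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Sequence, Tuple

# (state, action, reward, next_state, done); the layout is an assumption.
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def expert_return_reward(price_now: float, price_next: float, action: int) -> float:
    """Hypothetical return-based expert label.

    Assumed action encoding: 0 = short, 1 = hold, 2 = long.
    """
    position = action - 1  # maps the action to -1, 0, or +1
    return position * (price_next - price_now) / price_now


def select_reward(expert_reward: float, predicted_reward: float) -> float:
    """Selection rule from the abstract: keep the higher of the two rewards."""
    return max(expert_reward, predicted_reward)


def store_transition(
    buffer: Deque[Transition],
    state: Sequence[float],
    action: int,
    next_state: Sequence[float],
    done: bool,
    expert_reward: float,
    reward_model: Callable[[Sequence[float], int], float],
) -> float:
    """Query the reward model, pick the RL reward, and append to the buffer."""
    predicted = reward_model(state, action)
    reward = select_reward(expert_reward, predicted)
    buffer.append((state, action, reward, next_state, done))
    return reward


# Toy usage with made-up prices and a constant stand-in for the reward network.
def toy_reward_model(state: Sequence[float], action: int) -> float:
    return 0.01


buffer: Deque[Transition] = deque(maxlen=10_000)
r_exp = expert_return_reward(100.0, 102.0, action=2)  # long position, price rises
stored = store_transition(buffer, [100.0], 2, [102.0], False, r_exp, toy_reward_model)
```

In this toy case the expert label exceeds the prediction, so the expert label is what lands in the buffer; with a short position on the same price move the expert label would be negative and the predicted reward would be stored instead.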

Suggested Citation

  • Yuling Huang & Chujin Zhou & Lin Zhang & Xiaoping Lu, 2024. "A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization," Mathematics, MDPI, vol. 12(24), pages 1-25, December.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:24:p:4020-:d:1549683

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/24/4020/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/24/4020/
    Download Restriction: no


