Printed from https://ideas.repec.org/p/ehl/lserod/124144.html

To switch or not to switch? Balanced policy switching in offline reinforcement learning

Author

Listed:
  • Ma, Tao
  • Yang, Xuzhi
  • Szabo, Zoltan

Abstract

Reinforcement learning (RL), which seeks the optimal behaviour (also referred to as a policy) maximizing the collected long-term cumulative reward, is among the most influential approaches in machine learning, with a large number of successful applications. In several decision problems, however, one faces the possibility of policy switching (changing from the current policy to a new one), which incurs a non-negligible cost; examples include changing the currently applied educational technology, modernizing a computing cluster, and introducing a new webpage design. Moreover, the decision must rely on historical data alone, with no opportunity for further online interaction. Despite the importance of this offline learning scenario, to the best of our knowledge very little effort has been made to tackle the key problem of balancing the gain against the cost of switching in a flexible and principled way. Leveraging ideas from optimal transport, we initiate the systematic study of policy switching in offline RL. We establish fundamental properties and design a Net Actor-Critic algorithm for the proposed novel switching formulation. Numerical experiments demonstrate the efficiency of our approach on multiple Gymnasium benchmarks.

Suggested Citation

  • Ma, Tao & Yang, Xuzhi & Szabo, Zoltan, 2024. "To switch or not to switch? Balanced policy switching in offline reinforcement learning," LSE Research Online Documents on Economics 124144, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:124144

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/124144/
    File Function: Open access version.
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Annabi, Amira & Breton, Michèle & François, Pascal, 2012. "Resolution of financial distress under Chapter 11," Journal of Economic Dynamics and Control, Elsevier, vol. 36(12), pages 1867-1887.
    2. Lam, W., 2015. "Switching Costs in Two-sided Markets," LIDAM Discussion Papers CORE 2015024, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    3. Arturo Bris & Alan Schwartz & Ivo Welch, 2005. "Who Should Pay for Bankruptcy Costs?," The Journal of Legal Studies, University of Chicago Press, vol. 34(2), pages 295-341, June.
    4. Daníelsson, Jón & Macrae, Robert & Uthemann, Andreas, 2022. "Artificial intelligence and systemic risk," Journal of Banking & Finance, Elsevier, vol. 140(C).
    5. Zhang, Xi & Wang, Qin & Bi, Xiaowen & Li, Donghong & Liu, Dong & Yu, Yuanjin & Tse, Chi Kong, 2024. "Mitigating cascading failure in power grids with deep reinforcement learning-based remedial actions," Reliability Engineering and System Safety, Elsevier, vol. 250(C).
    6. Thamayanthi Chellathurai, 2017. "Probability Density Of Recovery Rate Given Default Of A Firm’S Debt And Its Constituent Tranches," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 20(04), pages 1-34, June.
    7. Stefano Colombo, 2018. "Behavior‐ and characteristic‐based price discrimination," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(2), pages 237-250, June.
    8. Adnan Jafar & Alessandra Kobayati & Michael A. Tsoukas & Ahmad Haidar, 2024. "Personalized insulin dosing using reinforcement learning for high-fat meals and aerobic exercises in type 1 diabetes: a proof-of-concept trial," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    9. Andrei Hagiu & Julian Wright, 2023. "Data‐enabled learning, network effects, and competitive advantage," RAND Journal of Economics, RAND Corporation, vol. 54(4), pages 638-667, December.
    10. Yang, Zhengzhi & Zheng, Lei & Perc, Matjaž & Li, Yumeng, 2024. "Interaction state Q-learning promotes cooperation in the spatial prisoner's dilemma game," Applied Mathematics and Computation, Elsevier, vol. 463(C).
    11. Rohan Pitchford & Mark L. J. Wright, 2012. "Holdouts in Sovereign Debt Restructuring: A Theory of Negotiation in a Weak Contractual Environment," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 79(2), pages 812-837.
    12. Zhang, Yihao & Chai, Zhaojie & Lykotrafitis, George, 2021. "Deep reinforcement learning with a particle dynamics environment applied to emergency evacuation of a room with obstacles," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 571(C).
    13. Ruqu Wang & Quan Wen, 1998. "Strategic Invasion in Markets with Switching Costs," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 7(4), pages 521-549, December.
    14. Keller, Alexander & Dahm, Ken, 2019. "Integral equations and machine learning," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 161(C), pages 2-12.
    15. Nogata, Daisuke, 2022. "Determinants of household switching between natural gas suppliers: Evidence from Japan," Utilities Policy, Elsevier, vol. 76(C).
    16. Canhoto, Ana Isabel & Clear, Fintan, 2020. "Artificial intelligence and machine learning as business tools: A framework for diagnosing value destruction potential," Business Horizons, Elsevier, vol. 63(2), pages 183-193.
    17. Zhaobin Mo & Xuan Di & Rongye Shi, 2023. "Robust Data Sampling in Machine Learning: A Game-Theoretic Framework for Training and Validation Data Selection," Games, MDPI, vol. 14(1), pages 1-13, January.
    18. Bouckaert, J.M.C. & Degryse, H.A., 2002. "Softening Competition by Enhancing entry : An Example from the Banking Industry," Other publications TiSEM 1cf58bbb-25a9-4e6e-a11f-8, Tilburg University, School of Economics and Management.
    19. Yang, Kaiyuan & Huang, Houjing & Vandans, Olafs & Murali, Adithya & Tian, Fujia & Yap, Roland H.C. & Dai, Liang, 2023. "Applying deep reinforcement learning to the HP model for protein structure prediction," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    20. Gravelle, Hugh & Masiero, Giuliano, 2000. "Quality incentives in a regulated market with imperfect information and switching costs: capitation in general practice," Journal of Health Economics, Elsevier, vol. 19(6), pages 1067-1088, November.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
