IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/124144.html

To switch or not to switch? Balanced policy switching in offline reinforcement learning

Author

Listed:
  • Ma, Tao
  • Yang, Xuzhi
  • Szabo, Zoltan

Abstract

Reinforcement learning (RL) -- finding the optimal behaviour (also referred to as a policy) that maximizes the long-term cumulative reward -- is among the most influential approaches in machine learning, with a large number of successful applications. In several decision problems, however, one faces the possibility of policy switching -- changing from the current policy to a new one -- which incurs a non-negligible cost; examples include changing the educational technology currently in use, modernizing a computing cluster, and introducing a new webpage design. Moreover, the decision must be made from historical data alone, with no possibility of further online interaction. Despite the evident importance of this offline learning scenario, to the best of our knowledge very little effort has been made to tackle the key problem of balancing the gain against the cost of switching in a flexible and principled way. Leveraging ideas from optimal transport, we initiate the systematic study of policy switching in offline RL. We establish fundamental properties of the proposed switching formulation and design a Net Actor-Critic algorithm for it. Numerical experiments demonstrate the efficiency of our approach on multiple Gymnasium benchmarks.

Suggested Citation

  • Ma, Tao & Yang, Xuzhi & Szabo, Zoltan, 2024. "To switch or not to switch? Balanced policy switching in offline reinforcement learning," LSE Research Online Documents on Economics 124144, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:124144

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/124144/
    File Function: Open access version.
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Annabi, Amira & Breton, Michèle & François, Pascal, 2012. "Resolution of financial distress under Chapter 11," Journal of Economic Dynamics and Control, Elsevier, vol. 36(12), pages 1867-1887.
    2. Lam, W., 2015. "Switching Costs in Two-sided Markets," LIDAM Discussion Papers CORE 2015024, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    3. Yuchen Zhang & Wei Yang, 2022. "Breakthrough invention and problem complexity: Evidence from a quasi‐experiment," Strategic Management Journal, Wiley Blackwell, vol. 43(12), pages 2510-2544, December.
    4. Arturo Bris & Alan Schwartz & Ivo Welch, 2005. "Who Should Pay for Bankruptcy Costs?," The Journal of Legal Studies, University of Chicago Press, vol. 34(2), pages 295-341, June.
    5. Daníelsson, Jón & Macrae, Robert & Uthemann, Andreas, 2022. "Artificial intelligence and systemic risk," Journal of Banking & Finance, Elsevier, vol. 140(C).
    6. Stole, Lars A., 2007. "Price Discrimination and Competition," Handbook of Industrial Organization, in: Mark Armstrong & Robert Porter (ed.), Handbook of Industrial Organization, edition 1, volume 3, chapter 34, pages 2221-2299, Elsevier.
    7. Thamayanthi Chellathurai, 2017. "Probability Density Of Recovery Rate Given Default Of A Firm’s Debt And Its Constituent Tranches," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 20(04), pages 1-34, June.
    8. Asplund, Marcus & Eriksson, Rickard & Strand, Niklas, 2001. "Price Discrimination in Oligopoly: Evidence from Swedish Newspapers," SSE/EFI Working Paper Series in Economics and Finance 468, Stockholm School of Economics, revised 01 Jan 2007.
    9. Omar Al-Ani & Sanjoy Das, 2022. "Reinforcement Learning: Theory and Applications in HEMS," Energies, MDPI, vol. 15(17), pages 1-37, September.
    10. Ostheimer, Julia & Chowdhury, Soumitra & Iqbal, Sarfraz, 2021. "An alliance of humans and machines for machine learning: Hybrid intelligent systems and their design principles," Technology in Society, Elsevier, vol. 66(C).
    11. Boute, Robert N. & Gijsbrechts, Joren & van Jaarsveld, Willem & Vanvuchelen, Nathalie, 2022. "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, Elsevier, vol. 298(2), pages 401-412.
    12. Stefano Colombo, 2018. "Behavior‐ and characteristic‐based price discrimination," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(2), pages 237-250, June.
    13. Zhou, Yuhao & Wang, Yanwei, 2022. "An integrated framework based on deep learning algorithm for optimizing thermochemical production in heavy oil reservoirs," Energy, Elsevier, vol. 253(C).
    14. Mandal, Ankit & Tiwari, Yash & Panigrahi, Prasanta K. & Pal, Mayukha, 2022. "Physics aware analytics for accurate state prediction of dynamical systems," Chaos, Solitons & Fractals, Elsevier, vol. 164(C).
    15. Wing Man Wynne Lam, 2017. "Switching Costs in Two-Sided Markets," Journal of Industrial Economics, Wiley Blackwell, vol. 65(1), pages 136-182, March.
    16. Gans, Joshua S., 2000. "Network competition and consumer churn," Information Economics and Policy, Elsevier, vol. 12(2), pages 97-109, June.
    17. Bossert, Leonie & Hagendorff, Thilo, 2021. "Animals and AI. The role of animals in AI research and application – An overview and ethical evaluation," Technology in Society, Elsevier, vol. 67(C).
    18. Yang, Zhengzhi & Zheng, Lei & Perc, Matjaž & Li, Yumeng, 2024. "Interaction state Q-learning promotes cooperation in the spatial prisoner's dilemma game," Applied Mathematics and Computation, Elsevier, vol. 463(C).
    19. Rohan Pitchford & Mark L. J. Wright, 2012. "Holdouts in Sovereign Debt Restructuring: A Theory of Negotiation in a Weak Contractual Environment," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 79(2), pages 812-837.
    20. Zhang, Yihao & Chai, Zhaojie & Lykotrafitis, George, 2021. "Deep reinforcement learning with a particle dynamics environment applied to emergency evacuation of a room with obstacles," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 571(C).

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
