To switch or not to switch? Balanced policy switching in offline reinforcement learning

Author

Listed:

Ma, Tao
Yang, Xuzhi
Szabo, Zoltan

Abstract

Reinforcement learning (RL) -- finding the optimal behaviour (also referred to as a policy) that maximizes the long-term cumulative reward -- is among the most influential approaches in machine learning, with a large number of successful applications. In several decision problems, however, one faces the possibility of policy switching -- changing from the current policy to a new one -- which incurs a non-negligible cost (examples include replacing the educational technology currently in use, modernizing a computing cluster, and introducing a new webpage design), and the decision must be made from historical data alone, with no possibility of further online interaction. Despite the importance of this offline learning scenario, to the best of our knowledge very little effort has been made to tackle the key problem of balancing the gain against the cost of switching in a flexible and principled way. Leveraging ideas from optimal transport, we initiate the systematic study of policy switching in offline RL. We establish fundamental properties of the proposed novel switching formulation and design a Net Actor-Critic algorithm for it. Numerical experiments demonstrate the efficiency of our approach on multiple Gymnasium benchmarks.

Suggested Citation

Ma, Tao & Yang, Xuzhi & Szabo, Zoltan, 2024. "To switch or not to switch? Balanced policy switching in offline reinforcement learning," LSE Research Online Documents on Economics 124144, London School of Economics and Political Science, LSE Library.

Handle: RePEc:ehl:lserod:124144

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Xuemeng Zhao & Weilun Huang, 2024. "Global Geopolitical Changes and New/Renewable Energy Game," Energies, MDPI, vol. 17(16), pages 1-27, August.
Annabi, Amira & Breton, Michèle & François, Pascal, 2012. "Resolution of financial distress under Chapter 11," Journal of Economic Dynamics and Control, Elsevier, vol. 36(12), pages 1867-1887.
Lam, W., 2015. "Switching Costs in Two-sided Markets," LIDAM Discussion Papers CORE 2015024, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Yuchen Zhang & Wei Yang, 2022. "Breakthrough invention and problem complexity: Evidence from a quasi‐experiment," Strategic Management Journal, Wiley Blackwell, vol. 43(12), pages 2510-2544, December.
Arturo Bris & Alan Schwartz & Ivo Welch, 2005. "Who Should Pay for Bankruptcy Costs?," The Journal of Legal Studies, University of Chicago Press, vol. 34(2), pages 295-341, June.
- Ivo Welch & Arturo Bris & Alan Schwartz, 2003. "Who Should Pay for Bankruptcy Costs?," Yale School of Management Working Papers ysm365, Yale School of Management, revised 01 Sep 2004.
Daníelsson, Jón & Macrae, Robert & Uthemann, Andreas, 2022. "Artificial intelligence and systemic risk," Journal of Banking & Finance, Elsevier, vol. 140(C).
- Danielsson, Jon & Macrae, Robert & Uthemann, Andreas, 2022. "Artificial intelligence and systemic risk," LSE Research Online Documents on Economics 111601, London School of Economics and Political Science, LSE Library.
Stole, Lars A., 2007. "Price Discrimination and Competition," Handbook of Industrial Organization, in: Mark Armstrong & Robert Porter (ed.), Handbook of Industrial Organization, edition 1, volume 3, chapter 34, pages 2221-2299, Elsevier.
Zhang, Xi & Wang, Qin & Bi, Xiaowen & Li, Donghong & Liu, Dong & Yu, Yuanjin & Tse, Chi Kong, 2024. "Mitigating cascading failure in power grids with deep reinforcement learning-based remedial actions," Reliability Engineering and System Safety, Elsevier, vol. 250(C).
Thamayanthi Chellathurai, 2017. "Probability Density Of Recovery Rate Given Default Of A Firm’s Debt And Its Constituent Tranches," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 20(04), pages 1-34, June.
Asplund, Marcus & Eriksson, Rickard & Strand, Niklas, 2001. "Price Discrimination in Oligopoly: Evidence from Swedish Newspapers," SSE/EFI Working Paper Series in Economics and Finance 468, Stockholm School of Economics, revised 01 Jan 2007.
- Asplund, Björn Marcus & Eriksson, Rickard & Strand, Niklas, 2002. "Price Discrimination in Oligopoly: Evidence from Swedish Newspapers," CEPR Discussion Papers 3269, C.E.P.R. Discussion Papers.
Omar Al-Ani & Sanjoy Das, 2022. "Reinforcement Learning: Theory and Applications in HEMS," Energies, MDPI, vol. 15(17), pages 1-37, September.
Ostheimer, Julia & Chowdhury, Soumitra & Iqbal, Sarfraz, 2021. "An alliance of humans and machines for machine learning: Hybrid intelligent systems and their design principles," Technology in Society, Elsevier, vol. 66(C).
Boute, Robert N. & Gijsbrechts, Joren & van Jaarsveld, Willem & Vanvuchelen, Nathalie, 2022. "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, Elsevier, vol. 298(2), pages 401-412.
Rui Wang & Ming Lyu & Jie Zhang, 2025. "A Multi-Robot Collaborative Exploration Method Based on Deep Reinforcement Learning and Knowledge Distillation," Mathematics, MDPI, vol. 13(1), pages 1-17, January.
Stefano Colombo, 2018. "Behavior‐ and characteristic‐based price discrimination," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(2), pages 237-250, June.
Zhou, Yuhao & Wang, Yanwei, 2022. "An integrated framework based on deep learning algorithm for optimizing thermochemical production in heavy oil reservoirs," Energy, Elsevier, vol. 253(C).
Mandal, Ankit & Tiwari, Yash & Panigrahi, Prasanta K. & Pal, Mayukha, 2022. "Physics aware analytics for accurate state prediction of dynamical systems," Chaos, Solitons & Fractals, Elsevier, vol. 164(C).
Adnan Jafar & Alessandra Kobayati & Michael A. Tsoukas & Ahmad Haidar, 2024. "Personalized insulin dosing using reinforcement learning for high-fat meals and aerobic exercises in type 1 diabetes: a proof-of-concept trial," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
Wing Man Wynne Lam, 2017. "Switching Costs in Two-Sided Markets," Journal of Industrial Economics, Wiley Blackwell, vol. 65(1), pages 136-182, March.
Gans, Joshua S., 2000. "Network competition and consumer churn," Information Economics and Policy, Elsevier, vol. 12(2), pages 97-109, June.

More about this item

JEL classification:

C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2024-08-26 (Big Data)
NEP-CMP-2024-08-26 (Computational Economics)
