Reward-predictive representations generalize across tasks in reinforcement learning

Author

Listed:

Lucas Lehnert
Michael L Littman
Michael J Frank

Abstract

In computer science, reinforcement learning is a powerful framework with which artificial agents can learn to maximize their performance for any given Markov decision process (MDP). Over the last decade, advances in this framework, combined with deep neural networks, have produced agents that outperform humans in many difficult task settings. However, such frameworks perform far less favorably when evaluated on their ability to generalize or transfer representations across tasks. Existing algorithms that facilitate transfer are typically limited to cases in which the transition function or the optimal policy is portable to new contexts, but achieving the “deep transfer” characteristic of human behavior has been elusive. Such transfer typically requires discovering abstractions that permit analogical reuse of previously learned representations in superficially distinct tasks. Here, we demonstrate that abstractions that minimize error in predictions of reward outcomes generalize across tasks with different transition and reward functions. Such reward-predictive representations compress the state space of a task into a lower-dimensional representation by combining states that are equivalent in terms of both the transition and reward functions. Because only state equivalences are considered, the resulting state representation is not tied to the transition and reward functions themselves and thus generalizes across tasks with different reward and transition functions. These results contrast with those obtained using abstractions that myopically maximize reward in any given MDP, and they motivate further experiments in humans and animals to investigate whether neural and cognitive systems involved in state representation perform abstractions that facilitate such equivalence relations.

Author summary: Humans are capable of transferring abstract knowledge from one task to another. For example, in a left-hand-drive car, the driver uses the right arm to operate the shifter. A driver who learned to drive such a car can adapt to operating a right-hand-drive car and shift with the other arm instead of re-learning how to drive. Although the two tasks require different coordination of motor skills, they are the same in an abstract sense: in both, a car is operated and there is the same progression from 1st to 2nd gear and so on. We study distinct algorithms by which a reinforcement learning agent can discover state representations that encode knowledge about a particular task, and we evaluate how well these representations generalize. Through a sequence of simulation results, we show that state abstractions that minimize errors in predictions about future reward outcomes generalize across tasks, even tasks that superficially differ in both their goals (rewards) and their transitions from one state to the next. This work motivates biological studies to determine whether distinct circuits are adapted to maximize reward versus to discover useful state representations.
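The abstract's central idea, grouping states that are equivalent in terms of both the transition and reward functions, can be illustrated with a small sketch. The code below assumes a tabular MDP with deterministic transitions and computes the coarsest such grouping by iterative partition refinement (a bisimulation-style model reduction). It is not the learning procedure used in the paper; the toy grid, both tasks, and the helper names `reward_predictive_partition` and `blocks` are hypothetical. Task B relabels what the actions do and moves the reward, yet the same grouping of states results.

```python
# Minimal sketch: coarsest grouping of states that agree on one-step rewards
# and, for every action, on which group they transition into. Assumes a small
# tabular MDP with deterministic transitions. All names here are hypothetical.


def reward_predictive_partition(states, actions, step, reward):
    """Return {state: block_id}; step(s, a) -> next state, reward(s, a) -> float."""
    labels = {s: 0 for s in states}  # start with every state in one block
    while True:
        # Signature: current block, reward per action, next block per action.
        signatures = {
            s: (
                labels[s],
                tuple(reward(s, a) for a in actions),
                tuple(labels[step(s, a)] for a in actions),
            )
            for s in states
        }
        ids = {}
        new_labels = {s: ids.setdefault(signatures[s], len(ids)) for s in states}
        # Refinement only ever splits blocks, so an unchanged block count
        # means the partition is stable.
        if len(ids) == len(set(labels.values())):
            return new_labels
        labels = new_labels


def blocks(labels):
    """Convert {state: block_id} into a set of frozensets for comparison."""
    groups = {}
    for s, b in labels.items():
        groups.setdefault(b, set()).add(s)
    return {frozenset(g) for g in groups.values()}


# Toy 2x3 grid, state = (row, col). The row never affects dynamics or reward,
# so the two cells of each column should collapse into one abstract state.
STATES = [(r, c) for r in range(2) for c in range(3)]
ACTIONS = ["left", "right"]


def clamp(c):
    return max(0, min(2, c))


# Task A: "right" moves right; reward 1 for entering column 2.
def step_a(s, a):
    r, c = s
    return (r, clamp(c + (1 if a == "right" else -1)))


def reward_a(s, a):
    return 1.0 if step_a(s, a)[1] == 2 else 0.0


# Task B: action effects swapped; reward 5 for entering column 0.
def step_b(s, a):
    r, c = s
    return (r, clamp(c + (1 if a == "left" else -1)))


def reward_b(s, a):
    return 5.0 if step_b(s, a)[1] == 0 else 0.0


part_a = blocks(reward_predictive_partition(STATES, ACTIONS, step_a, reward_a))
part_b = blocks(reward_predictive_partition(STATES, ACTIONS, step_b, reward_b))
print(part_a == part_b)  # True
print(len(part_a))       # 3
```

In this toy case the abstract model differs between the two tasks (different rewards, different action effects), but the equivalence relation over states is identical, which mirrors the abstract's point that a representation built only from state equivalences is not tied to one task's transition and reward functions.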

Suggested Citation

Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.

Handle: RePEc:plo:pcbi00:1008317
DOI: 10.1371/journal.pcbi.1008317

References listed on IDEAS

Nicholas T Franklin & Michael J Frank, 2018. "Compositional clustering in task structure learning," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-25, April.
I. Momennejad & E. M. Russek & J. H. Cheong & M. M. Botvinick & N. D. Daw & S. J. Gershman, 2017. "The successor representation in human reinforcement learning," Nature Human Behaviour, Nature, vol. 1(9), pages 680-692, September.
Yee Whye Teh & Michael I. Jordan & Matthew J. Beal & David M. Blei, 2006. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1566-1581, December.
Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
Nicky J. Welton & Howard H. Z. Thom, 2015. "Value of Information," Medical Decision Making, vol. 35(5), pages 564-566, July.
Evan M Russek & Ida Momennejad & Matthew M Botvinick & Samuel J Gershman & Nathaniel D Daw, 2017. "Predictive representations can link model-based reinforcement learning to model-free mechanisms," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-35, September.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Amirhosein Mosavi & Yaser Faghan & Pedram Ghamisi & Puhong Duan & Sina Faizollahzadeh Ardabili & Ely Salwana & Shahab S. Band, 2020. "Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics," Mathematics, MDPI, vol. 8(10), pages 1-42, September.
Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
Liu, Hui & Yu, Chengqing & Wu, Haiping & Duan, Zhu & Yan, Guangxi, 2020. "A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting," Energy, Elsevier, vol. 202(C).
Ruohan Zhang & Shun Zhang & Matthew H Tong & Yuchen Cui & Constantin A Rothkopf & Dana H Ballard & Mary M Hayhoe, 2018. "Modeling sensory-motor decisions in natural behavior," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-22, October.
Momchil S Tomov & Samyukta Yagati & Agni Kumar & Wanqian Yang & Samuel J Gershman, 2020. "Discovery of hierarchical representations for efficient planning," PLOS Computational Biology, Public Library of Science, vol. 16(4), pages 1-42, April.
Lee, Alice J. & Ames, Daniel R., 2017. "“I can’t pay more” versus “It’s not worth more”: Divergent effects of constraint and disparagement rationales in negotiations," Organizational Behavior and Human Decision Processes, Elsevier, vol. 141(C), pages 16-28.
Hussain, Hadia & Murtaza, Murtaza & Ajmal, Areeb & Ahmed, Afreen & Khan, Muhammad Ovais Khalid, 2020. "A study on the effects of social media advertisement on consumer’s attitude and customer response," MPRA Paper 104675, University Library of Munich, Germany.
A. G. Fatullayev & Nizami A. Gasilov & Şahin Emrah Amrahov, 2019. "Numerical solution of linear inhomogeneous fuzzy delay differential equations," Fuzzy Optimization and Decision Making, Springer, vol. 18(3), pages 315-326, September.
Arun Advani & William Elming & Jonathan Shaw, 2023. "The Dynamic Effects of Tax Audits," The Review of Economics and Statistics, MIT Press, vol. 105(3), pages 545-561, May.
- Arun Advani & William Elming & Jonathan Shaw, 2017. "The dynamic effects of tax audits," IFS Working Papers W17/24, Institute for Fiscal Studies.
- Advani, Arun & Elming, William & Shaw, Jonathan, 2019. "The Dynamic Effects of Tax Audits," CAGE Online Working Paper Series 414, Competitive Advantage in the Global Economy (CAGE).
- Advani, Arun & Elming, William & Shaw, Jonathan, 2019. "The Dynamic Effects of Tax Audits," The Warwick Economics Research Paper Series (TWERPS) 1198, University of Warwick, Department of Economics.
Aghion, Philippe & Akcigit, Ufuk & Lequien, Matthieu & Stantcheva, Stefanie, 2017. "Tax simplicity and heterogeneous learning," LSE Research Online Documents on Economics 86613, London School of Economics and Political Science, LSE Library.
- P. Aghion & U. Akcigit & M. Lequien & S. Stantcheva, 2018. "Tax Simplicity and Heterogeneous Learning," Working papers 665, Banque de France.
- Stantcheva, Stefanie & Aghion, Philippe & Lequien, Matthieu & Akcigit, Ufuk, 2017. "Tax Simplicity and Heterogeneous Learning," CEPR Discussion Papers 12471, C.E.P.R. Discussion Papers.
- Philippe Aghion & Ufuk Akcigit & Matthieu Lequien & Stefanie Stantcheva, 2017. "Tax simplicity and heterogeneous learning," CEP Discussion Papers dp1516, Centre for Economic Performance, LSE.
Tulika Saha & Sriparna Saha & Pushpak Bhattacharyya, 2020. "Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-28, July.
Marie Bjørneby & Annette Alstadsæter & Kjetil Telle, 2018. "Collusive tax evasion by employers and employees. Evidence from a randomized field experiment in Norway," Discussion Papers 891, Statistics Norway, Research Department.
- Marie Bjørneby & Annette Alstadsæter & Kjetil Telle, 2018. "Collusive Tax Evasion by Employers and Employees: Evidence from a Randomized Field Experiment in Norway," CESifo Working Paper Series 7381, CESifo.
Chuangen Gao & Shuyang Gu & Jiguo Yu & Hai Du & Weili Wu, 2022. "Adaptive seeding for profit maximization in social networks," Journal of Global Optimization, Springer, vol. 82(2), pages 413-432, February.
Koessler, Frederic & Laclau, Marie & Renault, Jérôme & Tomala, Tristan, 2022. "Long information design," Theoretical Economics, Econometric Society, vol. 17(2), May.
- Frédéric Koessler & Marie Laclau & Jérôme Renault & Tristan Tomala, 2021. "Long Information Design," PSE Working Papers halshs-02400053, HAL.
- Frédéric Koessler & Marie Laclau & Jerôme Renault & Tristan Tomala, 2022. "Long information design," PSE-Ecole d'économie de Paris (Postprint) hal-03700394, HAL.
- Koessler, Frédéric & Laclau, Marie & Renault, Jérôme & Tomala, Tristan, 2022. "Long information design," TSE Working Papers 22-1341, Toulouse School of Economics (TSE).
- Marie Laclau & Frédéric Koessler & Jérôme Renault & Tristan Tomala, 2022. "Long Information Design," Post-Print halshs-03342880, HAL.
- Marie Laclau & Frédéric Koessler & Jérôme Renault & Tristan Tomala, 2022. "Long Information Design," PSE-Ecole d'économie de Paris (Postprint) halshs-03342880, HAL.
- Frédéric Koessler & Marie Laclau & Jerôme Renault & Tristan Tomala, 2022. "Long information design," Post-Print hal-03700394, HAL.
- Frédéric Koessler & Marie Laclau & Jérôme Renault & Tristan Tomala, 2021. "Long Information Design," Working Papers halshs-02400053, HAL.
- Frédéric Koessler & Marie Laclau & Jérôme Renault & Tristan Tomala, 2022. "Long Information Design," Post-Print halshs-02400053, HAL.
- Frédéric Koessler & Marie Laclau & Jérôme Renault & Tristan Tomala, 2022. "Long Information Design," PSE-Ecole d'économie de Paris (Postprint) halshs-02400053, HAL.
Michelle Dietzen & Haoran Zhai & Olivia Lucas & Oriol Pich & Christopher Barrington & Wei-Ting Lu & Sophia Ward & Yanping Guo & Robert E. Hynds & Simone Zaccaria & Charles Swanton & Nicholas McGranaha, 2024. "Replication timing alterations are associated with mutation acquisition during breast and lung cancer evolution," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
Annette Alstadsæter & Wojciech Kopczuk & Kjetil Telle, 2019. "Social networks and tax avoidance: evidence from a well-defined Norwegian tax shelter," International Tax and Public Finance, Springer;International Institute of Public Finance, vol. 26(6), pages 1291-1328, December.
- Annette Alstadsæter & Wojciech Kopczuk & Kjetil Telle, 2018. "Social Networks and Tax Avoidance: Evidence from a Well-Defined Norwegian Tax Shelter," NBER Working Papers 25191, National Bureau of Economic Research, Inc.
- Annette Alstadsæter & Wojciech Kopczuk & Kjetil Telle, 2018. "Social networks and tax avoidance. Evidence from a well-defined Norwegian tax shelter," Discussion Papers 886, Statistics Norway, Research Department.
- Kopczuk, Wojciech & Alstadsæter, Annette & Telle, Kjetil, 2018. "Social networks and tax avoidance: Evidence from a well-defined Norwegian tax shelter," CEPR Discussion Papers 13251, C.E.P.R. Discussion Papers.
Mahmoud Mahfouz & Angelos Filos & Cyrine Chtourou & Joshua Lockhart & Samuel Assefa & Manuela Veloso & Danilo Mandic & Tucker Balch, 2019. "On the Importance of Opponent Modeling in Auction Markets," Papers 1911.12816, arXiv.org.
Lixiang Zhang & Yan Yan & Yaoguang Hu, 2024. "Deep reinforcement learning for dynamic scheduling of energy-efficient automated guided vehicles," Journal of Intelligent Manufacturing, Springer, vol. 35(8), pages 3875-3888, December.
Sebastian Kaumanns, 2019. "“Some fuzzy math”: relational information on debt value adjustments by managers and the financial press," Business Research, Springer;German Academic Association for Business Research, vol. 12(2), pages 755-794, December.
Samuel J Gershman, 2015. "A Unifying Probabilistic View of Associative Learning," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-20, November.

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1008317. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by reporting the missing link.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as some citations may be waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol. General contact details of the provider: https://journals.plos.org/ploscompbiol/

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.
