Printed from https://ideas.repec.org/a/plo/pcbi00/1005810.html

Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI

Author

Listed:
  • Jaron T Colas
  • Wolfgang M Pauli
  • Tobias Larsen
  • J Michael Tyszka
  • John P O’Doherty

Abstract

Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.

Author summary

An accumulating body of evidence suggests that signals of a reward-prediction error encoded by dopaminergic neurons in the midbrain comprise a fundamental mechanism underpinning reward learning, including learning of instrumental actions. Nevertheless, a major open question concerns the specific computational details of the “reinforcement-learning” algorithms through which these prediction-error signals are generated. Here, we designed a novel task specifically to address this issue. A fundamental distinction is drawn between predictions based on the values of states and predictions based on the values of actions. We found evidence in the human brain that different prediction-error signals involved in learning about the values of either states or actions are represented in the substantia nigra and the striatum. These findings are consistent with an “actor/critic” (i.e., state-value-learning) architecture updating in parallel with a more direct action-value-learning system, providing important constraints on the actual form of the reinforcement-learning computations that are implemented in the mesostriatal dopamine system in humans.

Suggested Citation

  • Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
  • Handle: RePEc:plo:pcbi00:1005810
    DOI: 10.1371/journal.pcbi.1005810

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005810
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005810&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1005810?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access through your library subscription


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Julie J Lee & Mehdi Keramati, 2017. "Flexibility to contingency changes distinguishes habitual and goal-directed strategies in humans," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-15, September.
    2. Evan M Russek & Ida Momennejad & Matthew M Botvinick & Samuel J Gershman & Nathaniel D Daw, 2017. "Predictive representations can link model-based reinforcement learning to model-free mechanisms," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-35, September.
    3. Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.
    4. Roger Shepard, 1974. "Representation of structure in similarity data: Problems and prospects," Psychometrika, Springer;The Psychometric Society, vol. 39(4), pages 373-421, December.
    5. Lee Cooper & Masao Nakanishi, 1983. "Two logit models for external analysis of preferences," Psychometrika, Springer;The Psychometric Society, vol. 48(4), pages 607-620, December.
    6. Despoina Alempaki & Emina Canic & Timothy L. Mullett & William J. Skylark & Chris Starmer & Neil Stewart & Fabio Tufano, 2019. "Reexamining How Utility and Weighting Functions Get Their Shapes: A Quasi-Adversarial Collaboration Providing a New Interpretation," Management Science, INFORMS, vol. 65(10), pages 4841-4862, October.
    7. Laurens Winkelmeier & Carla Filosa & Renée Hartig & Max Scheller & Markus Sack & Jonathan R. Reinwald & Robert Becker & David Wolf & Martin Fungisai Gerchen & Alexander Sartorius & Andreas Meyer-Linde, 2022. "Striatal hub of dynamic and stabilized prediction coding in forebrain networks for olfactory reinforcement learning," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    8. Zhewei Zhang & Yuji K. Takahashi & Marlian Montesinos-Cartegena & Thorsten Kahnt & Angela J. Langdon & Geoffrey Schoenbaum, 2024. "Expectancy-related changes in firing of dopamine neurons depend on hippocampus," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    9. Amir Dezfouli & Bernard W Balleine, 2019. "Learning the structure of the world: The adaptive nature of state-space and action representations in multi-stage decision-making," PLOS Computational Biology, Public Library of Science, vol. 15(9), pages 1-22, September.
    10. Joseph Zinnes & Richard Griggs, 1974. "Probabilistic, multidimensional unfolding analysis," Psychometrika, Springer;The Psychometric Society, vol. 39(3), pages 327-350, September.
    11. Roger Shepard, 1961. "Application of a trace model to the retention of information in a recognition task," Psychometrika, Springer;The Psychometric Society, vol. 26(2), pages 185-203, June.
    12. Roger Shepard, 1962. "The analysis of proximities: Multidimensional scaling with an unknown distance function. II," Psychometrika, Springer;The Psychometric Society, vol. 27(3), pages 219-246, September.
    13. Etienne Vachon-Presseau & Sara E Berger & Taha B Abdullah & James W Griffith & Thomas J Schnitzer & A Vania Apkarian, 2019. "Identification of traits and functional connectivity-based neurotraits of chronic pain," PLOS Biology, Public Library of Science, vol. 17(8), pages 1-24, August.
    14. Liu, Hui & Yu, Chengqing & Wu, Haiping & Duan, Zhu & Yan, Guangxi, 2020. "A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting," Energy, Elsevier, vol. 202(C).
    15. Johannes Algermissen & Jennifer C. Swart & René Scheeringa & Roshan Cools & Hanneke E. M. den Ouden, 2024. "Prefrontal signals precede striatal signals for biased credit assignment in motivational learning biases," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    16. Vincent Moens & Alexandre Zénon, 2019. "Learning and forgetting using reinforced Bayesian change detection," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-41, April.
    17. Bruno Miranda & W M Nishantha Malalasekera & Timothy E Behrens & Peter Dayan & Steven W Kennerley, 2020. "Combined model-free and model-sensitive reinforcement learning in non-human primates," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-25, June.
    18. Oliver Contier & Chris I. Baker & Martin N. Hebart, 2024. "Distributed representations of behaviour-derived object dimensions in the human visual system," Nature Human Behaviour, Nature, vol. 8(11), pages 2179-2193, November.
    19. Kathleen Wiencke & Annette Horstmann & David Mathar & Arno Villringer & Jane Neumann, 2020. "Dopamine release, diffusion and uptake: A computational model for synaptic and volume transmission," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-26, November.
    20. Ruohan Zhang & Shun Zhang & Matthew H Tong & Yuchen Cui & Constantin A Rothkopf & Dana H Ballard & Mary M Hayhoe, 2018. "Modeling sensory-motor decisions in natural behavior," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-22, October.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1005810. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.