Printed from https://ideas.repec.org/a/plo/pcbi00/1005810.html

Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI

Author

Listed:
  • Jaron T Colas
  • Wolfgang M Pauli
  • Tobias Larsen
  • J Michael Tyszka
  • John P O’Doherty

Abstract

Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations, we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.

Author summary

An accumulating body of evidence suggests that signals of a reward-prediction error encoded by dopaminergic neurons in the midbrain comprise a fundamental mechanism underpinning reward learning, including learning of instrumental actions. Nevertheless, a major open question concerns the specific computational details of the “reinforcement-learning” algorithms through which these prediction-error signals are generated. Here, we designed a novel task specifically to address this issue. A fundamental distinction is drawn between predictions based on the values of states and predictions based on the values of actions. We found evidence in the human brain that different prediction-error signals involved in learning about the values of either states or actions are represented in the substantia nigra and the striatum. These findings are consistent with an “actor/critic” (i.e., state-value-learning) architecture updating in parallel with a more direct action-value-learning system, providing important constraints on the actual form of the reinforcement-learning computations that are implemented in the mesostriatal dopamine system in humans.
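The two error signals contrasted in the abstract can be illustrated as tabular temporal-difference updates. This is a minimal sketch for orientation only, not the authors' experimental model: the function names, the discount factor `gamma`, and the learning rate `alpha` are illustrative assumptions.

```python
import numpy as np

def svpe(V, s, s_next, r, gamma=0.9):
    """State-value-prediction error (critic signal in an actor/critic
    architecture): depends only on state values, not on the action taken."""
    return r + gamma * V[s_next] - V[s]

def avpe(Q, s, a, s_next, r, gamma=0.9):
    """Action-value-prediction error (as in Q-learning): tied to the value
    of the specific action chosen in the current state."""
    return r + gamma * np.max(Q[s_next]) - Q[s, a]

# Toy setting: 2 states, 2 actions, all values initialized to zero.
V = np.zeros(2)          # state values (critic)
Q = np.zeros((2, 2))     # action values
alpha = 0.1              # illustrative learning rate

# One observed transition: state 0, action 1, reward 1.0, next state 1.
delta_v = svpe(V, 0, 1, 1.0)       # = 1.0 (state-value error)
delta_q = avpe(Q, 0, 1, 1, 1.0)    # = 1.0 (action-value error)

# The two systems update in parallel, as the abstract proposes:
V[0] += alpha * delta_v       # critic update (action-independent)
Q[0, 1] += alpha * delta_q    # direct action-value update
```

Early in learning the two errors coincide (both equal the reward here), but they diverge once state values and action values differ — the dissociation the task paradigm was designed to exploit.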

Suggested Citation

  • Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
  • Handle: RePEc:plo:pcbi00:1005810
    DOI: 10.1371/journal.pcbi.1005810

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005810
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005810&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1005810?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Julie J Lee & Mehdi Keramati, 2017. "Flexibility to contingency changes distinguishes habitual and goal-directed strategies in humans," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-15, September.
    2. Evan M Russek & Ida Momennejad & Matthew M Botvinick & Samuel J Gershman & Nathaniel D Daw, 2017. "Predictive representations can link model-based reinforcement learning to model-free mechanisms," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-35, September.
    3. Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.
    4. Roger Shepard, 1974. "Representation of structure in similarity data: Problems and prospects," Psychometrika, Springer;The Psychometric Society, vol. 39(4), pages 373-421, December.
    5. Wouter Kool & Fiery A Cushman & Samuel J Gershman, 2016. "When Does Model-Based Control Pay Off?," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-34, August.
    6. Johann Lussange & Stefano Vrizzi & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2023. "Stock Price Formation: Precepts from a Multi-Agent Reinforcement Learning Model," Computational Economics, Springer;Society for Computational Economics, vol. 61(4), pages 1523-1544, April.
    7. Michael Brusco & Stephanie Stahl, 2001. "Compact integer-programming models for extracting subsets of stimuli from confusion matrices," Psychometrika, Springer;The Psychometric Society, vol. 66(3), pages 405-419, September.
    8. Johann Lussange & Ivan Lazarevich & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2021. "Modelling Stock Markets by Multi-agent Reinforcement Learning," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 113-147, January.
    9. Anna Brown, 2016. "Item Response Models for Forced-Choice Questionnaires: A Common Framework," Psychometrika, Springer;The Psychometric Society, vol. 81(1), pages 135-160, March.
    10. Jaron T Colas, 2017. "Value-based decision making via sequential sampling with hierarchical competition and attentional modulation," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-40, October.
    11. Michael C. Hout & Corbin A. Cunningham & Arryn Robbins & Justin MacDonald, 2018. "Simulating the Fidelity of Data for Large Stimulus Set Sizes and Variable Dimension Estimation in Multidimensional Scaling," SAGE Open, , vol. 8(2), pages 21582440187, April.
    12. Lee Cooper & Masao Nakanishi, 1983. "Two logit models for external analysis of preferences," Psychometrika, Springer;The Psychometric Society, vol. 48(4), pages 607-620, December.
    13. He A Xu & Alireza Modirshanechi & Marco P Lehmann & Wulfram Gerstner & Michael H Herzog, 2021. "Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-32, June.
    14. Feng Zhou & Weihua Zhao & Ziyu Qi & Yayuan Geng & Shuxia Yao & Keith M. Kendrick & Tor D. Wager & Benjamin Becker, 2021. "A distributed fMRI-based signature for the subjective experience of fear," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    15. Despoina Alempaki & Emina Canic & Timothy L. Mullett & William J. Skylark & Chris Starmer & Neil Stewart & Fabio Tufano, 2019. "Reexamining How Utility and Weighting Functions Get Their Shapes: A Quasi-Adversarial Collaboration Providing a New Interpretation," Management Science, INFORMS, vol. 65(10), pages 4841-4862, October.
    16. Laurens Winkelmeier & Carla Filosa & Renée Hartig & Max Scheller & Markus Sack & Jonathan R. Reinwald & Robert Becker & David Wolf & Martin Fungisai Gerchen & Alexander Sartorius & Andreas Meyer-Linde, 2022. "Striatal hub of dynamic and stabilized prediction coding in forebrain networks for olfactory reinforcement learning," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    17. M. E. Hoeppli & H. Nahman-Averbuch & W. A. Hinkle & E. Leon & J. Peugh & M. Lopez-Sola & C. D. King & K. R. Goldschneider & R. C. Coghill, 2022. "Dissociation between individual differences in self-reported pain intensity and underlying fMRI brain activation," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    18. Nir Moneta & Mona M. Garvert & Hauke R. Heekeren & Nicolas W. Schuck, 2023. "Task state representations in vmPFC mediate relevant and irrelevant value signals and their behavioral influence," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    19. Amir Dezfouli & Bernard W Balleine, 2019. "Learning the structure of the world: The adaptive nature of state-space and action representations in multi-stage decision-making," PLOS Computational Biology, Public Library of Science, vol. 15(9), pages 1-22, September.
    20. Momchil S Tomov & Samyukta Yagati & Agni Kumar & Wanqian Yang & Samuel J Gershman, 2020. "Discovery of hierarchical representations for efficient planning," PLOS Computational Biology, Public Library of Science, vol. 16(4), pages 1-42, April.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1005810. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.