
Learning and forgetting using reinforced Bayesian change detection

Author

Listed:
  • Vincent Moens
  • Alexandre Zénon

Abstract

Agents living in volatile environments must be able to detect changes in contingencies while refraining from adapting to unexpected events caused by noise. In Reinforcement Learning (RL) frameworks, this requires learning rates that adapt to the past reliability of the model. The observation that behavioural flexibility in animals tends to decrease following prolonged training in stable environments provides experimental evidence for such adaptive learning rates. However, in classical RL models, the learning rate is either fixed or scheduled and thus cannot adapt dynamically to environmental changes. Here, we propose a new Bayesian learning model, using variational inference, that achieves adaptive change detection through Stabilized Forgetting, updating its current belief based on a mixture of fixed, initial priors and previous posterior beliefs. The weight given to these two sources is optimized alongside the other parameters, allowing the model to adapt dynamically to changes in environmental volatility and to unexpected observations. This approach is used to implement the “critic” of an actor-critic RL model, while the actor samples the resulting value distributions to choose which action to undertake. We show that our model can emulate different adaptation strategies to contingency changes, depending on its prior assumptions of environmental stability, and that model parameters can be fit to real data with high accuracy. The model also exhibits trade-offs between flexibility and computational costs that mirror those observed in real data. Overall, the proposed method provides a general framework to study learning flexibility and decision making in RL contexts.

Author summary: In stable contexts, animals and humans exhibit automatic behaviour that allows them to make fast decisions. However, these automatic processes exhibit a lack of flexibility when environmental contingencies change. In the present paper, we propose a model of behavioural automatization that is based on adaptive forgetting and that emulates these properties. The model builds an estimate of the stability of the environment and uses this estimate to adjust its learning rate and the balance between exploration and exploitation policies. The model performs Bayesian inference on latent variables that represent relevant environmental properties, such as reward functions, optimal policies or environment stability. From there, the model makes decisions in order to maximize long-term rewards, with noise proportional to environmental uncertainty. This rich model encompasses many aspects of Reinforcement Learning (RL), such as Temporal Difference RL and counterfactual learning, and accounts for the reduced computational cost of automatic behaviour. Using simulations, we show that this model leads to interesting predictions about the efficiency with which subjects adapt to sudden changes in contingencies after prolonged training.
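
The forgetting step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes a two-armed Bernoulli bandit with conjugate Beta beliefs, and it keeps the mixture weight `w` fixed, whereas the abstract states that this weight is optimized alongside the other parameters using variational inference. The function names (`forget`, `update`, `choose`, `run`), the reward probabilities and the switch point are all hypothetical choices made for illustration.

```python
import random

def forget(post, prior0, w):
    """Blend the previous posterior with the fixed initial prior.

    Assumption: beliefs are Beta(a, b). The w-weighted geometric mixture of
    two Beta densities is again a Beta whose parameters are the w-weighted
    average of the two parameter pairs.
    """
    (a, b), (a0, b0) = post, prior0
    return (w * a + (1 - w) * a0, w * b + (1 - w) * b0)

def update(belief, reward):
    """Conjugate Bernoulli update: add the outcome to the Beta counts."""
    a, b = belief
    return (a + reward, b + (1 - reward))

def choose(beliefs, rng):
    """Actor: sample one value per action from its belief, pick the largest."""
    samples = [rng.betavariate(a, b) for a, b in beliefs]
    return max(range(len(samples)), key=samples.__getitem__)

def run(w, n_trials=400, switch=200, seed=0):
    """Simulate a bandit whose reward probabilities swap at trial `switch`."""
    rng = random.Random(seed)
    prior0 = (1.0, 1.0)          # fixed initial prior (hypothetical choice)
    beliefs = [prior0, prior0]   # one belief per action
    p_reward = [0.8, 0.2]        # hypothetical contingencies
    choices = []
    for t in range(n_trials):
        if t == switch:
            p_reward = p_reward[::-1]  # contingency change
        action = choose(beliefs, rng)
        reward = 1 if rng.random() < p_reward[action] else 0
        # Forget first (mix posterior with initial prior), then learn.
        beliefs[action] = update(forget(beliefs[action], prior0, w), reward)
        choices.append(action)
    return choices

if __name__ == "__main__":
    for w in (1.0, 0.9):
        late = run(w)[300:]
        print(f"w={w}: share of action 1 in the last 100 trials = "
              f"{sum(late) / len(late):.2f}")
```

With `w = 1.0` the belief never forgets and the evidence count grows without bound, so a long stable phase makes later contradictory outcomes weigh little. With `w < 1` the count stays bounded, which keeps the effective learning rate from shrinking to zero. In this sketch only the chosen action's belief is forgotten and updated on each trial, which is a further simplifying assumption.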

Suggested Citation

  • Vincent Moens & Alexandre Zénon, 2019. "Learning and forgetting using reinforced Bayesian change detection," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-41, April.
  • Handle: RePEc:plo:pcbi00:1006713
    DOI: 10.1371/journal.pcbi.1006713

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006713
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006713&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1006713?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations

    Cited by:

    1. Payam Piray & Nathaniel D. Daw, 2021. "A model for learning based on the joint estimation of stochasticity and volatility," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    2. He A Xu & Alireza Modirshanechi & Marco P Lehmann & Wulfram Gerstner & Michael H Herzog, 2021. "Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-32, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:cup:judgdm:v:16:y:2021:i:1:p:201-237 is not listed on IDEAS
    2. Mel W Khaw & Luminita Stevens & Michael Woodford, 2021. "Individual differences in the perception of probability," PLOS Computational Biology, Public Library of Science, vol. 17(4), pages 1-25, April.
    3. Ronayne, David & Brown, Gordon D.A., 2016. "Multi-attribute decision by sampling: An account of the attraction, compromise and similarity effects," The Warwick Economics Research Paper Series (TWERPS) 1124, University of Warwick, Department of Economics.
    4. S. Cerreia-Vioglio & F. Maccheroni & M. Marinacci & A. Rustichini, 2017. "Multinomial logit processes and preference discovery: inside and outside the black box," Working Papers 615, IGIER (Innocenzo Gasparini Institute for Economic Research), Bocconi University.
    5. Rosen Valchev & Cosmin Ilut, 2017. "Economic Agents as Imperfect Problem Solvers," 2017 Meeting Papers 1285, Society for Economic Dynamics.
    6. Simone Cerreia-Vioglio & Fabio Maccheroni & Massimo Marinacci, 2020. "Multinomial logit processes and preference discovery: outside and inside the black box," Working Papers 663, IGIER (Innocenzo Gasparini Institute for Economic Research), Bocconi University.
    7. Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
    8. Marieke Jepma & Jessica V Schaaf & Ingmar Visser & Hilde M Huizenga, 2020. "Uncertainty-driven regulation of learning and exploration in adolescents: A computational account," PLOS Computational Biology, Public Library of Science, vol. 16(9), pages 1-29, September.
    9. repec:cup:judgdm:v:16:y:2021:i:1:p:114-130 is not listed on IDEAS
    10. Mikhail S. Spektor & Dirk U. Wulff, 2021. "Myopia drives reckless behavior in response to over-taxation," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 16(1), pages 114-130, January.
    11. Jingwei Sun & Jian Li & Hang Zhang, 2019. "Human representation of multimodal distributions as clusters of samples," PLOS Computational Biology, Public Library of Science, vol. 15(5), pages 1-29, May.
    12. Gloria W. Feng & Robb B. Rutledge, 2024. "Surprising sounds influence risky decision making," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    13. Payam Piray & Nathaniel D. Daw, 2021. "A model for learning based on the joint estimation of stochasticity and volatility," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    14. Zhang Chen & Rob W. Holland & Julian Quandt & Ap Dijksterhuis & Harm Veling, 2021. "How preference change induced by mere action versus inaction persists over time," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 16(1), pages 201-237, January.
    15. Kathleen Wiencke & Annette Horstmann & David Mathar & Arno Villringer & Jane Neumann, 2020. "Dopamine release, diffusion and uptake: A computational model for synaptic and volume transmission," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-26, November.
