IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1006116.html
   My bibliography  Save this article

Compositional clustering in task structure learning

Author

Listed:
  • Nicholas T Franklin
  • Michael J Frank

Abstract

Humans are remarkably adept at generalizing knowledge between experiences in a way that can be difficult for computers. Often, this entails generalizing constituent pieces of experiences that do not fully overlap, but nonetheless share useful similarities with, previously acquired knowledge. However, it is often unclear how knowledge gained in one context should generalize to another. Previous computational models and data suggest that rather than learning about each individual context, humans build latent abstract structures and learn to link these structures to arbitrary contexts, facilitating generalization. In these models, task structures that are more popular across contexts are more likely to be revisited in new contexts. However, these models can only re-use policies as a whole and are unable to transfer knowledge about the transition structure of the environment even if only the goal has changed (or vice-versa). This contrasts with ecological settings, where some aspects of task structure, such as the transition function, will be shared between context separately from other aspects, such as the reward function. Here, we develop a novel non-parametric Bayesian agent that forms independent latent clusters for transition and reward functions, affording separable transfer of their constituent parts across contexts. We show that the relative performance of this agent compared to an agent that jointly clusters reward and transition functions depends environmental task statistics: the mutual information between transition and reward functions and the stochasticity of the observations. We formalize our analysis through an information theoretic account of the priors, and propose a meta learning agent that dynamically arbitrates between strategies across task domains to optimize a statistical tradeoff.Author summary: A musician may learn to generalize behaviors across instruments for different purposes, for example, reusing hand motions used when playing classical on the flute to play jazz on the saxophone. Conversely, she may learn to play a single song across many instruments that require completely distinct physical motions, but nonetheless transfer knowledge between them. This degree of compositionality is often absent from computational frameworks of learning, forcing agents either to generalize entire learned policies or to learn new policies from scratch. Here, we propose a solution to this problem that allows an agent to generalize components of a policy independently and compare it to an agent that generalizes components as a whole. We show that the degree to which one form of generalization is favored over the other is dependent on the features of task domain, with independent generalization of task components favored in environments with weak relationships between components or high degrees of noise and joint generalization of task components favored when there is a clear, discoverable relationship between task components. Furthermore, we show that the overall meta structure of the environment can be learned and leveraged by an agent that dynamically arbitrates between these forms of structure learning.

Suggested Citation

  • Nicholas T Franklin & Michael J Frank, 2018. "Compositional clustering in task structure learning," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-25, April.
  • Handle: RePEc:plo:pcbi00:1006116
    DOI: 10.1371/journal.pcbi.1006116
    as

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006116
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006116&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1006116?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. M Berk Mirza & Rick A Adams & Christoph Mathys & Karl J Friston, 2018. "Human visual exploration reduces uncertainty about the sensed world," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-20, January.
    2. I. Momennejad & E. M. Russek & J. H. Cheong & M. M. Botvinick & N. D. Daw & S. J. Gershman, 2017. "The successor representation in human reinforcement learning," Nature Human Behaviour, Nature, vol. 1(9), pages 680-692, September.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.
    2. Nicholas Menghi & Kemal Kacar & Will Penny, 2021. "Multitask learning over shared subspaces," PLOS Computational Biology, Public Library of Science, vol. 17(7), pages 1-25, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Johann Lussange & Stefano Vrizzi & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2023. "Stock Price Formation: Precepts from a Multi-Agent Reinforcement Learning Model," Computational Economics, Springer;Society for Computational Economics, vol. 61(4), pages 1523-1544, April.
    2. Johann Lussange & Ivan Lazarevich & Sacha Bourgeois-Gironde & Stefano Palminteri & Boris Gutkin, 2021. "Modelling Stock Markets by Multi-agent Reinforcement Learning," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 113-147, January.
    3. Julie J Lee & Mehdi Keramati, 2017. "Flexibility to contingency changes distinguishes habitual and goal-directed strategies in humans," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-15, September.
    4. Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
    5. Evan M Russek & Ida Momennejad & Matthew M Botvinick & Samuel J Gershman & Nathaniel D Daw, 2017. "Predictive representations can link model-based reinforcement learning to model-free mechanisms," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-35, September.
    6. Momchil S Tomov & Samyukta Yagati & Agni Kumar & Wanqian Yang & Samuel J Gershman, 2020. "Discovery of hierarchical representations for efficient planning," PLOS Computational Biology, Public Library of Science, vol. 16(4), pages 1-42, April.
    7. Changbo Zhu & Ke Zhou & Fengzhen Tang & Yandong Tang & Xiaoli Li & Bailu Si, 2022. "A Hierarchical Bayesian Model for Inferring and Decision Making in Multi-Dimensional Volatile Binary Environments," Mathematics, MDPI, vol. 10(24), pages 1-35, December.
    8. Liu, Hui & Yu, Chengqing & Wu, Haiping & Duan, Zhu & Yan, Guangxi, 2020. "A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting," Energy, Elsevier, vol. 202(C).
    9. Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.
    10. Ruohan Zhang & Shun Zhang & Matthew H Tong & Yuchen Cui & Constantin A Rothkopf & Dana H Ballard & Mary M Hayhoe, 2018. "Modeling sensory-motor decisions in natural behavior," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-22, October.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1006116. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.