
Structure Learning in Human Sequential Decision-Making

Author

Listed:
  • Daniel E Acuña
  • Paul Schrater

Abstract

Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate humans can perform structure learning in a near-optimal manner.

Author Summary: Every decision-making experiment has a structure that specifies how rewards are obtained, which is usually explained to the subject at the beginning of the experiment. Participants frequently fail to act as if they understand the experimental structure, even in tasks as simple as determining which of two biased coins they should choose to maximize the number of trials that produce “heads”. We hypothesize that participants' behavior is not driven by top-down instructions; rather, participants must learn through experience how the rewards are generated. We formalize this hypothesis using a fully rational optimal Bayesian reinforcement learning approach that models optimal structure learning in sequential decision making. In an experimental test of structure learning in humans, we show that humans learn reward structure from experience in a near optimal manner. Our results demonstrate that behavior purported to show that humans are error-prone and suboptimal decision makers can result from an optimal learning approach. Our findings provide a compelling new family of rational hypotheses for behavior previously deemed irrational, including under- and over-exploration.
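
The structure-learning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model or code: it assumes Bernoulli rewards, uniform Beta(1, 1) priors, and a hypothetical "coupled" structure in which the second arm pays off with probability 1 - theta whenever the first pays off with probability theta. The function name posterior_coupled and its parameters are invented for illustration. Given win/loss counts for two arms, it compares the marginal likelihoods of the two structures to obtain the posterior probability that the arms are coupled rather than independent.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_coupled(s1, f1, s2, f2, prior_coupled=0.5):
    """Posterior probability that two Bernoulli arms are coupled.

    s1, f1: successes and failures observed on arm 1
    s2, f2: successes and failures observed on arm 2
    prior_coupled: prior probability of the coupled structure (0 < p < 1)

    Assumptions (illustrative, not taken from the paper):
      * independent structure: theta1, theta2 ~ Beta(1, 1), independent
      * coupled structure: theta1 ~ Beta(1, 1) and theta2 = 1 - theta1
    """
    if not 0.0 < prior_coupled < 1.0:
        raise ValueError("prior_coupled must be strictly between 0 and 1")

    # Coupled: a success on arm 2 counts as evidence against theta1,
    # so all four counts update a single Beta posterior.
    log_ml_coupled = log_beta(1 + s1 + f2, 1 + f1 + s2)

    # Independent: each arm has its own Beta-Bernoulli marginal likelihood.
    log_ml_indep = log_beta(1 + s1, 1 + f1) + log_beta(1 + s2, 1 + f2)

    # Posterior log-odds of coupled versus independent.
    log_odds = (log(prior_coupled) - log(1.0 - prior_coupled)
                + log_ml_coupled - log_ml_indep)

    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)
```

Under these assumptions, counts such as 8 wins and 2 losses on one arm with 2 wins and 8 losses on the other favor the coupled structure, while 8 wins and 2 losses on both arms favor independence. With no data, the posterior equals the prior.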

Suggested Citation

  • Daniel E Acuña & Paul Schrater, 2010. "Structure Learning in Human Sequential Decision-Making," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-12, December.
  • Handle: RePEc:plo:pcbi00:1001003
    DOI: 10.1371/journal.pcbi.1001003

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1001003
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1001003&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1001003?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
    2. Robert J. Meyer & Yong Shi, 1995. "Sequential Choice Under Ambiguity: Intuitive Solutions to the Armed-Bandit Problem," Management Science, INFORMS, vol. 41(5), pages 817-834, May.
    3. Erev, Ido & Roth, Alvin E, 1998. "Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria," American Economic Review, American Economic Association, vol. 88(4), pages 848-881, September.
    4. Jeffrey Banks & David Porter & Mark Olson, 1997. "An experimental analysis of the bandit problem," Economic Theory, Springer; Society for the Advancement of Economic Theory (SAET), vol. 10(1), pages 55-77.
    5. Nathaniel D. Daw & John P. O'Doherty & Peter Dayan & Ben Seymour & Raymond J. Dolan, 2006. "Cortical substrates for exploratory decisions in humans," Nature, Nature, vol. 441(7095), pages 876-879, June.
    6. Yutaka Sakai & Tomoki Fukai, 2008. "When Does Reward Maximization Lead to Matching Law?," PLOS ONE, Public Library of Science, vol. 3(11), pages 1-7, November.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Janet M. Currie & W. Bentley MacLeod, 2018. "Understanding Doctor Decision Making: The Case of Depression," NBER Working Papers 24955, National Bureau of Economic Research, Inc.
    2. Francesco Rigoli & Christoph Mathys & Karl J Friston & Raymond J Dolan, 2017. "A unifying Bayesian account of contextual effects in value-based choice," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-28, October.
    3. Elyse H Norton & Stephen M Fleming & Nathaniel D Daw & Michael S Landy, 2017. "Suboptimal Criterion Learning in Static and Dynamic Environments," PLOS Computational Biology, Public Library of Science, vol. 13(1), pages 1-28, January.
    4. Amir Dezfouli & Kristi Griffiths & Fabio Ramos & Peter Dayan & Bernard W Balleine, 2019. "Models that learn how humans learn: The case of decision-making and its disorders," PLOS Computational Biology, Public Library of Science, vol. 15(6), pages 1-33, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alina Ferecatu & Arnaud De Bruyn, 2022. "Understanding Managers’ Trade-Offs Between Exploration and Exploitation," Marketing Science, INFORMS, vol. 41(1), pages 139-165, January.
    2. Hu, Yingyao & Kayaba, Yutaka & Shum, Matthew, 2013. "Nonparametric learning rules from bandit experiments: The eyes have it!," Games and Economic Behavior, Elsevier, vol. 81(C), pages 215-231.
    3. Yilmaz Kocer, 2010. "Endogenous Learning with Bounded Memory," Working Papers 1290, Princeton University, Department of Economics, Econometric Research Program.
    4. Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
    5. Eric Guerci & Nobuyuki Hanaki & Naoki Watanabe, 2017. "Meaningful learning in weighted voting games: an experiment," Theory and Decision, Springer, vol. 83(1), pages 131-153, June.
    6. Gars, Jared & Ward, Patrick S., 2019. "Can differences in individual learning explain patterns of technology adoption? Evidence on heterogeneous learning patterns and hybrid rice adoption in Bihar, India," World Development, Elsevier, vol. 115(C), pages 178-189.
    7. repec:cup:judgdm:v:17:y:2022:i:4:p:691-719 is not listed on IDEAS
    8. repec:jdm:journl:v:17:y:2022:i:4:p:691-719 is not listed on IDEAS
    9. Johannes Hoelzemann & Nicolas Klein, 2021. "Bandits in the lab," Quantitative Economics, Econometric Society, vol. 12(3), pages 1021-1051, July.
    10. Gars, Jared & Ward, Patrick S., 2016. "The role of learning in technology adoption: Evidence on hybrid rice adoption in Bihar, India," IFPRI discussion papers 1591, International Food Policy Research Institute (IFPRI).
    11. repec:cup:judgdm:v:12:y:2017:i:2:p:104-117 is not listed on IDEAS
    12. Hudja, Stanton, 2021. "Is Experimentation Invariant to Group Size? A Laboratory Analysis of Innovation Contests," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 91(C).
    13. Eric Guerci & Nobuyuki Hanaki & Naoki Watanabe, 2015. "Meaningful Learning in Weighted Voting Games: An Experiment," Working Papers halshs-01216244, HAL.
    14. Christopher Anderson, 2012. "Ambiguity aversion in multi-armed bandit problems," Theory and Decision, Springer, vol. 72(1), pages 15-33, January.
    15. Andrew M. Davis & Vishal Gaur & Dayoung Kim, 2021. "Consumer Learning from Own Experience and Social Information: An Experimental Study," Management Science, INFORMS, vol. 67(5), pages 2924-2943, May.
    16. Paul M. Krueger & Robert C. Wilson & Jonathan D. Cohen, 2017. "Strategies for exploration in the domain of losses," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 12(2), pages 104-117, March.
    17. Nobuyuki Hanaki & Alan P. Kirman & Paul Pezanis-Christou, 2016. "Counter Intuitive Learning: An Exploratory Study," CESifo Working Paper Series 6029, CESifo.
    18. Christina Fang & Daniel Levinthal, 2009. "Near-Term Liability of Exploitation: Exploration and Exploitation in Multistage Problems," Organization Science, INFORMS, vol. 20(3), pages 538-551, June.
    19. Ayaka Kato & Kenji Morita, 2016. "Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation," PLOS Computational Biology, Public Library of Science, vol. 12(10), pages 1-41, October.
    20. Jean Paul Rabanal & Aleksei Chernulich & John Horowitz & Olga A. Rud & Manizha Sharifova, 2019. "Market timing under public and private information," Working Papers 151, Peruvian Economic Association.
    21. Naoki Watanabe, 2022. "Reconsidering Meaningful Learning in a Bandit Experiment on Weighted Voting: Subjects’ Search Behavior," The Review of Socionetwork Strategies, Springer, vol. 16(1), pages 81-107, April.
    22. Marcoul, Philippe & Weninger, Quinn, 2008. "Search and active learning with correlated information: Empirical evidence from mid-Atlantic clam fishermen," Journal of Economic Dynamics and Control, Elsevier, vol. 32(6), pages 1921-1948, June.
    23. Maime Guan & Ryan Stokes & Joachim Vandekerckhove & Michael D. Lee, 2020. "A cognitive modeling analysis of risk in sequential choice tasks," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 15(5), pages 823-850, September.
