Theory of Choice in Bandit, Information Sampling and Foraging Tasks

Author

Listed:

Bruno B Averbeck

Abstract

Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information sampling and foraging tasks. These tasks move beyond tasks where the choice in the current trial does not affect future expected rewards. We have modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which decisions affect the information on which future choices will be made. Under the assumption that agents are maximizing expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions which trade off immediate and future expected rewards. The tasks drive these trade-offs in unique ways, however. For bandit and information sampling tasks, increasing uncertainty or the time horizon shifts value to actions that pay off in the future. Correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks the time horizon plays the dominant role, as choices do not affect future uncertainty in these tasks.

Author Summary: Numerous choice tasks have been used to study decision processes. Some of these choice tasks, specifically n-armed bandit, information sampling and foraging tasks, pose choices that trade off immediate and future reward. Specifically, the best choice may not be the choice that pays off the highest reward immediately, and exploring unknown options rather than exploiting known options can be a normatively useful strategy. We characterized the optimal choice strategies across these tasks using Markov decision processes (MDPs). The MDP framework can characterize optimal choice strategies when choices are affected by the value of future rewards. We found that uncertainty and time horizon have important effects on the choice strategies in these tasks. Specifically, in bandit and information sampling tasks, increasing uncertainty increases the value of exploring choice options that tend to pay off in the future, while decreasing uncertainty increases the value of choice options that pay off immediately. These effects are increased when time horizons are longer. Foraging tasks differ in that uncertainty plays a minimal role. However, time horizon is still important in foraging. Specifically, for long time horizons, travel delays to rewards become less relevant.
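The trade-off described in the abstract can be illustrated with a toy finite-horizon MDP. The sketch below is not the paper's model; it is a minimal example under stated assumptions: a two-armed Bernoulli bandit in which one arm has a known reward probability (the hypothetical constant `P_KNOWN`) and the other has an unknown probability tracked by a Beta(a, b) belief. The function names `state_value`, `action_values` and `exploration_advantage` are illustrative only. Backward induction over the belief state gives the expected-reward-maximizing value of each action.

```python
from functools import lru_cache

# Assumption for illustration: reward probability of the fully known arm.
P_KNOWN = 0.55


@lru_cache(maxsize=None)
def state_value(a, b, steps_left):
    """Expected total reward under the optimal policy, given a Beta(a, b)
    belief about the uncertain arm and steps_left choices remaining."""
    if steps_left == 0:
        return 0.0
    return max(action_values(a, b, steps_left))


def action_values(a, b, steps_left):
    """Return (value of known arm, value of uncertain arm).

    Requires steps_left >= 1. Choosing the known arm leaves the belief
    unchanged; choosing the uncertain arm updates it to Beta(a + 1, b)
    after a reward and Beta(a, b + 1) after no reward.
    """
    mean = a / (a + b)  # posterior mean reward probability of the uncertain arm
    q_known = P_KNOWN + state_value(a, b, steps_left - 1)
    q_uncertain = (
        mean * (1.0 + state_value(a + 1, b, steps_left - 1))
        + (1.0 - mean) * state_value(a, b + 1, steps_left - 1)
    )
    return q_known, q_uncertain


def exploration_advantage(a, b, horizon):
    """Value of sampling the uncertain arm minus value of the known arm."""
    q_known, q_uncertain = action_values(a, b, horizon)
    return q_uncertain - q_known


if __name__ == "__main__":
    for horizon in (1, 2, 5, 10):
        wide = exploration_advantage(1, 1, horizon)      # high uncertainty
        narrow = exploration_advantage(10, 10, horizon)  # same mean, low uncertainty
        print(f"horizon={horizon:2d}  Beta(1,1): {wide:+.4f}  Beta(10,10): {narrow:+.4f}")
```

With these assumed numbers, the advantage of sampling the uncertain arm under a Beta(1, 1) belief is negative for a one-step horizon and becomes slightly positive with two steps, while under the less uncertain Beta(10, 10) belief (same mean) it is still negative at two steps. This matches the qualitative pattern in the abstract: more uncertainty and a longer horizon shift value toward the action that pays off in the future.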

Suggested Citation

Bruno B Averbeck, 2015. "Theory of Choice in Bandit, Information Sampling and Foraging Tasks," PLOS Computational Biology, Public Library of Science, vol. 11(3), pages 1-28, March.

Handle: RePEc:plo:pcbi00:1004164
DOI: 10.1371/journal.pcbi.1004164

References listed on IDEAS

Robert J. Meyer & Yong Shi, 1995. "Sequential Choice Under Ambiguity: Intuitive Solutions to the Armed-Bandit Problem," Management Science, INFORMS, vol. 41(5), pages 817-834, May.
Mathias Pessiglione & Ben Seymour & Guillaume Flandin & Raymond J. Dolan & Chris D. Frith, 2006. "Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans," Nature, Nature, vol. 442(7106), pages 1042-1045, August.
Elise Payzan-LeNestour & Peter Bossaerts, 2011. "Risk, Unexpected Uncertainty, and Estimation Uncertainty: Bayesian Learning in Unstable Settings," PLOS Computational Biology, Public Library of Science, vol. 7(1), pages 1-14, January.
Nathaniel D. Daw & John P. O'Doherty & Peter Dayan & Ben Seymour & Raymond J. Dolan, 2006. "Cortical substrates for exploratory decisions in humans," Nature, Nature, vol. 441(7095), pages 876-879, June.

Citations

Cited by:

Daniel Bennett & Stefan Bode & Maja Brydevall & Hayley Warren & Carsten Murawski, 2016. "Intrinsic Valuation of Information in Decision Making under Uncertainty," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-21, July.
Lieke L F van Lieshout & Iris J Traast & Floris P de Lange & Roshan Cools, 2021. "Curiosity or savouring? Information seeking is modulated by both uncertainty and valence," PLOS ONE, Public Library of Science, vol. 16(9), pages 1-19, September.
Jorge Ramírez-Ruiz & Dmytro Grytskyy & Chiara Mastrogiuseppe & Yamen Habib & Rubén Moreno-Bote, 2024. "Complex behavior from intrinsic motivation to occupy future action-state path space," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
R Becket Ebitz & Brianna J Sleezer & Hank P Jedema & Charles W Bradberry & Benjamin Y Hayden, 2019. "Tonic exploration governs both flexibility and lapses," PLOS Computational Biology, Public Library of Science, vol. 15(11), pages 1-37, November.
Shinji Nakazato & Bojian Yang & Tetsuya Shimokawa, 2024. "Analyzing Human Search Behavior When Subjective Returns are Unobservable," Computational Economics, Springer;Society for Computational Economics, vol. 63(5), pages 1921-1947, May.
Gillian Dale & Danielle Sampers & Stephanie Loo & C Shawn Green, 2018. "Individual differences in exploration and persistence: Grit and beliefs about ability and reward," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-17, September.
Mike G. Tsionas & Pankaj C. Patel, 2022. "An entrepreneur's dilemma: An optimal stopping rule in pivoting," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 43(8), pages 3498-3515, December.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Hu, Yingyao & Kayaba, Yutaka & Shum, Matthew, 2013. "Nonparametric learning rules from bandit experiments: The eyes have it!," Games and Economic Behavior, Elsevier, vol. 81(C), pages 215-231.
- Yingyao Hu & Yutaka Kayaba & Matthew Shum, 2010. "Nonparametric learning rules from bandit experiments: the eyes have it!," CeMMAP working papers CWP15/10, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Yingyao Hu & Yutaka Kayaba & Matt Shum, 2010. "Nonparametric Learning Rules from Bandit Experiments: The Eyes have it!," Economics Working Paper Archive 560, The Johns Hopkins University,Department of Economics.
Maël Lebreton & Karin Bacily & Stefano Palminteri & Jan B Engelmann, 2019. "Contextual influence on confidence judgments in human reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-27, April.
Tal Neiman & Yonatan Loewenstein, 2011. "Reinforcement learning in professional basketball players," Discussion Paper Series dp593, The Federmann Center for the Study of Rationality, the Hebrew University, Jerusalem.
Yilmaz Kocer, 2010. "Endogenous Learning with Bounded Memory," Working Papers 1290, Princeton University, Department of Economics, Econometric Research Program.
Nazanin Mohammadi Sepahvand & Elisabeth Stöttinger & James Danckert & Britt Anderson, 2014. "Sequential Decisions: A Computational Comparison of Observational and Reinforcement Accounts," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-8, April.
Daniel E Acuña & Paul Schrater, 2010. "Structure Learning in Human Sequential Decision-Making," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-12, December.
Paul M. Krueger & Robert C. Wilson & Jonathan D. Cohen, 2017. "Strategies for exploration in the domain of losses," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 12(2), pages 104-117, March.
Alina Ferecatu & Arnaud De Bruyn, 2022. "Understanding Managers’ Trade-Offs Between Exploration and Exploitation," Marketing Science, INFORMS, vol. 41(1), pages 139-165, January.
Paul M Bays & Ben A Dowding, 2017. "Fidelity of the representation of value in decision-making," PLOS Computational Biology, Public Library of Science, vol. 13(3), pages 1-16, March.
Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
Martinovici, A., 2019. "Revealing attention - how eye movements predict brand choice and moment of choice," Other publications TiSEM 7dca38a5-9f78-4aee-bd81-c, Tilburg University, School of Economics and Management.
Yongping Bao & Ludwig Danwitz & Fabian Dvorak & Sebastian Fehrler & Lars Hornuf & Hsuan Yu Lin & Bettina von Helversen, 2022. "Similarity and Consistency in Algorithm-Guided Exploration," CESifo Working Paper Series 10188, CESifo.
Antoine Collomb-Clerc & Maëlle C. M. Gueguen & Lorella Minotti & Philippe Kahane & Vincent Navarro & Fabrice Bartolomei & Romain Carron & Jean Regis & Stephan Chabardès & Stefano Palminteri & Julien B, 2023. "Human thalamic low-frequency oscillations correlate with expected value and outcomes during reinforcement learning," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
Shi, Yuwei & Herniman, John, 2023. "The role of expectation in innovation evolution: Exploring hype cycles," Technovation, Elsevier, vol. 119(C).
Sashittal, Hemant C. & Sriramachandramurthy, Rajendran & Hodis, Monica, 2012. "Targeting college students on Facebook? How to stop wasting your money," Business Horizons, Elsevier, vol. 55(5), pages 495-507.
Micha Heilbron & Florent Meyniel, 2019. "Confidence resets reveal hierarchical adaptive learning in humans," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-24, April.
Manuel Glauco Carbone & Icro Maremmani, 2024. "Chronic Cocaine Use and Parkinson’s Disease: An Interpretative Model," IJERPH, MDPI, vol. 21(8), pages 1-23, August.
Peter S. Riefer & Bradley C. Love, 2015. "Unfazed by Both the Bull and Bear: Strategic Exploration in Dynamic Environments," Games, MDPI, vol. 6(3), pages 1-11, August.
Makoto Naruse & Eiji Yamamoto & Takashi Nakao & Takuma Akimoto & Hayato Saigo & Kazuya Okamura & Izumi Ojima & Georg Northoff & Hirokazu Hori, 2018. "Why is the environment important for decision making? Local reservoir model for choice-based learning," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-17, October.
David Vaquero-Puyuelo & Concepción De-la-Cámara & Beatriz Olaya & Patricia Gracia-García & Antonio Lobo & Raúl López-Antón & Javier Santabárbara, 2021. "Anhedonia as a Potential Risk Factor of Alzheimer’s Disease in a Community-Dwelling Elderly Sample: Results from the ZARADEMP Project," IJERPH, MDPI, vol. 18(4), pages 1-12, February.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.
