
Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

Author

Listed:
  • Giancarlo La Camera
  • Barry J Richmond

Abstract

It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. 
The schedule length effect might be a manifestation of these phenomena in monkeys.

Author Summary: Theories of rational behavior are built on a number of principles, including the assumption that subjects adjust their behavior to maximize their long-term returns and that they should work equally hard to obtain a reward in situations where the effort to obtain reward is the same (called the invariance principle). Humans, however, are sensitive to the manner in which equivalent choices are presented, or “framed,” and often have a greater tendency to continue an endeavor once an investment in money, effort, or time has been made, a phenomenon known as “sunk cost” effect. In a similar manner, when monkeys must perform different numbers of trials to obtain a reward, they work harder as the number of trials already performed increases, even though both the work remaining and the forthcoming reward are the same in all situations. Methods from the theory of Reinforcement Learning, which usually provide learning strategies aimed at maximizing returns, cannot model this violation of invariance. Here we generalize a prominent method of Reinforcement Learning so as to explain the violation of invariance, without losing the ability to model behaviors explained by standard Reinforcement Learning models. This generalization extends our understanding of how animals and humans learn and behave.
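
The abstract's claim that standard temporal difference learning predicts invariance can be illustrated with a minimal sketch. It is plain TD(0), not the authors' modified model, and everything in it is an assumption made for illustration: states are labelled (k, n) for trial k of a schedule of length n, every trial is performed correctly, a reward of 1 is delivered only after the last trial, and the discount factor `gamma`, learning rate `alpha`, and function name `td0_schedule_values` are hypothetical choices.

```python
def td0_schedule_values(max_len=3, gamma=0.8, alpha=0.1, sweeps=2000):
    """Learn TD(0) values for states (k, n): trial k of a schedule of length n.

    Assumptions (not from the paper): deterministic progress through each
    schedule, reward 1.0 after the final trial only, no error trials.
    """
    values = {(k, n): 0.0
              for n in range(1, max_len + 1)
              for k in range(1, n + 1)}
    for _ in range(sweeps):
        for n in range(1, max_len + 1):
            for k in range(1, n + 1):
                last = (k == n)
                reward = 1.0 if last else 0.0
                # The schedule ends after the rewarded trial, so the
                # successor value is zero there.
                next_value = 0.0 if last else values[(k + 1, n)]
                td_error = reward + gamma * next_value - values[(k, n)]
                values[(k, n)] += alpha * td_error
    return values


values = td0_schedule_values()
# Group states by the number of trials remaining after the current one.
for (k, n), v in sorted(values.items(),
                        key=lambda kv: (kv[0][1] - kv[0][0], kv[0][1])):
    print(f"trial {k}/{n}: remaining={n - k}, value={v:.3f}")
```

Under these assumptions the learned value depends only on the number of trials remaining, so trial 1 of 2 and trial 2 of 3 receive the same value. If error rates were any fixed function of the state value, they would also be the same in those two trials, which is the invariance that the abstract reports the monkeys violating.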

Suggested Citation

  • Giancarlo La Camera & Barry J Richmond, 2008. "Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
  • Handle: RePEc:plo:pcbi00:1000131
    DOI: 10.1371/journal.pcbi.1000131

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000131
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000131&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Arrow, Kenneth J, 1982. "Risk Perception in Psychology and Economics," Economic Inquiry, Western Economic Association International, vol. 20(1), pages 1-9, January.
    2. N Schweighofer & K Shishida & C E Han & Y Okamoto & S C Tanaka & S Yamawaki & K Doya, 2006. "Humans Can Adopt Optimal Discounting Strategy under Real-Time Constraints," PLOS Computational Biology, Public Library of Science, vol. 2(11), pages 1-8, November.
    3. Schoemaker, Paul J H, 1982. "The Expected Utility Model: Its Variants, Purposes, Evidence and Limitations," Journal of Economic Literature, American Economic Association, vol. 20(2), pages 529-563, June.
    4. Nathaniel D. Daw & John P. O'Doherty & Peter Dayan & Ben Seymour & Raymond J. Dolan, 2006. "Cortical substrates for exploratory decisions in humans," Nature, Nature, vol. 441(7095), pages 876-879, June.
    5. Tversky, Amos & Kahneman, Daniel, 1986. "Rational Choice and the Framing of Decisions," The Journal of Business, University of Chicago Press, vol. 59(4), pages 251-278, October.
    6. Arkes, Hal R. & Blumer, Catherine, 1985. "The psychology of sunk cost," Organizational Behavior and Human Decision Processes, Elsevier, vol. 35(1), pages 124-140, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Freeman, Steven F., 1997. "Good decisions : reconciling human rationality, evolution, and ethics," Working papers WP 3962-97., Massachusetts Institute of Technology (MIT), Sloan School of Management.
    2. Robison, Lindon J. & Shupp, Robert S. & Myers, Robert J., 2010. "Expected utility paradoxes," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 39(2), pages 187-193, April.
    3. William C. McDaniel & Francis Sistrunk, 1991. "Management Dilemmas and Decisions," Journal of Conflict Resolution, Peace Science Society (International), vol. 35(1), pages 21-42, March.
    4. Schwaiger, Rene & Kirchler, Michael & Lindner, Florian & Weitzel, Utz, 2020. "Determinants of investor expectations and satisfaction. A study with financial professionals," Journal of Economic Dynamics and Control, Elsevier, vol. 110(C).
    5. Cindy Cifuentes Gómez & Siervo Tulio Delgado Ruiz & Jorge Iván González, 2021. "El comportamiento económico desde la perspectiva biológica y psicológica," Apuntes del Cenes, Universidad Pedagógica y Tecnológica de Colombia, vol. 40(72), pages 17-43, July.
    6. repec:cup:judgdm:v:7:y:2012:i:4:p:462-471 is not listed on IDEAS
    7. Bruno S. Frey & Reiner Eichenberger, 1989. "Should Social Scientists Care about Choice Anomalies?," Rationality and Society, vol. 1(1), pages 101-122, July.
    8. Harvey, Michael & Reiche, B. Sebastian & Moeller, Miriam, 2011. "Developing effective global relationships through staffing with inpatriate managers: The role of interpersonal trust," Journal of International Management, Elsevier, vol. 17(2), pages 150-161, June.
    9. Wei Qi & Xiumei Guo & Xia Wu & Dora Marinova & Jin Fan, 2018. "Do the sunk cost effect and cognitive dissonance increase risk perception? An empirical study in the context of city smog," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(5), pages 2269-2289, September.
    10. Bashir Ahmad Joo & Kokab Durri, 2015. "Comprehensive Review of Literature on Behavioural Finance," Indian Journal of Commerce and Management Studies, Educational Research Multimedia & Publications,India, vol. 6(2), pages 11-19, May.
    11. Marc Willinger, 1990. "La rénovation des fondements de l'utilité et du risque," Revue Économique, Programme National Persée, vol. 41(1), pages 5-48.
    12. Delli Gatti,Domenico & Fagiolo,Giorgio & Gallegati,Mauro & Richiardi,Matteo & Russo,Alberto (ed.), 2018. "Agent-Based Models in Economics," Cambridge Books, Cambridge University Press, number 9781108400046, January.
    13. Wasem, Jürgen, 1992. "Von der "Poliklinik" in die Kassenarztpraxis: Versuch einer Rekonstruktion der Entscheidungssituation ambulant tätiger Ärzte in Ostdeutschland," MPIfG Discussion Paper 92/5, Max Planck Institute for the Study of Societies.
    14. Peter Fraser‐Mackenzie & Ming‐Chien Sung & Johnnie E.V. Johnson, 2014. "Toward an Understanding of the Influence of Cultural Background and Domain Experience on the Effects of Risk‐Pricing Formats on Risk Perception," Risk Analysis, John Wiley & Sons, vol. 34(10), pages 1846-1869, October.
    15. Oleg Uzhga-Rebrov & Peter Grabusts, 2021. "Cumulative Prospect Theory Version with Fuzzy Values of Outcome Estimates," Risks, MDPI, vol. 9(4), pages 1-16, April.
    16. repec:cup:judgdm:v:13:y:2018:i:6:p:575-586 is not listed on IDEAS
    17. Dorian Jullien, 2016. "All Frames Created Equal are Not Identical: On the Structure of Kahneman and Tversky's Framing Effects," GREDEG Working Papers 2016-17, Groupe de REcherche en Droit, Economie, Gestion (GREDEG CNRS), Université Côte d'Azur, France.
    18. R. Aversi & G. Dosi & G. Fagiolo & M. Meacci & C. Olivetti, 1997. "Demand Dynamics With Socially Evolving Preferences," Working Papers ir97081, International Institute for Applied Systems Analysis.
    19. Maximilian Rüdisser & Raphael Flepp & Egon Franck, 2017. "Do casinos pay their customers to become risk-averse? Revising the house money effect in a field experiment," Experimental Economics, Springer;Economic Science Association, vol. 20(3), pages 736-754, September.
    20. Raj Aggarwal, 2004. "Persistent Puzzles in International Finance and Economics," The Economic and Social Review, Economic and Social Studies, vol. 35(3), pages 241-250.
    21. Daniele SCHILIRO, 2016. "Economics and Psychology The Framing of Decisions," Journal of Mathematical Economics and Finance, ASERS Publishing, vol. 2(2), pages 77-88.
    22. Kahneman, Daniel, 2002. "Maps of Bounded Rationality," Nobel Prize in Economics documents 2002-4, Nobel Prize Committee.
