Printed from https://ideas.repec.org/a/eee/gamebe/v144y2024icp84-103.html

Reinforcement learning in a prisoner's dilemma

Author

Listed:
  • Dolgopolov, Arthur

Abstract

I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed-form relationship between the learning rate and game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
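
For readers unfamiliar with the setting, the following is a minimal sketch of two stateless Q-learners playing a repeated prisoner's dilemma. It is an illustration only, not the paper's model or its closed-form condition: the payoff values, the epsilon-greedy rule, the decay schedule for experimentation, the tie-breaking rule, and the names `PAYOFF`, `greedy`, `choose`, `simulate`, and `greedy_action` are all assumptions made here.

```python
import random

# Own payoff for (own action, opponent action); 0 = cooperate, 1 = defect.
# Hypothetical values chosen so that T > R > P > S (5 > 3 > 1 > 0).
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}


def greedy(q):
    """Index of the action with the higher Q-value (ties go to cooperate, by assumption)."""
    return 0 if q[0] >= q[1] else 1


def greedy_action(q):
    """Human-readable label for the greedy action."""
    return "C" if greedy(q) == 0 else "D"


def choose(q, epsilon, rng):
    """Epsilon-greedy choice: explore uniformly with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return greedy(q)


def simulate(alpha=0.1, rounds=20000, seed=0):
    """Run two stateless Q-learners against each other; return their final Q-values."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for t in range(rounds):
        # Experimentation fades over time (an assumed schedule).
        epsilon = 1.0 / (1.0 + 0.01 * t)
        a1, a2 = choose(q1, epsilon, rng), choose(q2, epsilon, rng)
        r1, r2 = PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]
        # Stateless update: move the played action's value toward the realized
        # payoff at learning rate alpha; the unplayed action is left unchanged.
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
    return q1, q2


if __name__ == "__main__":
    for alpha in (0.05, 0.5):
        q1, q2 = simulate(alpha=alpha)
        print(alpha, greedy_action(q1), greedy_action(q2))
```

Varying `alpha` and the entries of `PAYOFF` in such a simulation gives a rough, numerical feel for how the learning rate and payoffs can interact; the paper's analytical result concerns the limit of vanishing experimentation and is not reproduced by this sketch.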

Suggested Citation

  • Dolgopolov, Arthur, 2024. "Reinforcement learning in a prisoner's dilemma," Games and Economic Behavior, Elsevier, vol. 144(C), pages 84-103.
  • Handle: RePEc:eee:gamebe:v:144:y:2024:i:c:p:84-103
    DOI: 10.1016/j.geb.2024.01.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0899825624000058
    Download Restriction: Full text for ScienceDirect subscribers only

    Citations

    Cited by:

    1. Zhang Xu & Wei Zhao, 2024. "On Mechanism Underlying Algorithmic Collusion," Papers 2409.01147, arXiv.org.
    2. Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jonathan Newton, 2018. "Evolutionary Game Theory: A Renaissance," Games, MDPI, vol. 9(2), pages 1-67, May.
    2. Bilancini, Ennio & Boncinelli, Leonardo & Newton, Jonathan, 2020. "Evolution and Rawlsian social choice in matching," Games and Economic Behavior, Elsevier, vol. 123(C), pages 68-80.
    3. Zhang Xu & Wei Zhao, 2024. "On Mechanism Underlying Algorithmic Collusion," Papers 2409.01147, arXiv.org.
    4. Nax, Heinrich H., 2015. "Equity dynamics in bargaining without information exchange," LSE Research Online Documents on Economics 65426, London School of Economics and Political Science, LSE Library.
    5. Heinrich Nax, 2015. "Equity dynamics in bargaining without information exchange," Journal of Evolutionary Economics, Springer, vol. 25(5), pages 1011-1026, November.
    6. Lucila Porto, 2022. "Q-Learning algorithms in a Hotelling model," Asociación Argentina de Economía Política: Working Papers 4587, Asociación Argentina de Economía Política.
    7. Emilio Calvano & Giacomo Calzolari & Vincenzo Denicolò & Sergio Pastorello, 2019. "Algorithmic Pricing What Implications for Competition Policy?," Review of Industrial Organization, Springer;The Industrial Organization Society, vol. 55(1), pages 155-171, August.
    8. Epivent, Andréa & Lambin, Xavier, 2024. "On algorithmic collusion and reward–punishment schemes," Economics Letters, Elsevier, vol. 237(C).
    9. Sawa, Ryoji, 2021. "A stochastic stability analysis with observation errors in normal form games," Games and Economic Behavior, Elsevier, vol. 129(C), pages 570-589.
    10. Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.
    11. Mäs, Michael & Nax, Heinrich H., 2016. "A behavioral study of “noise” in coordination games," LSE Research Online Documents on Economics 65422, London School of Economics and Political Science, LSE Library.
    12. Eugenio Vicario, 2021. "Imitation and Local Interactions: Long Run Equilibrium Selection," Games, MDPI, vol. 12(2), pages 1-19, April.
    13. Sawa, Ryoji, 2019. "Stochastic stability under logit choice in coalitional bargaining problems," Games and Economic Behavior, Elsevier, vol. 113(C), pages 633-650.
    14. Sawa, Ryoji & Wu, Jiabin, 2018. "Reference-dependent preferences, super-dominance and stochastic stability," Journal of Mathematical Economics, Elsevier, vol. 78(C), pages 96-104.
    15. Mäs, Michael & Nax, Heinrich H., 2016. "A behavioral study of “noise” in coordination games," Journal of Economic Theory, Elsevier, vol. 162(C), pages 195-208.
    16. Heinrich Nax & Bary Pradelski, 2015. "Evolutionary dynamics and equitable core selection in assignment games," International Journal of Game Theory, Springer;Game Theory Society, vol. 44(4), pages 903-932, November.
    17. Nax, Heinrich H. & Pradelski, Bary S. R., 2015. "Evolutionary dynamics and equitable core selection in assignment games," LSE Research Online Documents on Economics 65428, London School of Economics and Political Science, LSE Library.
    18. Maria Montero & Alex Possajennikov, 2021. "An Adaptive Model of Demand Adjustment in Weighted Majority Games," Games, MDPI, vol. 13(1), pages 1-17, December.
    19. Jean-François Laslier & Bernard Walliser, 2015. "Stubborn learning," Theory and Decision, Springer, vol. 79(1), pages 51-93, July.
    20. Bilancini, Ennio & Boncinelli, Leonardo & Nax, Heinrich H., 2021. "What noise matters? Experimental evidence for stochastic deviations in social norms," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 90(C).

    More about this item

    Keywords

    Q-learning; Stochastic stability; Evolutionary game theory; Collusion; Pricing-algorithms;

    JEL classification:

    • C72 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - Noncooperative Games
    • C73 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - Stochastic and Dynamic Games; Evolutionary Games
    • D43 - Microeconomics - - Market Structure, Pricing, and Design - - - Oligopoly and Other Forms of Market Imperfection
    • D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search; Learning; Information and Knowledge; Communication; Belief; Unawareness
    • L41 - Industrial Organization - - Antitrust Issues and Policies - - - Monopolization; Horizontal Anticompetitive Practices
