Emergence of cooperation in two-agent repeated games with reinforcement learning

Author

Listed:
  • Ding, Zhen-Wei
  • Zheng, Guo-Zhong
  • Cai, Chao-Ran
  • Cai, Wei-Ran
  • Chen, Li
  • Zhang, Ji-Qiang
  • Wang, Xu-Ming

Abstract

Cooperation is the foundation of ecosystems and human society, and reinforcement learning provides crucial insight into the mechanism for its emergence. However, previous work has mostly focused on self-organization at the population level, and the fundamental dynamics at the individual level remain unclear. Here, we investigate the evolution of cooperation in a two-agent system, where each agent pursues optimal policies according to the classical Q-learning algorithm in playing the strict prisoner’s dilemma. We reveal that strong memory and long-sighted expectation yield the emergence of Coordinated Optimal Policies (COPs), where both agents act like “Win-Stay, Lose-Shift” (WSLS) to maintain a high level of cooperation. Otherwise, players become tolerant toward their co-player’s defection, and cooperation eventually loses stability, with the policy “all Defection” (All-D) prevailing. This suggests that tolerance could be a good precursor to a crisis in cooperation. Furthermore, our analysis shows that the Coordinated Optimal Modes (COMs) for different COPs gradually lose stability as memory weakens and expectation for the future decreases, where agents fail to predict their co-player’s action in games and defection dominates. As a result, we give the constraints on expectation of the future and memory strength for maintaining cooperation. In contrast to previous work, the impact of exploration on cooperation is found not to be consistent, but to depend on the composition of COMs. By clarifying these fundamental issues in this two-player system, we hope that our work could be helpful for understanding the emergence and stability of cooperation in more complex real-world scenarios.
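
The setup described in the abstract can be illustrated with a minimal sketch of two Q-learning agents playing a repeated prisoner's dilemma. This is not the authors' implementation. The payoff values, the memory-one state (each agent's own last move and the co-player's last move), the epsilon-greedy exploration rule, and all parameter names and defaults are assumptions made for illustration. In particular, reading "expectation of the future" as the discount factor `gamma` and "memory strength" as a small learning rate `alpha` is our interpretation, not something the abstract states.

```python
import random

# Assumed strict prisoner's dilemma payoffs with T > R > P > S (illustrative values).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}
ACTIONS = ("C", "D")
# Assumed memory-one state: (own last move, co-player's last move).
STATES = [(own, other) for own in ACTIONS for other in ACTIONS]

# Win-Stay, Lose-Shift written as a state -> action table, for comparison.
WSLS = {("C", "C"): "C", ("C", "D"): "D", ("D", "C"): "D", ("D", "D"): "C"}


def new_q_table():
    """Q-table initialised to zero for every (state, action) pair."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def choose(q, state, epsilon, rng):
    """Epsilon-greedy choice; ties between equal Q-values are broken at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Classical Q-learning update toward reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


def simulate(rounds=10000, alpha=0.1, gamma=0.9, epsilon=0.01, seed=0):
    """Run two independent learners; return (cooperation fraction, q1, q2)."""
    rng = random.Random(seed)
    q1, q2 = new_q_table(), new_q_table()
    s1 = s2 = ("C", "C")  # arbitrary initial state (assumption)
    cooperative_moves = 0
    for _ in range(rounds):
        a1 = choose(q1, s1, epsilon, rng)
        a2 = choose(q2, s2, epsilon, rng)
        r1, r2 = PAYOFF[(a1, a2)]
        n1, n2 = (a1, a2), (a2, a1)  # each agent sees its own move first
        q_update(q1, s1, a1, r1, n1, alpha, gamma)
        q_update(q2, s2, a2, r2, n2, alpha, gamma)
        s1, s2 = n1, n2
        cooperative_moves += (a1 == "C") + (a2 == "C")
    return cooperative_moves / (2 * rounds), q1, q2


def greedy_policy(q):
    """Greedy action per state (ties resolve to the first action, "C")."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES}
```

Calling `simulate(alpha=..., gamma=...)` and comparing `greedy_policy(q1)` with `WSLS` gives one rough way to check whether a run ended in a WSLS-like policy or in All-D. Outcomes depend on the seed and on the assumed parameters, so any single run should be read as an illustration rather than a reproduction of the paper's results.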

Suggested Citation

  • Ding, Zhen-Wei & Zheng, Guo-Zhong & Cai, Chao-Ran & Cai, Wei-Ran & Chen, Li & Zhang, Ji-Qiang & Wang, Xu-Ming, 2023. "Emergence of cooperation in two-agent repeated games with reinforcement learning," Chaos, Solitons & Fractals, Elsevier, vol. 175(P1).
  • Handle: RePEc:eee:chsofr:v:175:y:2023:i:p1:s0960077923009335
    DOI: 10.1016/j.chaos.2023.114032

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077923009335
    Download Restriction: Full text for ScienceDirect subscribers only
