Evolution with Opponent-Learning Awareness

Author

Listed:
  • Yann Bouteiller
  • Karthik Soma
  • Giovanni Beltrame

Abstract

The universe involves many independent co-learning agents as an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are usually constrained to small, homogeneous populations and remain computationally intensive. In this paper, we study how large heterogeneous populations of learning agents evolve in normal-form games. We show how, under assumptions commonly made in the multi-armed bandit literature, Multi-Agent Policy Gradient closely resembles the Replicator Dynamic, and we further derive a fast, parallelizable implementation of Opponent-Learning Awareness tailored for evolutionary simulations. This enables us to simulate the evolution of very large populations made of heterogeneous co-learning agents, under both naive and advanced learning strategies. We demonstrate our approach in simulations of 200,000 agents, evolving in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game highlights distinct ways in which Opponent-Learning Awareness affects evolution.
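
To make the Replicator Dynamic mentioned in the abstract concrete, here is a minimal sketch of its standard textbook form, dx_i/dt = x_i (f_i(x) - mean fitness), applied to Hawk-Dove. It is not the authors' implementation and it does not include Opponent-Learning Awareness. The payoff values (V = 2, C = 4), the Euler step size, the initial population shares, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

# Hawk-Dove payoffs for the row player.
# Assumed illustrative values: resource value V = 2, cost of fighting C = 4.
V, C = 2.0, 4.0
PAYOFF = np.array([
    [(V - C) / 2.0, V],        # Hawk vs Hawk, Hawk vs Dove
    [0.0,           V / 2.0],  # Dove vs Hawk, Dove vs Dove
])


def replicator_step(x, payoff, dt):
    """One explicit Euler step of dx_i/dt = x_i * (f_i - mean fitness)."""
    fitness = payoff @ x        # expected payoff of each pure strategy
    mean_fitness = x @ fitness  # population-average payoff
    return x + dt * x * (fitness - mean_fitness)


def simulate(x0, payoff, dt=0.01, steps=5000):
    """Integrate the replicator dynamic from initial shares x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Start with 10% hawks and 90% doves (hypothetical initial condition).
final = simulate([0.1, 0.9], PAYOFF)
print(final)
```

With these assumed payoffs, the interior rest point of the dynamic has a hawk share of V/C = 0.5, and the simulated shares approach it while staying on the simplex.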

Suggested Citation

  • Yann Bouteiller & Karthik Soma & Giovanni Beltrame, 2024. "Evolution with Opponent-Learning Awareness," Papers 2410.17466, arXiv.org, revised Oct 2024.
  • Handle: RePEc:arx:papers:2410.17466

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2410.17466
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Apesteguia, Jose & Huck, Steffen & Oechssler, Jorg, 2007. "Imitation--theory and experimental evidence," Journal of Economic Theory, Elsevier, vol. 136(1), pages 217-235, September.
    2. Jorgen W. Weibull, 1997. "Evolutionary Game Theory," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262731215, April.
    3. John G. Cross, 1973. "A Stochastic Learning Model of Economic Behavior," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 87(2), pages 239-266.
    4. David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
    5. Mertikopoulos, Panayotis & Sandholm, William H., 2018. "Riemannian game dynamics," Journal of Economic Theory, Elsevier, vol. 177(C), pages 315-364.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohlin, Erik & Östling, Robert & Wang, Joseph Tao-yi, 2020. "Learning by similarity-weighted imitation in winner-takes-all games," Games and Economic Behavior, Elsevier, vol. 120(C), pages 225-245.
    2. Jiayang Li & Zhaoran Wang & Yu Marco Nie, 2023. "Wardrop Equilibrium Can Be Boundedly Rational: A New Behavioral Theory of Route Choice," Papers 2304.02500, arXiv.org, revised Feb 2024.
    3. Innocenti, Stefania & Cowan, Robin, 2019. "Self-efficacy beliefs and imitation: A two-armed bandit experiment," European Economic Review, Elsevier, vol. 113(C), pages 156-172.
    4. Srinivas Arigapudi & Omer Edhan & Yuval Heller & Ziv Hellman, 2022. "Mentors and Recombinators: Multi-Dimensional Social Learning," Papers 2205.00278, arXiv.org, revised Nov 2023.
    5. Norman, Thomas W.L., 2023. "Pigouvian algorithmic platform design," Journal of Economic Behavior & Organization, Elsevier, vol. 212(C), pages 322-332.
    6. Weibull, Jörgen W., 1997. "What have we learned from Evolutionary Game Theory so far?," Working Paper Series 487, Research Institute of Industrial Economics, revised 26 Oct 1998.
    7. Erik Mohlin & Robert Ostling & Joseph Tao-yi Wang, 2014. "Learning by Imitation in Games: Theory, Field, and Laboratory," Economics Series Working Papers 734, University of Oxford, Department of Economics.
    8. Dufwenberg, Martin, 1997. "Some relationships between evolutionary stability criteria in games," Economics Letters, Elsevier, vol. 57(1), pages 45-50, November.
    9. Lichi Zhang & Yanyan Jiang & Junmin Wu, 2022. "Evolutionary Game Analysis of Government and Residents’ Participation in Waste Separation Based on Cumulative Prospect Theory," IJERPH, MDPI, vol. 19(21), pages 1-16, November.
    10. Jonas Hedlund & Carlos Oyarzun, 2018. "Imitation in heterogeneous populations," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 65(4), pages 937-973, June.
    11. Tom Johnston & Michael Savery & Alex Scott & Bassel Tarbush, 2023. "Game Connectivity and Adaptive Dynamics," Papers 2309.10609, arXiv.org, revised Oct 2024.
    12. Tian Zhu & Merry H. Ma, 2022. "Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning," Stats, MDPI, vol. 5(3), pages 1-14, August.
    13. Xiaoyue Li & John M. Mulvey, 2023. "Optimal Portfolio Execution in a Regime-switching Market with Non-linear Impact Costs: Combining Dynamic Program and Neural Network," Papers 2306.08809, arXiv.org.
    14. Gu, Tianqi & Xu, Weiping & Liang, Hua & He, Qing & Zheng, Nan, 2024. "School bus transport service strategies’ policy-making mechanism – An evolutionary game approach," Transportation Research Part A: Policy and Practice, Elsevier, vol. 182(C).
    15. Jacob K. Goeree & Leeat Yariv, 2015. "Conformity in the lab," Journal of the Economic Science Association, Springer;Economic Science Association, vol. 1(1), pages 15-28, July.
    16. Pedro Afonso Fernandes, 2024. "Forecasting with Neuro-Dynamic Programming," Papers 2404.03737, arXiv.org.
    17. Ianni, A., 2002. "Reinforcement learning and the power law of practice: some analytical results," Discussion Paper Series In Economics And Econometrics 203, Economics Division, School of Social Sciences, University of Southampton.
    18. Petrohilos-Andrianos, Yannis & Xepapadeas, Anastasios, 2017. "Resource harvesting regulation and enforcement: An evolutionary approach," Research in Economics, Elsevier, vol. 71(2), pages 236-253.
    19. Philippe Jehiel, 2022. "Analogy-Based Expectation Equilibrium and Related Concepts:Theory, Applications, and Beyond," Working Papers halshs-03735680, HAL.
    20. Nathan Companez & Aldeida Aleti, 2016. "Can Monte-Carlo Tree Search learn to sacrifice?," Journal of Heuristics, Springer, vol. 22(6), pages 783-813, December.
