Printed from https://ideas.repec.org/p/arx/papers/2106.12928.html

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Author

Listed:
  • Stefanos Leonardos
  • Georgios Piliouras
  • Kelly Spendlove

Abstract

The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
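As a rough illustration of the convergence claim in the abstract, the sketch below simulates a two-player zero-sum game under continuous-time Boltzmann (smooth) Q-learning dynamics, discretized with a plain Euler step. It is not taken from the paper: the update rule is the replicator-plus-entropy form commonly used for these dynamics in the literature, and the payoff matrix, temperatures (exploration rates), step size, and all function names are assumptions chosen for illustration. The logit QRE is checked as a fixed point of the softmax response, x = softmax(r(x) / T).

```python
import math

def softmax(values, temp):
    """Logit (Boltzmann) response with temperature temp > 0."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temp) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def payoffs(A, x, y):
    """Expected payoff of each action; the row player gets A, the column player -A^T."""
    r1 = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    r2 = [-sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    return r1, r2

def q_step(x, r, temp, dt):
    """One Euler step of the assumed dynamics
    dx_i/dt = x_i * [(r_i - <x, r>) - temp * (ln x_i - <x, ln x>)]."""
    avg_r = sum(xi * ri for xi, ri in zip(x, r))
    avg_log = sum(xi * math.log(xi) for xi in x)
    new = [xi + dt * xi * ((ri - avg_r) - temp * (math.log(xi) - avg_log))
           for xi, ri in zip(x, r)]
    total = sum(new)  # renormalize to absorb floating-point drift
    return [v / total for v in new]

def simulate(A, x, y, temp_x, temp_y, dt=0.01, steps=20000):
    """Run both learners simultaneously, each with its own exploration rate."""
    for _ in range(steps):
        r1, r2 = payoffs(A, x, y)
        x, y = q_step(x, r1, temp_x, dt), q_step(y, r2, temp_y, dt)
    return x, y

def qre_residual(A, x, y, temp_x, temp_y):
    """Largest deviation of (x, y) from the logit-QRE fixed-point condition."""
    r1, r2 = payoffs(A, x, y)
    bx, by = softmax(r1, temp_x), softmax(r2, temp_y)
    return max(abs(a - b) for a, b in zip(x + y, bx + by))

if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical zero-sum payoff matrix
    x, y = simulate(A, [0.9, 0.1], [0.2, 0.8], temp_x=0.5, temp_y=1.0)
    print("x =", x, "y =", y)
    print("QRE residual =", qre_residual(A, x, y, 0.5, 1.0))
```

With positive temperatures the residual should shrink toward zero from any interior starting point, which is the behavior the abstract describes; with temperatures near zero the entropy term vanishes and this sketch says nothing about what happens.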

Suggested Citation

  • Stefanos Leonardos & Georgios Piliouras & Kelly Spendlove, 2021. "Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality," Papers 2106.12928, arXiv.org.
  • Handle: RePEc:arx:papers:2106.12928

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2106.12928
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Pierre Coucheney & Bruno Gaujal & Panayotis Mertikopoulos, 2015. "Penalty-Regulated Dynamics and Robust Learning Procedures in Games," Mathematics of Operations Research, INFORMS, vol. 40(3), pages 611-633, March.
    2. Romero, Julian, 2015. "The effect of hysteresis on equilibrium selection in coordination games," Journal of Economic Behavior & Organization, Elsevier, vol. 111(C), pages 88-105.
    3. Alós-Ferrer, Carlos & Netzer, Nick, 2010. "The logit-response dynamics," Games and Economic Behavior, Elsevier, vol. 68(2), pages 413-427, March.
    4. Yang Cai & Ozan Candogan & Constantinos Daskalakis & Christos Papadimitriou, 2016. "Zero-Sum Polymatrix Games: A Generalization of Minmax," Mathematics of Operations Research, INFORMS, vol. 41(2), pages 648-655, May.
    5. Pangallo, Marco & Sanders, James B.T. & Galla, Tobias & Farmer, J. Doyne, 2022. "Towards a taxonomy of learning dynamics in 2 × 2 games," Games and Economic Behavior, Elsevier, vol. 132(C), pages 1-21.
    6. Kim, Youngse, 1996. "Equilibrium Selection in n-Person Coordination Games," Games and Economic Behavior, Elsevier, vol. 15(2), pages 203-227, August.
    7. Julian Romero, 2011. "The Effect of Hysteresis on Equilibrium Selection in Coordination Games," Purdue University Economics Working Papers 1265, Purdue University, Department of Economics.

    Citations


    Cited by:

    1. Michael S. Harré, 2022. "What Can Game Theory Tell Us about an AI ‘Theory of Mind’?," Games, MDPI, vol. 13(3), pages 1-11, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ger Yang & David Basanta & Georgios Piliouras, 2018. "Bifurcation Mechanism Design—From Optimal Flat Taxes to Better Cancer Treatments," Games, MDPI, vol. 9(2), pages 1-38, April.
    2. Maoliang Ye & Jie Zheng & Plamen Nikolov & Sam Asher, 2020. "One Step at a Time: Does Gradualism Build Coordination?," Management Science, INFORMS, vol. 66(1), pages 113-129, January.
    3. Natalia Fabra & Juan-Pablo Montero, 2022. "Product Lines and Price Discrimination in Markets with Information Frictions," Management Science, INFORMS, vol. 68(2), pages 981-1001, February.
    4. Carlos Alós-Ferrer & Nick Netzer, 2015. "Robust stochastic stability," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 58(1), pages 31-57, January.
    5. Jakub Bielawski & Thiparat Chotibut & Fryderyk Falniowski & Michal Misiurewicz & Georgios Piliouras, 2022. "Unpredictable dynamics in congestion games: memory loss can prevent chaos," Papers 2201.10992, arXiv.org, revised Jan 2022.
    6. Ahrash Dianat & Christoph Siemroth, 2021. "Improving decisions with market information: an experiment on corporate prediction markets," Experimental Economics, Springer;Economic Science Association, vol. 24(1), pages 143-176, March.
    7. Friedel Bolle & Jörg Spiller, 2021. "Cooperation against all predictions," Economic Inquiry, Western Economic Association International, vol. 59(3), pages 904-924, July.
    8. Aidas Masiliunas, 2016. "Inefficient Lock-in with Sophisticated and Myopic Players," Working Papers halshs-01304178, HAL.
    9. Bradley J. Ruffle & Avi Weiss & Amir Etziony, 2015. "The Role of Critical Mass in Establishing a Successful Network Market: An Experimental Investigation," LCERPA Working Papers 0092, Laurier Centre for Economic Research and Policy Analysis, revised 12 May 2015.
    10. Jun Honda, 2015. "Games with the Total Bandwagon Property," Department of Economics Working Papers wuwp197, Vienna University of Economics and Business, Department of Economics.
    11. Oyama, Daisuke & Takahashi, Satoru & Hofbauer, Josef, 2008. "Monotone methods for equilibrium selection under perfect foresight dynamics," Theoretical Economics, Econometric Society, vol. 3(2), June.
    12. Keser, Claudia & Suleymanova, Irina & Wey, Christian, 2012. "Technology adoption in markets with network effects: Theory and experimental evidence," Information Economics and Policy, Elsevier, vol. 24(3), pages 262-276.
    13. Dianat, Ahrash & Echenique, Federico & Yariv, Leeat, 2022. "Statistical discrimination and affirmative action in the lab," Games and Economic Behavior, Elsevier, vol. 132(C), pages 41-58.
    14. Vincent Boucher, 2017. "Selecting Equilibria using Best-Response Dynamics," Economics Bulletin, AccessEcon, vol. 37(4), pages 2728-2734.
    15. Hwang, Sung-Ha & Rey-Bellet, Luc, 2021. "Positive feedback in coordination games: Stochastic evolutionary dynamics and the logit choice rule," Games and Economic Behavior, Elsevier, vol. 126(C), pages 355-373.
    16. Ennio Bilancini & Leonardo Boncinelli, 2020. "The evolution of conventions under condition-dependent mistakes," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 69(2), pages 497-521, March.
    17. Francesco De Sinopoli & Leo Ferraris & Claudia Meroni, 2024. "Group size as selection device," Working Papers 533, University of Milano-Bicocca, Department of Economics.
    18. Oyama, Daisuke & Tercieux, Olivier, 2009. "Iterated potential and robustness of equilibria," Journal of Economic Theory, Elsevier, vol. 144(4), pages 1726-1769, July.
    19. Csóka, Péter & Illés, Ferenc & Solymosi, Tamás, 2022. "On the Shapley value of liability games," European Journal of Operational Research, Elsevier, vol. 300(1), pages 378-386.
    20. Kojima, Fuhito & Takahashi, Satoru, 2008. "p-Dominance and perfect foresight dynamics," Journal of Economic Behavior & Organization, Elsevier, vol. 67(3-4), pages 689-701, September.
