Printed from https://ideas.repec.org/p/arx/papers/2412.13013.html

The Emergence of Strategic Reasoning of Large Language Models

Author

Listed:
  • Dongwoo Lee
  • Gavin Kader

Abstract

As Large Language Models (LLMs) are increasingly used for a variety of complex and critical tasks, it is vital to assess their logical capabilities in strategic environments. This paper examines their ability in strategic reasoning -- the process of choosing an optimal course of action by predicting and adapting to other agents' behavior. Using six LLMs, we analyze responses from play in classical games from behavioral economics (p-Beauty Contest, 11-20 Money Request Game, and Guessing Game) and evaluate their performance through hierarchical models of reasoning (level-$k$ theory and cognitive hierarchy theory). Our findings reveal that while LLMs show understanding of the games, the majority struggle with higher-order strategic reasoning. Although most LLMs did demonstrate learning ability with games involving repeated interactions, they still consistently fall short of the reasoning levels demonstrated by typical behavior from human subjects. The exception to these overall findings is with OpenAI's GPT-o1 -- specifically trained to solve complex reasoning tasks -- which consistently outperforms other LLMs and human subjects. These findings highlight the challenges and pathways in advancing LLMs toward robust strategic reasoning from the perspective of behavioral economics.
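As a rough illustration of the level-$k$ framework named in the abstract, the sketch below computes the textbook level-$k$ choices for two of the games. It is not the authors' code. The function names, the level-0 anchors (a guess of 50 in the p-Beauty Contest, a request of 20 in the 11-20 Money Request Game), and the default p = 2/3 are assumptions taken from the standard treatment of these games, and the beauty-contest rule ignores a player's own influence on the group average.

```python
def beauty_contest_guess(k, p=2/3, level0_guess=50.0):
    """Guess of a level-k player in a p-Beauty Contest.

    Assumption: level-0 guesses level0_guess, and each higher level
    best-responds to the level below by multiplying its guess by p.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return level0_guess * p ** k


def money_request_choice(k, high=20, low=11):
    """Request of a level-k player in the 11-20 Money Request Game.

    Assumption: level-0 requests `high`, and each higher level undercuts
    the level below by exactly one to earn the bonus. Levels beyond
    high - low are rejected because the best response then cycles.
    """
    if not 0 <= k <= high - low:
        raise ValueError("k must lie between 0 and high - low")
    return high - k


if __name__ == "__main__":
    for k in range(4):
        guess = beauty_contest_guess(k)
        request = money_request_choice(k)
        print(f"level-{k}: guess {guess:.2f}, request {request}")
```

Under these assumptions, a higher k means a lower guess in the p-Beauty Contest and a lower request in the 11-20 game, so an observed choice can be mapped back to an approximate depth of reasoning.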

Suggested Citation

  • Dongwoo Lee & Gavin Kader, 2024. "The Emergence of Strategic Reasoning of Large Language Models," Papers 2412.13013, arXiv.org.
  • Handle: RePEc:arx:papers:2412.13013

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2412.13013
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Stahl, Dale II & Wilson, Paul W., 1994. "Experimental evidence on players' models of other players," Journal of Economic Behavior & Organization, Elsevier, vol. 25(3), pages 309-327, December.
    2. Erev, Ido & Roth, Alvin E, 1998. "Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria," American Economic Review, American Economic Association, vol. 88(4), pages 848-881, September.
    3. Henning Hermes & Daniel Schunk, 2022. "If you could read my mind–an experimental beauty-contest game with children," Experimental Economics, Springer;Economic Science Association, vol. 25(1), pages 229-253, February.
    4. Vincent P. Crawford & Miguel A. Costa-Gomes, 2006. "Cognition and Behavior in Two-Person Guessing Games: An Experimental Study," American Economic Review, American Economic Association, vol. 96(5), pages 1737-1768, December.
    5. Grosskopf, Brit & Nagel, Rosemarie, 2008. "The two-person beauty contest," Games and Economic Behavior, Elsevier, vol. 62(1), pages 93-99, January.
    6. Georganas, Sotiris & Healy, Paul J. & Weber, Roberto A., 2015. "On the persistence of strategic sophistication," Journal of Economic Theory, Elsevier, vol. 159(PA), pages 369-400.
    7. Ho, Teck-Hua & Camerer, Colin & Weigelt, Keith, 1998. "Iterated Dominance and Iterated Best Response in Experimental 'p-Beauty Contests'," American Economic Review, American Economic Association, vol. 88(4), pages 947-969, September.
    8. Taylor Webb & Keith J. Holyoak & Hongjing Lu, 2023. "Emergent analogical reasoning in large language models," Nature Human Behaviour, Nature, vol. 7(9), pages 1526-1541, September.
    9. Ayala Arad & Ariel Rubinstein, 2012. "The 11-20 Money Request Game: A Level-k Reasoning Study," American Economic Review, American Economic Association, vol. 102(7), pages 3561-3573, December.
    10. Nagel, Rosemarie, 1995. "Unraveling in Guessing Games: An Experimental Study," American Economic Review, American Economic Association, vol. 85(5), pages 1313-1326, December.
    11. Vincent P. Crawford & Miguel A. Costa-Gomes & Nagore Iriberri, 2013. "Structural Models of Nonequilibrium Strategic Thinking: Theory, Evidence, and Applications," Journal of Economic Literature, American Economic Association, vol. 51(1), pages 5-62, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wanqun Zhao, 2020. "Cost of Reasoning and Strategic Sophistication," Games, MDPI, vol. 11(3), pages 1-27, September.
    2. Georganas, Sotiris & Healy, Paul J. & Weber, Roberto A., 2015. "On the persistence of strategic sophistication," Journal of Economic Theory, Elsevier, vol. 159(PA), pages 369-400.
    3. Nagel, Rosemarie & Bühren, Christoph & Frank, Björn, 2017. "Inspired and inspiring: Hervé Moulin and the discovery of the beauty contest game," Mathematical Social Sciences, Elsevier, vol. 90(C), pages 191-207.
    4. Berger, Ulrich & De Silva, Hannelore & Fellner-Röhling, Gerlinde, 2016. "Cognitive hierarchies in the minimizer game," Journal of Economic Behavior & Organization, Elsevier, vol. 130(C), pages 337-348.
    5. Lindner, Florian & Sutter, Matthias, 2013. "Level-k reasoning and time pressure in the 11–20 money request game," Economics Letters, Elsevier, vol. 120(3), pages 542-545.
    6. Ye Jin, 2021. "Does level-k behavior imply level-k thinking?," Experimental Economics, Springer;Economic Science Association, vol. 24(1), pages 330-353, March.
    7. Alaoui, Larbi & Janezic, Katharina A. & Penta, Antonio, 2020. "Reasoning about others' reasoning," Journal of Economic Theory, Elsevier, vol. 189(C).
    8. Choo, Lawrence C.Y & Kaplan, Todd R., 2014. "Explaining Behavior in the '11-20' Game," MPRA Paper 52808, University Library of Munich, Germany.
    9. Bayer, Ralph C. & Renou, Ludovic, 2016. "Logical omniscience at the laboratory," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 64(C), pages 41-49.
    10. Hanaki, Nobuyuki & Koriyama, Yukio & Sutan, Angela & Willinger, Marc, 2019. "The strategic environment effect in beauty contest games," Games and Economic Behavior, Elsevier, vol. 113(C), pages 587-610.
    11. Larbi Alaoui & Antonio Penta, 2016. "Endogenous Depth of Reasoning," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 83(4), pages 1297-1333.
    12. Bayer, R.-C. & Renou, Ludovic, 2016. "Logical abilities and behavior in strategic-form games," Journal of Economic Psychology, Elsevier, vol. 56(C), pages 39-59.
    13. Carlos Alós-Ferrer & Johannes Buckenmaier, 2021. "Cognitive sophistication and deliberation times," Experimental Economics, Springer;Economic Science Association, vol. 24(2), pages 558-592, June.
    14. King King Li & Kang Rong, 2024. "A two-step guessing game," Theory and Decision, Springer, vol. 97(1), pages 89-108, August.
    15. Mauersberger, Felix & Nagel, Rosemarie & Bühren, Christoph, 2020. "Bounded rationality in Keynesian beauty contests: A lesson for central bankers?," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 14, pages 1-38.
    16. Allred, Sarah & Duffy, Sean & Smith, John, 2016. "Cognitive load and strategic sophistication," Journal of Economic Behavior & Organization, Elsevier, vol. 125(C), pages 162-178.
    17. María Cubel & Santiago Sanchez-Pages, 2014. "Gender differences and stereotypes in the beauty contest," Working Papers 2014/13, Institut d'Economia de Barcelona (IEB).
    18. Dimitris Batzilis & Sonia Jaffe & Steven Levitt & John A. List & Jeffrey Picel, 2019. "Behavior in Strategic Settings: Evidence from a Million Rock-Paper-Scissors Games," Games, MDPI, vol. 10(2), pages 1-34, April.
    19. Teck-Hua Ho & So-Eun Park & Xuanming Su, 2021. "A Bayesian Level-k Model in n-Person Games," Management Science, INFORMS, vol. 67(3), pages 1622-1638, March.
    20. Choo, Lawrence & Kaplan, Todd R. & Zhou, Xiaoyu, 2019. "Can auctions select people by their level-k types?," MPRA Paper 95987, University Library of Munich, Germany.
