IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2410.19599.html

Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina

Author

Listed:
  • Yuan Gao
  • Dokyun Lee
  • Gordon Burtch
  • Sina Fazelpour

Abstract

Recent studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates or simulations for humans in social science research. However, LLMs differ fundamentally from humans, relying on probabilistic patterns, absent the embodied experiences or survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Nearly all advanced approaches fail to replicate human behavior distributions across many models. Causes of failure are diverse and unpredictable, relating to input language, roles, and safeguarding. These results advise caution when using LLMs to study human behavior or as surrogates or simulations.

Suggested Citation

  • Yuan Gao & Dokyun Lee & Gordon Burtch & Sina Fazelpour, 2024. "Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina," Papers 2410.19599, arXiv.org, revised Nov 2024.
  • Handle: RePEc:arx:papers:2410.19599

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2410.19599
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Anthony Chemero, 2023. "LLMs differ from human cognition because they are not embodied," Nature Human Behaviour, Nature, vol. 7(11), pages 1828-1829, November.
    2. Siting Estee Lu, 2024. "Strategic Interactions between Large Language Models-based Agents in Beauty Contests," Papers 2404.08492, arXiv.org, revised Oct 2024.
    3. Ali Goli & Amandeep Singh, 2024. "Frontiers: Can Large Language Models Capture Human Preferences?," Marketing Science, INFORMS, vol. 43(4), pages 709-722, July.
    4. John J. Horton, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," NBER Working Papers 31122, National Bureau of Economic Research, Inc.
    5. John J. Horton, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," Papers 2301.07543, arXiv.org.
    6. Peiyao Li & Noah Castelo & Zsolt Katona & Miklos Sarvary, 2024. "Frontiers: Determining the Validity of Large Language Models for Automated Perceptual Analysis," Marketing Science, INFORMS, vol. 43(2), pages 254-266, March.
    7. Matthew Hutson, 2024. "How does ChatGPT ‘think’? Psychology and neuroscience crack open AI large language models," Nature, Nature, vol. 629(8014), pages 986-988, May.
    8. Avi Goldfarb & Mo Xiao, 2011. "Who Thinks about the Competition? Managerial Ability and Strategic Entry in US Local Telephone Markets," American Economic Review, American Economic Association, vol. 101(7), pages 3130-3161, December.
    9. James W. A. Strachan & Dalila Albergo & Giulia Borghini & Oriana Pansardi & Eugenio Scaliti & Saurabh Gupta & Krati Saxena & Alessandro Rufo & Stefano Panzeri & Guido Manzi & Michael S. A. Graziano & , 2024. "Testing theory of mind in large language models and humans," Nature Human Behaviour, Nature, vol. 8(7), pages 1285-1295, July.
    10. Daniel Zizzo, 2010. "Experimenter demand effects in economic experiments," Experimental Economics, Springer;Economic Science Association, vol. 13(1), pages 75-98, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniel Albert & Stephan Billinger, 2024. "Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models," Papers 2410.06932, arXiv.org.
    2. Kevin Leyton-Brown & Paul Milgrom & Neil Newman & Ilya Segal, 2024. "Artificial Intelligence and Market Design: Lessons Learned from Radio Spectrum Reallocation," NBER Chapters, in: New Directions in Market Design, National Bureau of Economic Research, Inc.
    3. Kirshner, Samuel N., 2024. "GPT and CLT: The impact of ChatGPT's level of abstraction on consumer recommendations," Journal of Retailing and Consumer Services, Elsevier, vol. 76(C).
    4. Zengqing Wu & Run Peng & Xu Han & Shuyuan Zheng & Yixin Zhang & Chuan Xiao, 2023. "Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations," Papers 2311.06330, arXiv.org, revised Dec 2023.
    5. Joshua C. Yang & Damian Dailisan & Marcin Korecki & Carina I. Hausladen & Dirk Helbing, 2024. "LLM Voting: Human Choices and AI Collective Decision Making," Papers 2402.01766, arXiv.org, revised Aug 2024.
    6. Nir Chemaya & Daniel Martin, 2023. "Perceptions and Detection of AI Use in Manuscript Preparation for Academic Journals," Papers 2311.14720, arXiv.org, revised Jan 2024.
    7. Lijia Ma & Xingchen Xu & Yong Tan, 2024. "Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines," Papers 2402.19421, arXiv.org.
    8. Ali Goli & Amandeep Singh, 2023. "Exploring the Influence of Language on Time-Reward Perceptions in Large Language Models: A Study Using GPT-3.5," Papers 2305.02531, arXiv.org, revised Jun 2023.
    9. Evangelos Katsamakas, 2024. "Business models for the simulation hypothesis," Papers 2404.08991, arXiv.org.
    10. Christoph Engel & Max R. P. Grossmann & Axel Ockenfels, 2023. "Integrating machine behavior into human subject experiments: A user-friendly toolkit and illustrations," Discussion Paper Series of the Max Planck Institute for Research on Collective Goods 2024_01, Max Planck Institute for Research on Collective Goods.
    11. Yiting Chen & Tracy Xiao Liu & You Shan & Songfa Zhong, 2023. "The emergence of economic rationality of GPT," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(51), pages 2316205120-, December.
    12. Samuel Chang & Andrew Kennedy & Aaron Leonard & John A. List, 2024. "12 Best Practices for Leveraging Generative AI in Experimental Research," NBER Working Papers 33025, National Bureau of Economic Research, Inc.
    13. Jiafu An & Difang Huang & Chen Lin & Mingzhu Tai, 2024. "Measuring Gender and Racial Biases in Large Language Models," Papers 2403.15281, arXiv.org.
    14. Fulin Guo, 2023. "GPT in Game Theory Experiments," Papers 2305.05516, arXiv.org, revised Dec 2023.
    15. Jingru Jia & Zehua Yuan & Junhao Pan & Paul E. McNamara & Deming Chen, 2024. "Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context," Papers 2406.05972, arXiv.org, revised Oct 2024.
    16. Fabio Motoki & Valdemar Pinho Neto & Victor Rodrigues, 2024. "More human than human: measuring ChatGPT political bias," Public Choice, Springer, vol. 198(1), pages 3-23, January.
    17. George Gui & Olivier Toubia, 2023. "The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective," Papers 2312.15524, arXiv.org.
    18. Felix Chopra & Ingar Haaland, 2023. "Conducting qualitative interviews with AI," CEBI working paper series 23-06, University of Copenhagen. Department of Economics. The Center for Economic Behavior and Inequality (CEBI).
    19. Siting Estee Lu, 2024. "Strategic Interactions between Large Language Models-based Agents in Beauty Contests," Papers 2404.08492, arXiv.org, revised Oct 2024.
    20. Shumiao Ouyang & Hayong Yun & Xingjian Zheng, 2024. "How Ethical Should AI Be? How AI Alignment Shapes the Risk Preferences of LLMs," Papers 2406.01168, arXiv.org, revised Aug 2024.
