
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models

Authors

  • Steve Phelps
  • Rebecca Ranson

Abstract

AI Alignment is often presented as an interaction between a single designer and an artificial agent in which the designer attempts to ensure the agent's behavior is consistent with its purpose, and risks arise solely because of conflicts caused by inadvertent misalignment between the utility function intended by the designer and the resulting internal utility function of the agent. With the advent of agents instantiated with large-language models (LLMs), which are typically pre-trained, we argue this does not capture the essential aspects of AI safety because in the real world there is not a one-to-one correspondence between designer and agent, and the many agents, both artificial and human, have heterogeneous values. Therefore, there is an economic aspect to AI safety and the principal-agent problem is likely to arise. In a principal-agent problem conflict arises because of information asymmetry together with inherent misalignment between the utility of the agent and its principal, and this inherent misalignment cannot be overcome by coercing the agent into adopting a desired utility function through training. We argue the assumptions underlying principal-agent problems are crucial to capturing the essence of safety problems involving pre-trained AI models in real-world situations. Taking an empirical approach to AI safety, we investigate how GPT models respond in principal-agent conflicts. We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task, showing clear evidence of principal-agent conflict. Surprisingly, the earlier GPT-3.5 model exhibits more nuanced behaviour in response to changes in information asymmetry, whereas the later GPT-4 model is more rigid in adhering to its prior alignment. Our results highlight the importance of incorporating principles from economics into the alignment process.

Suggested Citation

  • Steve Phelps & Rebecca Ranson, 2023. "Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models," Papers 2307.11137, arXiv.org, revised Sep 2023.
  • Handle: RePEc:arx:papers:2307.11137

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2307.11137
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jensen, Michael C. & Meckling, William H., 1976. "Theory of the firm: Managerial behavior, agency costs and ownership structure," Journal of Financial Economics, Elsevier, vol. 3(4), pages 305-360, October.
    2. Jeroen Bergh & Sigrid Stagl, 2003. "Coevolution of economic behaviour and institutions: towards a theory of institutional change," Journal of Evolutionary Economics, Springer, vol. 13(3), pages 289-317, August.
    3. John J. Horton, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," Papers 2301.07543, arXiv.org.

    Citations

    Cited by:

    1. Dae-Hyun Yoo & Caterina Giannetti, 2024. "A Principal-Agent Model for Ethical AI: Optimal Contracts and Incentives for Ethical Alignment," Discussion Papers 2024/313, Dipartimento di Economia e Management (DEM), University of Pisa, Pisa, Italy.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. van den Bergh, Jeroen C.J.M. & Gowdy, John M., 2009. "A group selection perspective on economic behavior, institutions and organizations," Journal of Economic Behavior & Organization, Elsevier, vol. 72(1), pages 1-20, October.
    2. Safarzynska, Karolina & van den Bergh, Jeroen C.J.M., 2010. "Evolving power and environmental policy: Explaining institutional change with group selection," Ecological Economics, Elsevier, vol. 69(4), pages 743-752, February.
    3. Whittaker, Julie, 2011. "The evolution of environmentally responsible investment: An Adam Smith perspective," Ecological Economics, Elsevier, vol. 71(C), pages 33-41.
    4. Barbara Su, 2023. "Banking practices and borrowing firms’ financial reporting quality: evidence from bank cross-selling," Review of Accounting Studies, Springer, vol. 28(1), pages 201-236, March.
    5. Yeon‐Koo Che & Kathryn E. Spier, 2008. "Strategic judgment proofing," RAND Journal of Economics, RAND Corporation, vol. 39(4), pages 926-948, December.
    6. Klapper, Leora F. & Love, Inessa, 2004. "Corporate governance, investor protection, and performance in emerging markets," Journal of Corporate Finance, Elsevier, vol. 10(5), pages 703-728, November.
    7. Hartarska, Valentina M. & Nadolnyak, Denis A., 2012. "Financing Constraints and Access to Credit in Post Crisis Environment: Evidence from New Farmers in Alabama," 2012 Annual Meeting, August 12-14, 2012, Seattle, Washington 124882, Agricultural and Applied Economics Association.
    8. Hasan, Iftekhar & Lozano-Vivas, Ana, 2002. "Organizational Form and Expense Preference: Spanish Experience," Bulletin of Economic Research, Wiley Blackwell, vol. 54(2), pages 135-150, April.
    9. Fabbri, Daniela & Menichini, Anna Maria C., 2016. "The commitment problem of secured lending," Journal of Financial Economics, Elsevier, vol. 120(3), pages 561-584.
    10. Sang Cheol Lee & Mooweon Rhee & Jongchul Yoon, 2018. "Foreign Monitoring and Audit Quality: Evidence from Korea," Sustainability, MDPI, vol. 10(9), pages 1-22, September.
    11. Lu, Yao & Zhan, Shuwei & Zhan, Minghua, 2024. "Has FinTech changed the sensitivity of corporate investment to interest rates?—Evidence from China," Research in International Business and Finance, Elsevier, vol. 68(C).
    12. DEGEORGE, François & DING, Yuan & JEANJEAN, Thomas & STOLOWY, Hervé, 2005. "Does Analyst Following Curb Earnings Management?," HEC Research Papers Series 810, HEC Paris.
    13. Xueyan Dong & Jingyu Gao & Sunny Li Sun & Kangtao Ye, 2021. "Doing extreme by doing good," Asia Pacific Journal of Management, Springer, vol. 38(1), pages 291-315, March.
    14. Gerry Gallery & Emerson Cooper & John Sweeting, 2008. "Corporate Disclosure Quality: Lessons from Australian Companies on the Impact of Adopting International Financial Reporting Standards," Australian Accounting Review, CPA Australia, vol. 18(3), pages 257-273, September.
    15. Baarda, James R., 2003. "Current Law & Economics Debates: Tools for Assessing Fundamental Cooperative Changes?," 2003 Annual Meeting, October 29 31802, NCERA-194 Research on Cooperatives.
    16. Khémiri, Wafa & Noubbigh, Hédi, 2020. "Size-threshold effect in debt-firm performance nexus in the sub-Saharan region: A Panel Smooth Transition Regression approach," The Quarterly Review of Economics and Finance, Elsevier, vol. 76(C), pages 335-344.
    17. Shaikh, Ibrahim A. & O'Brien, Jonathan Paul & Peters, Lois, 2018. "Inside directors and the underinvestment of financial slack towards R&D-intensity in high-technology firms," Journal of Business Research, Elsevier, vol. 82(C), pages 192-201.
    18. Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Renumeration Seniority," Discussion Paper 2004-120, Tilburg University, Center for Economic Research.
    19. Maha Faisal Alsayegh & Rashidah Abdul Rahman & Saeid Homayoun, 2020. "Corporate Economic, Environmental, and Social Sustainability Performance Transformation through ESG Disclosure," Sustainability, MDPI, vol. 12(9), pages 1-20, May.
    20. Preet Singh & Chitra Singla, 2016. "Executive Stock Options: Will it Work as a Good Governance Mechanism in all Scenarios?," Working Papers id:10985, eSocialSciences.
