Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models

My bibliography Save this paper

Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models

Author

Listed:

Steve Phelps
Rebecca Ranson

Registered:

Abstract

AI Alignment is often presented as an interaction between a single designer and an artificial agent in which the designer attempts to ensure the agent's behavior is consistent with its purpose, and risks arise solely because of conflicts caused by inadvertent misalignment between the utility function intended by the designer and the resulting internal utility function of the agent. With the advent of agents instantiated with large-language models (LLMs), which are typically pre-trained, we argue this does not capture the essential aspects of AI safety because in the real world there is not a one-to-one correspondence between designer and agent, and the many agents, both artificial and human, have heterogeneous values. Therefore, there is an economic aspect to AI safety and the principal-agent problem is likely to arise. In a principal-agent problem conflict arises because of information asymmetry together with inherent misalignment between the utility of the agent and its principal, and this inherent misalignment cannot be overcome by coercing the agent into adopting a desired utility function through training. We argue the assumptions underlying principal-agent problems are crucial to capturing the essence of safety problems involving pre-trained AI models in real-world situations. Taking an empirical approach to AI safety, we investigate how GPT models respond in principal-agent conflicts. We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task, showing clear evidence of principal-agent conflict. Surprisingly, the earlier GPT-3.5 model exhibits more nuanced behaviour in response to changes in information asymmetry, whereas the later GPT-4 model is more rigid in adhering to its prior alignment. Our results highlight the importance of incorporating principles from economics into the alignment process.

Suggested Citation

Steve Phelps & Rebecca Ranson, 2023. "Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models," Papers 2307.11137, arXiv.org, revised Sep 2023.

Handle: RePEc:arx:papers:2307.11137

Download full text from publisher

References listed on IDEAS

Jensen, Michael C. & Meckling, William H., 1976. "Theory of the firm: Managerial behavior, agency costs and ownership structure," Journal of Financial Economics, Elsevier, vol. 3(4), pages 305-360, October.
Jeroen Bergh & Sigrid Stagl, 2003. "Coevolution of economic behaviour and institutions: towards a theory of institutional change," Journal of Evolutionary Economics, Springer, vol. 13(3), pages 289-317, August.
John J. Horton, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," Papers 2301.07543, arXiv.org.

Full references (including those not matched with items on IDEAS)

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Dae-Hyun Yoo & Caterina Giannetti, 2024. "A Principal-Agent Model for Ethical AI: Optimal Contracts and Incentives for Ethical Alignment," Discussion Papers 2024/313, Dipartimento di Economia e Management (DEM), University of Pisa, Pisa, Italy.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

van den Bergh, Jeroen C.J.M. & Gowdy, John M., 2009. "A group selection perspective on economic behavior, institutions and organizations," Journal of Economic Behavior & Organization, Elsevier, vol. 72(1), pages 1-20, October.
- Jeroen C.J.M. van den Bergh & John M. Gowdy, 2009. "A Group Selection Perspective on Economic Behavior, Institutions and Organizations," Post-Print hal-00695532, HAL.
Safarzynska, Karolina & van den Bergh, Jeroen C.J.M., 2010. "Evolving power and environmental policy: Explaining institutional change with group selection," Ecological Economics, Elsevier, vol. 69(4), pages 743-752, February.
Whittaker, Julie, 2011. "The evolution of environmentally responsible investment: An Adam Smith perspective," Ecological Economics, Elsevier, vol. 71(C), pages 33-41.
Mohammed T. Abusharbeh, 2024. "Technology-Profitability Paradox in Banking Sector: Evidence from Palestine," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 15(3), pages 14855-14873, September.
Barbara Su, 2023. "Banking practices and borrowing firms’ financial reporting quality: evidence from bank cross-selling," Review of Accounting Studies, Springer, vol. 28(1), pages 201-236, March.
Yeon‐Koo Che & Kathryn E. Spier, 2008. "Strategic judgment proofing," RAND Journal of Economics, RAND Corporation, vol. 39(4), pages 926-948, December.
- Che, Yeon-Koo & Spier, Kathryn, 2006. "Strategic Judgment Proofing," MPRA Paper 6100, University Library of Munich, Germany.
- Yeon-Koo Che & Kathryn E. Spier, 2008. "Strategic Judgment Proofing," NBER Working Papers 14183, National Bureau of Economic Research, Inc.
Ichev, Riste & Valentinčič, Aljoša, 2025. "The effect of impact investing on performance of private firms," Research in International Business and Finance, Elsevier, vol. 73(PA).
Klapper, Leora F. & Love, Inessa, 2004. "Corporate governance, investor protection, and performance in emerging markets," Journal of Corporate Finance, Elsevier, vol. 10(5), pages 703-728, November.
- Klapper, Leora F. & Love, Inessa, 2002. "Corporate governance, investor protection, and performance in emerging markets," Policy Research Working Paper Series 2818, The World Bank.
Hartarska, Valentina M. & Nadolnyak, Denis A., 2012. "Financing Constraints and Access to Credit in Post Crisis Environment: Evidence from New Farmers in Alabama," 2012 Annual Meeting, August 12-14, 2012, Seattle, Washington 124882, Agricultural and Applied Economics Association.
Hasan, Iftekhar & Lozano-Vivas, Ana, 2002. "Organizational Form and Expense Preference: Spanish Experience," Bulletin of Economic Research, Wiley Blackwell, vol. 54(2), pages 135-150, April.
- Iftekhar Hasan & Ana Lozano, 1999. "Organizational Form and Expense Preference: Spanish Experience," New York University, Leonard N. Stern School Finance Department Working Paper Seires 99-068, New York University, Leonard N. Stern School of Business-.
Fabbri, Daniela & Menichini, Anna Maria C., 2016. "The commitment problem of secured lending," Journal of Financial Economics, Elsevier, vol. 120(3), pages 561-584.
- Daniela Fabbri & Annamaria Menichini, 2012. "The Commitment Problem of Secured Lending," CSEF Working Papers 318, Centre for Studies in Economics and Finance (CSEF), University of Naples, Italy.
Sang Cheol Lee & Mooweon Rhee & Jongchul Yoon, 2018. "Foreign Monitoring and Audit Quality: Evidence from Korea," Sustainability, MDPI, vol. 10(9), pages 1-22, September.
Lu, Yao & Zhan, Shuwei & Zhan, Minghua, 2024. "Has FinTech changed the sensitivity of corporate investment to interest rates?—Evidence from China," Research in International Business and Finance, Elsevier, vol. 68(C).
DEGEORGE, François & DING, Yuan & JEANJEAN, Thomas & STOLOWY, Hervé, 2005. "Does Analyst Following Curb Earnings Management?," HEC Research Papers Series 810, HEC Paris.
Xueyan Dong & Jingyu Gao & Sunny Li Sun & Kangtao Ye, 2021. "Doing extreme by doing good," Asia Pacific Journal of Management, Springer, vol. 38(1), pages 291-315, March.
Gerry Gallery & Emerson Cooper & John Sweeting, 2008. "Corporate Disclosure Quality: Lessons from Australian Companies on the Impact of Adopting International Financial Reporting Standards," Australian Accounting Review, CPA Australia, vol. 18(3), pages 257-273, September.
Baarda, James R., 2003. "Current Law & Economics Debates: Tools for Assessing Fundamental Cooperative Changes?," 2003 Annual Meeting, October 29 31802, NCERA-194 Research on Cooperatives.
Khémiri, Wafa & Noubbigh, Hédi, 2020. "Size-threshold effect in debt-firm performance nexus in the sub-Saharan region: A Panel Smooth Transition Regression approach," The Quarterly Review of Economics and Finance, Elsevier, vol. 76(C), pages 335-344.
Shaikh, Ibrahim A. & O'Brien, Jonathan Paul & Peters, Lois, 2018. "Inside directors and the underinvestment of financial slack towards R&D-intensity in high-technology firms," Journal of Business Research, Elsevier, vol. 82(C), pages 192-201.
Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Renumeration Seniority," Discussion Paper 2004-120, Tilburg University, Center for Economic Research.
- Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Remuneration Seniority," Other publications TiSEM afd90cc1-f881-4875-bbcd-e, Tilburg University, School of Economics and Management.
- Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Renumeration Seniority," Other publications TiSEM 509b3b8c-a04b-42c3-8991-e, Tilburg University, School of Economics and Management.
- Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Remuneration Seniority," Discussion Paper 2004-015, Tilburg University, Tilburg Law and Economic Center.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2023-08-28 (Artificial Intelligence)
NEP-CMP-2023-08-28 (Computational Economics)
NEP-UPT-2023-08-28 (Utility Models and Prospect Theory)

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2307.11137. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

NEP fields

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data