
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Authors

Listed:
  • Yuzhe Yang
  • Yifei Zhang
  • Yan Hu
  • Yilin Guo
  • Ruoli Gan
  • Yueru He
  • Mingcong Lei
  • Xiao Zhang
  • Haining Wang
  • Qianqian Xie
  • Jimin Huang
  • Honghai Yu
  • Benyou Wang

Abstract

This paper introduces UCFE, the User-Centric Financial Expertise benchmark, an innovative framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. First, we conducted a user study involving 804 participants and collected their feedback on financial tasks. Second, based on this feedback, we created a dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 12 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of the UCFE dataset and our evaluation approach. The UCFE benchmark not only reveals the potential of LLMs in the financial sector but also provides a robust framework for assessing their performance and user satisfaction. The benchmark dataset and evaluation code are available.
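
The alignment figure quoted in the abstract is a Pearson correlation coefficient between per-model benchmark scores and human preference scores. As a rough illustration of how such a coefficient is computed, here is a minimal sketch in Python. The function name `pearson` and all numbers below are hypothetical placeholders chosen for illustration; they are not taken from the paper or its released code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up values for five models (NOT data from the paper):
# one benchmark score and one human preference score per model.
benchmark_scores = [1100.0, 1050.0, 980.0, 1020.0, 940.0]
human_scores = [0.62, 0.58, 0.41, 0.55, 0.44]

r = pearson(benchmark_scores, human_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 1 would indicate that models ranked highly by the benchmark also tend to be preferred by human raters; the paper's own data and evaluation code should be consulted for the actual computation.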

Suggested Citation

  • Yuzhe Yang & Yifei Zhang & Yan Hu & Yilin Guo & Ruoli Gan & Yueru He & Mingcong Lei & Xiao Zhang & Haining Wang & Qianqian Xie & Jimin Huang & Honghai Yu & Benyou Wang, 2024. "UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models," Papers 2410.14059, arXiv.org, revised Oct 2024.
  • Handle: RePEc:arx:papers:2410.14059

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2410.14059
    File Function: Latest version
    Download Restriction: no


