
FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

Authors

  • Xiao-Yang Liu
  • Guoxuan Wang
  • Hongyang Yang
  • Daochen Zha

Abstract

Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like text, which may revolutionize the finance industry. However, existing LLMs often fall short in the financial field, mainly because of the disparities between general text data and financial text data. Unfortunately, only a limited number of financial text datasets are available, and BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address these challenges, we introduce an open-source, data-centric framework, Financial Generative Pre-trained Transformer (FinGPT), that automates the collection and curation of real-time financial data from 34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLMs using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt Low-rank Adaptation methods (LoRA, QLoRA) that enable users to customize their own FinLLMs from general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The code has been open-sourced.
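
The abstract's claim that low-rank adaptation lowers customization cost can be illustrated with a small sketch. The code below is not taken from FinGPT and is not the authors' implementation; it only shows the general LoRA idea of keeping a weight matrix W frozen and training a low-rank product B·A added to it. All function names, matrix sizes, and the values of r and alpha are hypothetical choices made for illustration.

```python
# Illustrative sketch of a LoRA-style low-rank update (assumed sizes and names;
# not FinGPT's code).
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x), with W frozen and A, B trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one d_out x d_in matrix."""
    return d_out * d_in, r * (d_in + d_out)


random.seed(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # hypothetical toy dimensions

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # r x d_in
B = [[0.0] * r for _ in range(d_out)]                                  # d_out x r, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

# Parameter counts for a single 4096 x 4096 matrix at rank 8 (assumed sizes).
full, low = trainable_params(4096, 4096, 8)
print(full, low)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly, and training only has to update r·(d_in + d_out) values per matrix instead of d_out·d_in.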

Suggested Citation

  • Xiao-Yang Liu & Guoxuan Wang & Hongyang Yang & Daochen Zha, 2023. "FinGPT: Democratizing Internet-scale Data for Financial Large Language Models," Papers 2307.10485, arXiv.org, revised Nov 2023.
  • Handle: RePEc:arx:papers:2307.10485

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2307.10485
    File Function: Latest version
    Download Restriction: no



