
FNSPID: A Comprehensive Financial News Dataset in Time Series

Authors

  • Zihan Dong
  • Xinyu Fan
  • Zhiyuan Peng

Abstract

Financial market prediction uses historical data to anticipate future stock prices and market trends. Traditionally, such predictions have relied on statistical analysis of quantitative factors such as stock prices, trading volumes, inflation rates, and changes in industrial production. Recent advances in large language models motivate financial analysis that integrates sentiment data, particularly market news, with numerical factors. This approach is frequently constrained, however, by the scarcity of large datasets that combine quantitative data with qualitative sentiment information. To address this challenge, we introduce a large-scale financial dataset, the Financial News and Stock Price Integration Dataset (FNSPID). It comprises 29.7 million stock prices and 15.7 million time-aligned financial news records for 4,775 S&P500 companies, covering the period from 1999 to 2023, sourced from 4 stock market news websites. We demonstrate that FNSPID surpasses existing stock market datasets in scale and diversity while uniquely incorporating sentiment information. Through financial analysis experiments on FNSPID, we find that (1) the dataset's size and quality significantly boost market prediction accuracy and (2) adding sentiment scores modestly enhances the performance of the transformer-based model; we also provide (3) a reproducible procedure for updating the dataset. Completed work, code, documentation, and examples are available at github.com/Zdong104/FNSPID. FNSPID offers unprecedented opportunities for the financial research community to advance predictive modeling and analysis.
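
The abstract describes news records as "time-aligned" with stock prices but does not spell out the procedure. As a rough illustration only, the sketch below assumes the simplest possible alignment: news items are grouped by ticker and calendar date, their sentiment scores are averaged, and the result is attached to the price row for the same day. The field names (`ticker`, `date`, `close`, `sentiment`), the toy records, and the function `align_news_to_prices` are all hypothetical and are not taken from the FNSPID schema or code.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical toy records; field names are illustrative, not FNSPID's schema.
prices = [
    {"ticker": "AAA", "date": date(2023, 1, 3), "close": 10.0},
    {"ticker": "AAA", "date": date(2023, 1, 4), "close": 10.5},
    {"ticker": "BBB", "date": date(2023, 1, 3), "close": 20.0},
]
news = [
    {"ticker": "AAA", "date": date(2023, 1, 3), "sentiment": 0.5},
    {"ticker": "AAA", "date": date(2023, 1, 3), "sentiment": 0.25},
    {"ticker": "BBB", "date": date(2023, 1, 3), "sentiment": -0.25},
]


def align_news_to_prices(prices, news, fill=0.0):
    """Attach the mean same-day sentiment score to each price row.

    Assumption: alignment is an exact match on (ticker, date). Days with
    no news receive the neutral value `fill` and a news count of zero.
    """
    # Group sentiment scores by (ticker, date).
    scores_by_key = defaultdict(list)
    for item in news:
        scores_by_key[(item["ticker"], item["date"])].append(item["sentiment"])

    aligned = []
    for row in prices:
        scores = scores_by_key.get((row["ticker"], row["date"]), [])
        aligned.append({
            **row,
            "sentiment": mean(scores) if scores else fill,
            "n_news": len(scores),
        })
    return aligned


aligned = align_news_to_prices(prices, news)
for row in aligned:
    print(row["ticker"], row["date"], row["close"], row["sentiment"], row["n_news"])
```

A real pipeline would also have to decide how to treat news published after market close or on non-trading days; this sketch ignores both cases.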

Suggested Citation

  • Zihan Dong & Xinyu Fan & Zhiyuan Peng, 2024. "FNSPID: A Comprehensive Financial News Dataset in Time Series," Papers 2402.06698, arXiv.org.
  • Handle: RePEc:arx:papers:2402.06698

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2402.06698
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Attiya Yasmeen Javid, 2000. "Alternative Capital Asset Pricing Models: A Review of Theory and Evidence," PIDE Research Report 2000:3, Pakistan Institute of Development Economics.
    2. Trabelsi, Mohamed Ali, 2010. "Choix de portefeuille: comparaison des différentes stratégies [Portfolio selection: comparison of different strategies]," MPRA Paper 82946, University Library of Munich, Germany, revised 01 Dec 2010.
    3. Erdinc Altay, 2003. "The Effect of Macroeconomic Factors on Asset Returns: A Comparative Analysis of the German and the Turkish Stock Markets in an APT Framework," Finance 0307006, University Library of Munich, Germany.
    4. Fernando Rubio, 2005. "Eficiencia De Mercado, Administracion De Carteras De Fondos Y Behavioural Finance," Finance 0503028, University Library of Munich, Germany, revised 23 Jul 2005.
    5. Su-Jane Chen & Chengho Hsieh & Timothy W. Vines & Shur-Nuaan Chiou, 1998. "Macroeconomic Variables, Firm-Specific Variables and Returns to REITs," Journal of Real Estate Research, American Real Estate Society, vol. 16(3), pages 269-278.
    6. Attiya Y. Javed, 2000. "Alternative Capital Asset Pricing Models: A Review of Theory and Evidence," PIDE-Working Papers 2000:179, Pakistan Institute of Development Economics.
    7. Ho, Ron Yiu-wah & Strange, Roger & Piesse, Jenifer, 2006. "On the conditional pricing effects of beta, size, and book-to-market equity in the Hong Kong market," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 16(3), pages 199-214, July.
    8. Constantinos Antoniou & John A. Doukas & Avanidhar Subrahmanyam, 2016. "Investor Sentiment, Beta, and the Cost of Equity Capital," Management Science, INFORMS, vol. 62(2), pages 347-367, February.
    9. Radosław Kurach, 2013. "Does Beta Explain Global Equity Market Volatility – Some Empirical Evidence," Contemporary Economics, University of Economics and Human Sciences in Warsaw., vol. 7(2), June.
    10. Shi, Yun & Cui, Xiangyu & Zhou, Xunyu, 2020. "Beta and Coskewness Pricing: Perspective from Probability Weighting," SocArXiv 5rqhv, Center for Open Science.
    11. Abugri, Benjamin A. & Dutta, Sandip, 2014. "Are we overestimating REIT idiosyncratic risk? Analysis of pricing effects and persistence," International Review of Economics & Finance, Elsevier, vol. 29(C), pages 249-259.
    12. David E. Allen & Michael McAleer & Abhay K. Singh, 2019. "Daily market news sentiment and stock prices," Applied Economics, Taylor & Francis Journals, vol. 51(30), pages 3212-3235, June.
    13. Flouris, Triant & Walker, Thomas, 2005. "Financial Comparisons Across Different Business Models in the Canadian Airline Industry," 46th Annual Transportation Research Forum, Washington, D.C., March 6-8, 2005 208157, Transportation Research Forum.
    14. Agiakloglou, Christos & Gkouvakis, Michail, 2015. "Causal interrelations among market fundamentals: Evidence from the European Telecommunications sector," The Quarterly Review of Economics and Finance, Elsevier, vol. 55(C), pages 150-159.
    15. Eero Pätäri & Timo Leivo, 2017. "A Closer Look At Value Premium: Literature Review And Synthesis," Journal of Economic Surveys, Wiley Blackwell, vol. 31(1), pages 79-168, February.
    16. Shaikh, Salman, 2013. "Investment Decisions by Analysts: A Case Study of KSE," MPRA Paper 53802, University Library of Munich, Germany.
    17. Chongsoo An & John J. Cheh & Il-woon Kim, 2017. "Do Value Stocks Outperform Growth Stocks in the U.S. Stock Market?," Journal of Applied Finance & Banking, SCIENPRESS Ltd, vol. 7(2), pages 1-7.
    18. Turan G. Bali & Robert F. Engle & Yi Tang, 2017. "Dynamic Conditional Beta Is Alive and Well in the Cross Section of Daily Stock Returns," Management Science, INFORMS, vol. 63(11), pages 3760-3779, November.
    19. Cakici, Nusret & Zaremba, Adam, 2022. "Salience theory and the cross-section of stock returns: International and further evidence," Journal of Financial Economics, Elsevier, vol. 146(2), pages 689-725.
    20. Michael E. Drew & Jon D. Stanford, 2003. "Retail Superannuation Management in Australia: Risk, Cost and Alpha," School of Economics and Finance Discussion Papers and Working Papers Series 126, School of Economics and Finance, Queensland University of Technology.
