
NIFTY Financial News Headlines Dataset

Author

Listed:
  • Raeid Saqur
  • Ken Kato
  • Nicholas Vinden
  • Frank Rudzicz

Abstract

We introduce and make publicly available the NIFTY Financial News Headlines dataset, designed to facilitate and advance research in financial market forecasting using large language models (LLMs). The dataset comprises two versions tailored to different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted for alignment methods such as reinforcement learning from human feedback (RLHF), which align LLMs via rejection sampling and reward modeling. Each version provides curated, high-quality data incorporating comprehensive metadata, market indices, and deduplicated financial news headlines, systematically filtered and ranked to suit modern LLM frameworks. We also include experiments demonstrating applications of the dataset to tasks such as stock price movement prediction and to studying the role of LLM embeddings in information acquisition/richness. The NIFTY dataset, along with utilities (such as systematic truncation of a prompt's context length), is available on Hugging Face at https://huggingface.co/datasets/raeidsaqur/NIFTY.
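
To illustrate what deduplicating ranked headlines and truncating a prompt's context length can look like in practice, here is a minimal sketch. It is not the authors' utility: the function names `count_tokens` and `truncate_headlines`, the whitespace-based token count, and the sample headlines are all assumptions made for illustration. It assumes the input list is already ordered from most to least relevant.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Approximate a token count by whitespace splitting.

    Assumption: a real pipeline would use the target model's tokenizer.
    """
    return len(text.split())


def truncate_headlines(headlines: List[str], max_tokens: int) -> List[str]:
    """Drop duplicate headlines, then keep a ranked prefix within the budget.

    `headlines` is assumed to be sorted from most to least relevant.
    """
    seen = set()
    kept: List[str] = []
    used = 0
    for headline in headlines:
        # Normalize case and whitespace so near-identical copies match.
        key = " ".join(headline.lower().split())
        if not key or key in seen:
            continue
        seen.add(key)
        cost = count_tokens(headline)
        # Stop at the first headline that does not fit, so the result
        # is always a prefix of the deduplicated ranking.
        if used + cost > max_tokens:
            break
        kept.append(headline)
        used += cost
    return kept


if __name__ == "__main__":
    sample = [
        "Fed holds rates steady",
        "fed holds  rates steady",
        "Oil prices slip on demand worries",
        "Tech stocks rally",
    ]
    print(truncate_headlines(sample, max_tokens=10))
```

Stopping at the first headline that overflows the budget, rather than skipping it and trying lower-ranked ones, is a design choice that keeps the selection strictly rank-ordered; a packing strategy that fills the remaining budget would be an equally reasonable alternative.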

Suggested Citation

  • Raeid Saqur & Ken Kato & Nicholas Vinden & Frank Rudzicz, 2024. "NIFTY Financial News Headlines Dataset," Papers 2405.09747, arXiv.org.
  • Handle: RePEc:arx:papers:2405.09747

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2405.09747
    File Function: Latest version
    Download Restriction: no


    Citations


    Cited by:

    1. Raeid Saqur & Anastasis Kratsios & Florian Krach & Yannick Limmer & Jacob-Junqi Tian & John Willes & Blanka Horvath & Frank Rudzicz, 2024. "Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models," Papers 2406.02969, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Raeid Saqur, 2024. "What Teaches Robots to Walk, Teaches Them to Trade too -- Regime Adaptive Execution using Informed Data and LLMs," Papers 2406.15508, arXiv.org.
    2. Erik Kole & Dick van Dijk, 2017. "How to Identify and Forecast Bull and Bear Markets?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(1), pages 120-139, January.
    3. Guidolin, Massimo & Pedio, Manuela, 2017. "Identifying and measuring the contagion channels at work in the European financial crises," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 48(C), pages 117-134.
    4. Kole, Erik & van Dijk, Dick, 2023. "Moments, shocks and spillovers in Markov-switching VAR models," Journal of Econometrics, Elsevier, vol. 236(2).
    5. Pami Dua & Divya Tuteja, 2017. "Impact Of Eurozone Sovereign Debt Crisis On China And India," The Singapore Economic Review (SER), World Scientific Publishing Co. Pte. Ltd., vol. 62(05), pages 1137-1164, December.
    6. Pami Dua & Divya Tuteja, 2016. "Contagion in International Stock and Currency Markets During Recent Crisis Episodes," Working papers 258, Centre for Development Economics, Delhi School of Economics.
    7. Pami Dua & Divya Tuteja, 2021. "Regime Shifts in the Behaviour of International Currency and Equity Markets: A Markov-Switching Analysis," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 19(1), pages 309-336, December.
    8. Wasim Ahmad & N. Bhanumurthy & Sanjay Sehgal, 2015. "Regime dependent dynamics and European stock markets: Is asset allocation really possible?," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 42(1), pages 77-107, February.
    9. David Kuo Chuen Lee & Chong Guan & Yinghui Yu & Qinxu Ding, 2024. "A Comprehensive Review of Generative AI in Finance," FinTech, MDPI, vol. 3(3), pages 1-19, September.
    10. Bhatia, Shipra & Tuteja, Divya, 2024. "Contagion and linkages across international currencies," International Review of Financial Analysis, Elsevier, vol. 94(C).
    11. Dua, Pami & Tuteja, Divya, 2016. "Financial crises and dynamic linkages across international stock and currency markets," Economic Modelling, Elsevier, vol. 59(C), pages 249-261.
    12. Nabil Maghrebi & Mark J. Holmes & Kosuke Oya, 2014. "Financial instability and the short-term dynamics of volatility expectations," Applied Financial Economics, Taylor & Francis Journals, vol. 24(6), pages 377-395, March.
    13. Shively, Gerald E., 2001. "Price thresholds, price volatility, and the private costs of investment in a developing country grain market," Economic Modelling, Elsevier, vol. 18(3), pages 399-414, August.
    14. Sarah Arndt & Zeno Enders, 2023. "The Transmission of Supply Shocks in Different Inflation Regimes," CESifo Working Paper Series 10839, CESifo.
    15. Michael Artis, 1999. "The UK and EMU," Palgrave Macmillan Books, in: David Cobham & George Zis (ed.), From EMS to EMU: 1979 to 1999 and Beyond, chapter 7, pages 161-180, Palgrave Macmillan.
    16. Xiang Lin & Martin Thomas Falk, 2022. "Nordic stock market performance of the travel and leisure industry during the first wave of Covid-19 pandemic," Tourism Economics, vol. 28(5), pages 1240-1257, August.
    17. Moerman, G.A., 2001. "Unpredictable After All? A short note on exchange rate predictability," ERIM Report Series Research in Management ERS-2001-29-F&A, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    18. Peter McAdam, 2007. "USA, Japan and the Euro Area: Comparing Business-Cycle Features," International Review of Applied Economics, Taylor & Francis Journals, vol. 21(1), pages 135-156.
    19. Cavicchioli, Maddalena, 2024. "A matrix unified framework for deriving various impulse responses in Markov switching VAR: Evidence from oil and gas markets," The Journal of Economic Asymmetries, Elsevier, vol. 29(C).
    20. Franck Sédillot, 2001. "La pente des taux contient-elle de l'information sur l'activité économique future ?," Economie & Prévision, La Documentation Française, vol. 147(1), pages 141-157.
