Printed from https://ideas.repec.org/p/arx/papers/2405.09747.html

NIFTY Financial News Headlines Dataset

Author

Listed:
  • Raeid Saqur
  • Ken Kato
  • Nicholas Vinden
  • Frank Rudzicz

Abstract

We introduce and make publicly available the NIFTY Financial News Headlines dataset, designed to facilitate and advance research in financial market forecasting using large language models (LLMs). The dataset comes in two versions tailored to different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically for alignment methods (such as reinforcement learning from human feedback (RLHF)) that align LLMs via rejection sampling and reward modeling. Each version provides curated, high-quality data incorporating comprehensive metadata, market indices, and deduplicated financial news headlines, systematically filtered and ranked to suit modern LLM frameworks. We also include experiments demonstrating some applications of the dataset, such as predicting stock price movement and examining the role of LLM embeddings in information acquisition/richness. The NIFTY dataset, along with utilities (such as systematically truncating a prompt's context length), is available on Hugging Face at https://huggingface.co/datasets/raeidsaqur/NIFTY.
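
The abstract mentions utilities for systematically truncating a prompt's context length but does not describe how they work. The following is only a minimal sketch of one plausible approach, assuming headlines arrive already ranked from most to least important and that a whitespace split is an acceptable stand-in for a real tokenizer. The names `count_tokens` and `truncate_headlines` and all sample strings are hypothetical and are not taken from the NIFTY utilities.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a stand-in for a real tokenizer."""
    return len(text.split())


def truncate_headlines(ranked_headlines: List[str], header: str, max_tokens: int) -> str:
    """Build a prompt from a header plus the top-ranked headlines that fit.

    Headlines are assumed to be sorted from most to least important, so
    truncation stops at the first headline that would exceed the budget.
    The header is always kept, even if it alone exceeds max_tokens.
    """
    budget = max_tokens - count_tokens(header)
    kept: List[str] = []
    for headline in ranked_headlines:
        cost = count_tokens(headline)
        if cost > budget:
            break  # lower-ranked headlines are dropped
        kept.append(headline)
        budget -= cost
    return "\n".join([header] + kept)


# Illustrative (made-up) inputs.
header = "Market context: index closed higher."
headlines = [
    "Fed holds rates steady",
    "Tech shares rally on earnings",
    "Oil slips as supply concerns ease",
]
prompt = truncate_headlines(headlines, header, max_tokens=15)
print(prompt)
```

Dropping whole headlines from the low-ranked end, rather than cutting text mid-sentence, keeps every retained headline intact; a real pipeline would swap in the target model's tokenizer for the token count.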

Suggested Citation

  • Raeid Saqur & Ken Kato & Nicholas Vinden & Frank Rudzicz, 2024. "NIFTY Financial News Headlines Dataset," Papers 2405.09747, arXiv.org.
  • Handle: RePEc:arx:papers:2405.09747

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2405.09747
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Yang Li & Yangyang Yu & Haohang Li & Zhi Chen & Khaldoun Khashanah, 2023. "TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance," Papers 2309.03736, arXiv.org.
    2. Xinli Yu & Zheng Chen & Yuan Ling & Shujing Dong & Zongyi Liu & Yanbin Lu, 2023. "Temporal Data Meets LLM -- Explainable Financial Time Series Forecasting," Papers 2306.11025, arXiv.org.
    3. Hamilton, James D., 1990. "Analysis of time series subject to changes in regime," Journal of Econometrics, Elsevier, vol. 45(1-2), pages 39-70.
    4. Haohan Zhang & Fengrui Hua & Chengjin Xu & Hao Kong & Ruiting Zuo & Jian Guo, 2023. "Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements?," Papers 2306.14222, arXiv.org, revised May 2024.
    5. Pekka Malo & Ankur Sinha & Pekka Korhonen & Jyrki Wallenius & Pyry Takala, 2014. "Good debt or bad debt: Detecting semantic orientations in economic texts," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(4), pages 782-796, April.

    Citations



    Cited by:

    1. Raeid Saqur & Anastasis Kratsios & Florian Krach & Yannick Limmer & Jacob-Junqi Tian & John Willes & Blanka Horvath & Frank Rudzicz, 2024. "Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models," Papers 2406.02969, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Raeid Saqur, 2024. "What Teaches Robots to Walk, Teaches Them to Trade too -- Regime Adaptive Execution using Informed Data and LLMs," Papers 2406.15508, arXiv.org.
    2. David Kuo Chuen Lee & Chong Guan & Yinghui Yu & Qinxu Ding, 2024. "A Comprehensive Review of Generative AI in Finance," FinTech, MDPI, vol. 3(3), pages 1-19, September.
    3. Shively, Gerald E., 2001. "Price thresholds, price volatility, and the private costs of investment in a developing country grain market," Economic Modelling, Elsevier, vol. 18(3), pages 399-414, August.
    4. Sarah Arndt & Zeno Enders, 2023. "The Transmission of Supply Shocks in Different Inflation Regimes," CESifo Working Paper Series 10839, CESifo.
    5. Michael Artis, 1999. "The UK and EMU," Palgrave Macmillan Books, in: David Cobham & George Zis (ed.), From EMS to EMU: 1979 to 1999 and Beyond, chapter 7, pages 161-180, Palgrave Macmillan.
    6. Xiang Lin & Martin Thomas Falk, 2022. "Nordic stock market performance of the travel and leisure industry during the first wave of Covid-19 pandemic," Tourism Economics, vol. 28(5), pages 1240-1257, August.
    7. Moerman, G.A., 2001. "Unpredictable After All? A short note on exchange rate predictability," ERIM Report Series Research in Management ERS-2001-29-F&A, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    8. Peter McAdam, 2007. "USA, Japan and the Euro Area: Comparing Business-Cycle Features," International Review of Applied Economics, Taylor & Francis Journals, vol. 21(1), pages 135-156.
    9. Cavicchioli, Maddalena, 2024. "A matrix unified framework for deriving various impulse responses in Markov switching VAR: Evidence from oil and gas markets," The Journal of Economic Asymmetries, Elsevier, vol. 29(C).
    10. Franck Sédillot, 2001. "La pente des taux contient-elle de l'information sur l'activité économique future ?," Economie & Prévision, La Documentation Française, vol. 147(1), pages 141-157.
    11. Engel, Charles, 1994. "Can the Markov switching model forecast exchange rates?," Journal of International Economics, Elsevier, vol. 36(1-2), pages 151-165, February.
    12. Rafał Weron, 2009. "Heavy-tails and regime-switching in electricity prices," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 69(3), pages 457-473, July.
    13. Yip, Pick Schen & Brooks, Robert & Do, Hung Xuan & Nguyen, Duc Khuong, 2020. "Dynamic volatility spillover effects between oil and agricultural products," International Review of Financial Analysis, Elsevier, vol. 69(C).
    14. Vassilios Babalos & Mehmet Balcilar & Rangan Gupta, 2014. "Revisiting Herding Behavior in REITs: A Regime-Switching Approach," Working Papers 201448, University of Pretoria, Department of Economics.
    15. Theobald, Thomas, 2013. "Markov Switching with Endogenous Number of Regimes and Leading Indicators in a Real-Time Business Cycle Forecast," VfS Annual Conference 2013 (Duesseldorf): Competition Policy and Regulation in a Global Economic Order 79911, Verein für Socialpolitik / German Economic Association.
    16. Kal, Süleyman Hilmi & Arslaner, Ferhat & Arslaner, Nuran, 2015. "The dynamic relationship between stock, bond and foreign exchange markets," Economic Systems, Elsevier, vol. 39(4), pages 592-607.
    17. Kun-Huang Huarng & Tiffany Hui-Kuang Yu, 2017. "Using qualitative approach to forecasting regime switches," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(5), pages 2035-2048, September.
    18. Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," Finance Research Letters, Elsevier, vol. 62(PB).
    19. George Kapetanios, 2001. "Model Selection in Threshold Models," Journal of Time Series Analysis, Wiley Blackwell, vol. 22(6), pages 733-754, November.
    20. Dimitris Kirikos, 2000. "Forecasting exchange rates out of sample: random walk vs Markov switching regimes," Applied Economics Letters, Taylor & Francis Journals, vol. 7(2), pages 133-136.
