
Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation

Author

Listed:
  • Silvia García-Méndez
  • Francisco de Arriba-Pérez
  • Ana Barros-Vila
  • Francisco J. González-Castaño
  • Enrique Costa-Montenegro

Abstract

Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text.
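The ROUGE-L scores quoted above measure how much of the manually labelled text is recovered, based on the longest common subsequence (LCS) of tokens shared by the reference and the system output. The abstract does not state which ROUGE-L variant or tokenisation was used, so the following is only a minimal sketch under assumptions: lowercased whitespace tokens, and the F-measure with beta = 1. The function names lcs_length and rouge_l are illustrative and are not taken from the paper.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # DP row for the previously processed token of a
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure between a reference text and a candidate text.

    Assumption: tokens are lowercased and split on whitespace; the paper's
    actual preprocessing is not described in the abstract.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)      # share of the reference that is recovered
    precision = lcs / len(cand)  # share of the candidate that is matched
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

Under these assumptions, a candidate that reproduces the reference exactly scores 1.0, and a candidate sharing no tokens with it scores 0.0. A score such as 0.982 would therefore indicate that the extracted text almost coincides with the labelled text.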

Suggested Citation

  • Silvia García-Méndez & Francisco de Arriba-Pérez & Ana Barros-Vila & Francisco J. González-Castaño & Enrique Costa-Montenegro, 2024. "Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation," Papers 2404.01338, arXiv.org.
  • Handle: RePEc:arx:papers:2404.01338

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2404.01338
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bouteska, Ahmed & Mefteh-Wali, Salma & Dang, Trung, 2022. "Predictive power of investor sentiment for Bitcoin returns: Evidence from COVID-19 pandemic," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    2. Carlini, Federico & Farina, Vincenzo & Gufler, Ivan & Previtali, Daniele, 2024. "Do stress and overstatement in the news affect the stock market? Evidence from COVID-19 news in The Wall Street Journal," International Review of Financial Analysis, Elsevier, vol. 93(C).
    3. Jiao Ji & Oleksandr Talavera & Shuxing Yin, 2018. "The Hidden Information Content: Evidence from the Tone of Independent Director Reports," Working Papers 2018-28, Swansea University, School of Management.
    4. Umar, Tarik, 2022. "Complexity aversion when Seeking Alpha," Journal of Accounting and Economics, Elsevier, vol. 73(2).
    5. Drago, Carlo & Ginesti, Gianluca & Pongelli, Claudia & Sciascia, Salvatore, 2018. "Reporting strategies: What makes family firms beat around the bush? Family-related antecedents of annual report readability," Journal of Family Business Strategy, Elsevier, vol. 9(2), pages 142-150.
    6. Rybinski, Krzysztof, 2020. "The forecasting power of the multi-language narrative of sell-side research: A machine learning evaluation," Finance Research Letters, Elsevier, vol. 34(C).
    7. Rolf Uwe Fülbier & Thorsten Sellhorn, 2023. "Understanding and improving the language of business: How accounting and corporate reporting research can better serve business and society," Journal of Business Economics, Springer, vol. 93(6), pages 1089-1124, August.
    8. Chris Florakis & Christodoulos Louca & Roni Michaely & Michael Weber, 2020. "Cybersecurity Risk," Working Papers 2020-178, Becker Friedman Institute for Research In Economics.
    9. Liu, Pu & Nguyen, Hazel T., 2020. "CEO characteristics and tone at the top inconsistency," Journal of Economics and Business, Elsevier, vol. 108(C).
    10. Jiang, Yonghong & Wu, Lanxin & Tian, Gengyu & Nie, He, 2021. "Do cryptocurrencies hedge against EPU and the equity market volatility during COVID-19? – New evidence from quantile coherency analysis," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 72(C).
    11. Kladakis, George & Chen, Lei & Bellos, Sotirios K., 2023. "Ethical bank disclosures and liquidity creation," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 84(C).
    12. Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2022. "Media-expressed tone, option characteristics, and stock return predictability," Journal of Economic Dynamics and Control, Elsevier, vol. 134(C).
    13. Leilane de Freitas Rocha Cambara & Roberto Meurer, 2023. "News sentiment and foreign portfolio investment in Brazil," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 28(3), pages 3332-3348, July.
    14. Tarek A Hassan & Stephan Hollander & Laurence van Lent & Ahmed Tahoun, 2019. "Firm-Level Political Risk: Measurement and Effects," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 134(4), pages 2135-2202.
    15. Eghbal Rahimikia & Stefan Zohren & Ser-Huang Poon, 2021. "Realised Volatility Forecasting: Machine Learning via Financial Word Embedding," Papers 2108.00480, arXiv.org, revised Mar 2023.
    16. Al-Maadid, Alanoud & Alhazbi, Saleh & Al-Thelaya, Khaled, 2022. "Using machine learning to analyze the impact of coronavirus pandemic news on the stock markets in GCC countries," Research in International Business and Finance, Elsevier, vol. 61(C).
    17. Gao, Lei & Calderon, Thomas G. & Tang, Fengchun, 2020. "Public companies' cybersecurity risk disclosures," International Journal of Accounting Information Systems, Elsevier, vol. 38(C).
    18. James A. Danowski & Bei Yan & Ken Riopelle, 2021. "A semantic network approach to measuring sentiment," Quality & Quantity: International Journal of Methodology, Springer, vol. 55(1), pages 221-255, February.
    19. Diniz-Maganini, Natalia & Diniz, Eduardo H. & Rasheed, Abdul A., 2021. "Bitcoin’s price efficiency and safe haven properties during the COVID-19 pandemic: A comparison," Research in International Business and Finance, Elsevier, vol. 58(C).
    20. Pastwa, Anna M. & Shrestha, Prabal & Thewissen, James & Torsin, Wouter, 2021. "Unpacking the black box of ICO white papers: a topic modeling approach," LIDAM Discussion Papers LFIN 2021018, Université catholique de Louvain, Louvain Finance (LFIN).
