Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach

Author

Listed:
  • Yi Yang

    (Hong Kong University of Science and Technology, Kowloon, Hong Kong)

  • Kunpeng Zhang

    (University of Maryland, College Park, Maryland 20742)

  • Yangyang Fan

    (Hong Kong Polytechnic University, Kowloon, Hong Kong)

Abstract

Predicting stock return volatility is key to investment and risk management. Traditional volatility-forecasting methods primarily rely on stochastic models. More recently, many machine-learning approaches, particularly text-mining techniques, have been applied to predict stock return volatility, taking advantage of the large amounts of unstructured data now available, such as firm financial reports. Most existing studies develop simple but effective models to analyze text, such as dictionary-based matching algorithms that use a set of manually constructed keywords. However, the latent and deep semantics encoded in text are usually neglected. In this study, we build on recent progress in representation learning and propose a novel word-embedding method that incorporates external knowledge from a well-known finance-domain lexicon (the Loughran and McDonald (2011) word list), which helps us learn semantic relationships among words in firm reports for better volatility prediction. Using over 10 years of annual reports from Russell 3000 firms, we empirically show that, compared with cutting-edge benchmarks, our proposed method achieves a significant improvement in prediction error, for example, a 28.4% reduction on average. We also discuss the practical and methodological implications of our findings. Our finance-specific word-embedding program is available as open source so that researchers can use it to analyze financial reports and assess financial risks.

Summary of Contribution: Predicting stock return volatility is key to investment and risk management. Traditional volatility-forecasting methods primarily rely on stochastic models. More recently, many machine-learning techniques, especially text-mining techniques, have been developed to predict stock return volatility, given the availability of a large amount of unstructured data such as firm annual reports. Most existing research develops simple but effective approaches, for example, manually constructing a set of keywords to analyze texts. However, the latent and deep semantics encoded in texts are usually ignored. In this research, we build on recent progress in representation learning and propose a novel word-embedding method that incorporates external knowledge from the finance-domain lexicon of Loughran and McDonald (2011), which helps us learn the semantic relationships among words in firm annual reports for better volatility prediction. We make the following contributions. First, methodologically, we are among the first to incorporate a finance-specific lexicon into representation learning for stock volatility prediction. We propose a novel knowledge-driven text-embedding model that is trained on a large amount of unstructured textual data to learn high-quality word embeddings. Our proposed approach is effective in predicting stock return volatility and can potentially have broader applications. Second, substantively, we empirically show that domain-lexicon-enhanced text representation learning can indeed significantly improve volatility-prediction performance compared with bag-of-words models and generic word embeddings. Domain knowledge combined with text learning plays a critical enabling role in understanding financial reports. Third, our method adds to the existing literature on designing financial information systems by incorporating ontology knowledge, common-sense knowledge, and general prior knowledge.
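
For readers unfamiliar with the dictionary-based matching baseline that the abstract contrasts with embedding methods, the following is a minimal Python sketch of the general idea: count how often words from a keyword list appear in a report and use the resulting shares as features. The word lists, the function name `lexicon_features`, and the sample sentence are hypothetical placeholders chosen for illustration; they are not taken from the actual Loughran and McDonald (2011) word list, and this is not the authors' method.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; it is NOT the actual
# Loughran and McDonald (2011) word list.
TOY_LEXICON = {
    "negative": {"loss", "decline", "litigation"},
    "uncertainty": {"may", "uncertain", "approximately"},
}


def lexicon_features(text, lexicon=TOY_LEXICON):
    """Return, per lexicon category, the share of tokens in `text` that match it."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # Avoid division by zero for empty documents.
        return {category: 0.0 for category in lexicon}
    return {
        category: sum(counts[word] for word in words) / total
        for category, words in lexicon.items()
    }


# Made-up sentence standing in for a firm report.
report = "The company may face litigation. A loss is uncertain."
features = lexicon_features(report)
print(features)
```

Because each word is matched independently, a feature vector like this cannot capture relationships among words, which is the limitation the abstract attributes to keyword-based approaches.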

Suggested Citation

  • Yi Yang & Kunpeng Zhang & Yangyang Fan, 2022. "Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 522-540, January.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:1:p:522-540
    DOI: 10.1287/ijoc.2020.1046

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2020.1046
    Download Restriction: no


    References listed on IDEAS

    1. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    2. Dyer, Travis & Lang, Mark & Stice-Lawrence, Lorien, 2017. "The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation," Journal of Accounting and Economics, Elsevier, vol. 64(2), pages 221-245.
    3. Jegadeesh, Narasimhan & Wu, Di, 2013. "Word power: A new approach for content analysis," Journal of Financial Economics, Elsevier, vol. 110(3), pages 712-729.
    4. Peter F. Christoffersen & Francis X. Diebold, 2000. "How Relevant is Volatility Forecasting for Financial Risk Management?," The Review of Economics and Statistics, MIT Press, vol. 82(1), pages 12-22, February.
    5. Xin Li & Kun Chen & Sherry X. Sun & Terrance Fung & Huaiqing Wang & Daniel D. Zeng, 2016. "A Commonsense Knowledge-Enabled Textual Analysis Approach for Financial Market Surveillance," INFORMS Journal on Computing, INFORMS, vol. 28(2), pages 278-294, May.
    6. Bodnaruk, Andriy & Loughran, Tim & McDonald, Bill, 2015. "Using 10-K Text to Gauge Financial Constraints," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 50(4), pages 623-646, August.
    7. David Pardoe & Peter Stone & Maytal Saar-Tsechansky & Tayfun Keskin & Kerem Tomak, 2010. "Adaptive Auction Mechanism Design and the Incorporation of Prior Knowledge," INFORMS Journal on Computing, INFORMS, vol. 22(3), pages 353-370, August.
    8. Frankel, R & Johnson, M & Skinner, DJ, 1999. "An empirical examination of conference calls as a voluntary disclosure medium," Journal of Accounting Research, Wiley Blackwell, vol. 37(1), pages 133-150.
    9. Loughran, Tim & McDonald, Bill, 2013. "IPO first-day returns, offer price revisions, volatility, and form S-1 language," Journal of Financial Economics, Elsevier, vol. 109(2), pages 307-326.
    10. Kearney, Colm & Liu, Sha, 2014. "Textual sentiment in finance: A survey of methods and models," International Review of Financial Analysis, Elsevier, vol. 33(C), pages 171-185.
    11. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, Wiley Blackwell, vol. 54(4), pages 1187-1230, September.
    12. Ser-Huang Poon & Clive W.J. Granger, 2003. "Forecasting Volatility in Financial Markets: A Review," Journal of Economic Literature, American Economic Association, vol. 41(2), pages 478-539, June.
    13. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    14. Black, Fischer & Scholes, Myron S, 1973. "The Pricing of Options and Corporate Liabilities," Journal of Political Economy, University of Chicago Press, vol. 81(3), pages 637-654, May-June.
    15. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.

    Citations

    Cited by:

    1. Xue Wen Tan & Stanley Kok, 2024. "Explainable Risk Classification in Financial Reports," Papers 2405.01881, arXiv.org, revised May 2024.
    2. Hao Lin & Guannan Liu & Junjie Wu & J. Leon Zhao, 2024. "Deterring the Gray Market: Product Diversion Detection via Learning Disentangled Representations of Multivariate Time Series," INFORMS Journal on Computing, INFORMS, vol. 36(2), pages 571-586, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shuangyan Li & Guangrui Wang & Yongli Luo, 2022. "Tone of language, financial disclosure, and earnings management: a textual analysis of form 20-F," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-24, December.
    2. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    3. Christina Bannier & Thomas Pauls & Andreas Walter, 2019. "Content analysis of business communication: introducing a German dictionary," Journal of Business Economics, Springer, vol. 89(1), pages 79-123, February.
    4. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, Wiley Blackwell, vol. 54(4), pages 1187-1230, September.
    5. Yan, Yumeng & Xiong, Xiong & Meng, J. Ginger & Zou, Gaofeng, 2019. "Uncertainty and IPO initial returns: Evidence from the Tone Analysis of China’s IPO Prospectuses," Pacific-Basin Finance Journal, Elsevier, vol. 57(C).
    6. Dimitris Anastasiou & Apostolos Katsafados, 2023. "Bank deposits and textual sentiment: When an European Central Bank president's speech is not just a speech," Manchester School, University of Manchester, vol. 91(1), pages 55-87, January.
    7. Doshi, Hitesh & Patel, Saurin & Ramani, Srikanth & Sooy, Matthew, 2023. "Uncertain tone, asset volatility and credit default swap spreads," Journal of Contemporary Accounting and Economics, Elsevier, vol. 19(3).
    8. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    9. Bian, Shibo & Jia, Dekui & Li, Ruihai & Sun, Wujun & Yan, Zhipeng & Zheng, Yingfei, 2021. "Can management tone predict IPO performance? – Evidence from mandatory online roadshows in China," Pacific-Basin Finance Journal, Elsevier, vol. 68(C).
    10. Katsafados, Apostolos G. & Leledakis, George N. & Pyrgiotakis, Emmanouil G. & Androutsopoulos, Ion & Fergadiotis, Manos, 2024. "Machine learning in bank merger prediction: A text-based approach," European Journal of Operational Research, Elsevier, vol. 312(2), pages 783-797.
    11. Andres Algaba & David Ardia & Keven Bluteau & Samuel Borms & Kris Boudt, 2020. "Econometrics Meets Sentiment: An Overview Of Methodology And Applications," Journal of Economic Surveys, Wiley Blackwell, vol. 34(3), pages 512-547, July.
    12. Renato Camodeca & Alex Almici & Umberto Sagliaschi, 2018. "Sustainability Disclosure in Integrated Reporting: Does It Matter to Investors? A Cheap Talk Approach," Sustainability, MDPI, vol. 10(12), pages 1-34, November.
    13. Anastasiou, Dimitrios & Katsafados, Apostolos G., 2020. "Bank Deposits Flows and Textual Sentiment: When an ECB President's speech is not just a speech," MPRA Paper 99729, University Library of Munich, Germany.
    14. Loughran, Tim & McDonald, Bill & Pragidis, Ioannis, 2019. "Assimilation of oil news into prices," International Review of Financial Analysis, Elsevier, vol. 63(C), pages 105-118.
    15. Nadine Gatzert & Dinah Heidinger, 2020. "An Empirical Analysis of Market Reactions to the First Solvency and Financial Condition Reports in the European Insurance Industry," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 87(2), pages 407-436, June.
    16. Simon Fritzsch & Philipp Scharner & Gregor Weiß, 2021. "Estimating the relation between digitalization and the market value of insurers," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(3), pages 529-567, September.
    17. Buehlmaier, Matthias M. M. & Zechner, Josef, 2016. "Financial media, price discovery, and merger arbitrage," CFS Working Paper Series 551, Center for Financial Studies (CFS).
    18. Li, Ken, 2022. "Textual fundamentals in earnings press releases," Advances in Accounting, Elsevier, vol. 57(C).
    19. Kothari, Pratik & Chance, Don M. & Ferris, Stephen P., 2021. "Bragging rights: Does corporate boasting imply value creation?," Journal of Corporate Finance, Elsevier, vol. 67(C).
    20. Bilal Hafeez & M. Humayun Kabir & Udomsak Wongchoti, 2022. "Are retail investors really passive? Shareholder activism in the digital age," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 49(3-4), pages 423-460, March.
