
Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach

Authors

  • Yi Yang (Hong Kong University of Science and Technology, Kowloon, Hong Kong)
  • Kunpeng Zhang (University of Maryland, College Park, Maryland 20742)
  • Yangyang Fan (Hong Kong Polytechnic University, Kowloon, Hong Kong)

Abstract

Predicting stock return volatility is the key to investment and risk management. Traditional volatility-forecasting methods primarily rely on stochastic models. More recently, many machine-learning approaches, particularly text-mining techniques, have been implemented to predict stock return volatility, thus taking advantage of the availability of large amounts of unstructured data such as firm financial reports. Most existing studies develop simple but effective models to analyze text, such as dictionary-based matching algorithms that use a set of manually constructed keywords. However, the latent and deep semantics encoded in text are usually neglected. In this study, we build on recent progress in representation learning and propose a novel word-embedding method that incorporates external knowledge from a well-known finance-domain lexicon (the Loughran and McDonald (2011) word list), which helps us learn semantic relationships among words in firm reports for better volatility prediction. Using over 10 years of annual reports from Russell 3000 firms, we empirically show that, compared with cutting-edge benchmarks, our proposed method achieves a significant improvement in prediction error, for example, a 28.4% reduction on average. We also discuss the practical and methodological implications of our findings. Our financial-specific word-embedding program is available as open-source information so that researchers can use it to analyze financial reports and assess financial risks.

Summary of Contribution: Predicting stock return volatility is the key to investment and risk management. Traditional volatility-forecasting methods primarily rely on stochastic models. More recently, many machine-learning techniques, especially text-mining techniques, have been developed to predict stock return volatility given the availability of a large amount of unstructured data, such as firm annual reports. Most existing research develops simple but effective approaches, for example, manually constructing a set of keywords to analyze texts. However, the latent and deep semantics encoded in texts are usually ignored. In this research, we build on recent progress in representation learning and propose a novel word-embedding method that incorporates external knowledge from the finance-domain lexicon of Loughran and McDonald (2011), which helps us learn the semantic relationships among words in firm annual reports for better volatility prediction.

In this study, we make the following contributions. First, methodologically, we are among the first to incorporate a finance-specific lexicon into representation learning for stock volatility prediction. We propose a novel knowledge-driven text-embedding model that is trained on a large amount of unstructured textual data to learn high-quality word embeddings. Our proposed approach is effective in predicting stock return volatility, and the approach can potentially have broader applications. Second, substantively, we empirically show that domain-lexicon-enhanced text representation learning can indeed significantly improve performance in volatility prediction, compared with bag-of-words models and generic word embeddings. Domain knowledge combined with text learning plays a critical enabling role in understanding financial reports. Third, our method adds to the existing literature on designing financial information systems by incorporating ontology knowledge, common-sense knowledge, and general prior knowledge.
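
For readers unfamiliar with the dictionary-based matching that the abstract names as the common baseline, the following is a minimal sketch of that idea only. It is not the authors' knowledge-driven embedding method. The word sets in `TOY_LEXICON`, the category names, the helper functions, and the sample sentence are all hypothetical and chosen for illustration; they are not taken from the Loughran and McDonald (2011) word list.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only.
# These are NOT the actual Loughran-McDonald word lists.
TOY_LEXICON = {
    "negative": {"loss", "litigation", "impairment", "decline"},
    "uncertainty": {"may", "uncertain", "risk", "approximately"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_shares(text, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens that match its keywords."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in lexicon}
    counts = Counter(tokens)
    return {
        category: sum(counts[word] for word in words) / len(tokens)
        for category, words in lexicon.items()
    }


# Made-up sentence standing in for a passage from a firm report.
report = "The company may face litigation risk. A decline in revenue led to a loss."
shares = category_shares(report)
for category, share in shares.items():
    print(f"{category}: {share:.3f}")
```

A score of this kind counts each keyword independently of its context, which is the limitation the abstract points to when it says that latent and deep semantics are usually neglected.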

Suggested Citation

  • Yi Yang & Kunpeng Zhang & Yangyang Fan, 2022. "Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 522-540, January.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:1:p:522-540
    DOI: 10.1287/ijoc.2020.1046

    References listed on IDEAS

    1. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, Wiley Blackwell, vol. 54(4), pages 1187-1230, September.
    2. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    3. Ser-Huang Poon & Clive W.J. Granger, 2003. "Forecasting Volatility in Financial Markets: A Review," Journal of Economic Literature, American Economic Association, vol. 41(2), pages 478-539, June.
    4. Frankel, R & Johnson, M & Skinner, DJ, 1999. "An empirical examination of conference calls as a voluntary disclosure medium," Journal of Accounting Research, Wiley Blackwell, vol. 37(1), pages 133-150.
    5. Peter F. Christoffersen & Francis X. Diebold, 2000. "How Relevant is Volatility Forecasting for Financial Risk Management?," The Review of Economics and Statistics, MIT Press, vol. 82(1), pages 12-22, February.
    6. Loughran, Tim & McDonald, Bill, 2013. "IPO first-day returns, offer price revisions, volatility, and form S-1 language," Journal of Financial Economics, Elsevier, vol. 109(2), pages 307-326.
    7. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    8. Black, Fischer & Scholes, Myron S, 1973. "The Pricing of Options and Corporate Liabilities," Journal of Political Economy, University of Chicago Press, vol. 81(3), pages 637-654, May-June.
    9. Kearney, Colm & Liu, Sha, 2014. "Textual sentiment in finance: A survey of methods and models," International Review of Financial Analysis, Elsevier, vol. 33(C), pages 171-185.
    10. Dyer, Travis & Lang, Mark & Stice-Lawrence, Lorien, 2017. "The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation," Journal of Accounting and Economics, Elsevier, vol. 64(2), pages 221-245.
    11. Xin Li & Kun Chen & Sherry X. Sun & Terrance Fung & Huaiqing Wang & Daniel D. Zeng, 2016. "A Commonsense Knowledge-Enabled Textual Analysis Approach for Financial Market Surveillance," INFORMS Journal on Computing, INFORMS, vol. 28(2), pages 278-294, May.
    12. Bodnaruk, Andriy & Loughran, Tim & McDonald, Bill, 2015. "Using 10-K Text to Gauge Financial Constraints," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 50(4), pages 623-646, August.
    13. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    14. Jegadeesh, Narasimhan & Wu, Di, 2013. "Word power: A new approach for content analysis," Journal of Financial Economics, Elsevier, vol. 110(3), pages 712-729.
    15. David Pardoe & Peter Stone & Maytal Saar-Tsechansky & Tayfun Keskin & Kerem Tomak, 2010. "Adaptive Auction Mechanism Design and the Incorporation of Prior Knowledge," INFORMS Journal on Computing, INFORMS, vol. 22(3), pages 353-370, August.

    Cited by:

    1. Xue Wen Tan & Stanley Kok, 2024. "Explainable Risk Classification in Financial Reports," Papers 2405.01881, arXiv.org, revised May 2024.
    2. Hao Lin & Guannan Liu & Junjie Wu & J. Leon Zhao, 2024. "Deterring the Gray Market: Product Diversion Detection via Learning Disentangled Representations of Multivariate Time Series," INFORMS Journal on Computing, INFORMS, vol. 36(2), pages 571-586, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Doshi, Hitesh & Patel, Saurin & Ramani, Srikanth & Sooy, Matthew, 2023. "Uncertain tone, asset volatility and credit default swap spreads," Journal of Contemporary Accounting and Economics, Elsevier, vol. 19(3).
    2. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    3. Shuangyan Li & Guangrui Wang & Yongli Luo, 2022. "Tone of language, financial disclosure, and earnings management: a textual analysis of form 20-F," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-24, December.
    4. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    5. Bian, Shibo & Jia, Dekui & Li, Ruihai & Sun, Wujun & Yan, Zhipeng & Zheng, Yingfei, 2021. "Can management tone predict IPO performance? – Evidence from mandatory online roadshows in China," Pacific-Basin Finance Journal, Elsevier, vol. 68(C).
    6. Christina Bannier & Thomas Pauls & Andreas Walter, 2019. "Content analysis of business communication: introducing a German dictionary," Journal of Business Economics, Springer, vol. 89(1), pages 79-123, February.
    7. Katsafados, Apostolos G. & Leledakis, George N. & Pyrgiotakis, Emmanouil G. & Androutsopoulos, Ion & Fergadiotis, Manos, 2024. "Machine learning in bank merger prediction: A text-based approach," European Journal of Operational Research, Elsevier, vol. 312(2), pages 783-797.
    8. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, Wiley Blackwell, vol. 54(4), pages 1187-1230, September.
    9. Andres Algaba & David Ardia & Keven Bluteau & Samuel Borms & Kris Boudt, 2020. "Econometrics Meets Sentiment: An Overview Of Methodology And Applications," Journal of Economic Surveys, Wiley Blackwell, vol. 34(3), pages 512-547, July.
    10. Yan, Yumeng & Xiong, Xiong & Meng, J. Ginger & Zou, Gaofeng, 2019. "Uncertainty and IPO initial returns: Evidence from the Tone Analysis of China’s IPO Prospectuses," Pacific-Basin Finance Journal, Elsevier, vol. 57(C).
    11. Dimitris Anastasiou & Apostolos Katsafados, 2023. "Bank deposits and textual sentiment: When an European Central Bank president's speech is not just a speech," Manchester School, University of Manchester, vol. 91(1), pages 55-87, January.
    12. Ahmed, Yousry & Elshandidy, Tamer, 2016. "The effect of bidder conservatism on M&A decisions: Text-based evidence from US 10-K filings," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 176-190.
    13. Ahmad, Khurshid & Han, JingGuang & Hutson, Elaine & Kearney, Colm & Liu, Sha, 2016. "Media-expressed negative tone and firm-level stock returns," Journal of Corporate Finance, Elsevier, vol. 37(C), pages 152-172.
    14. Renato Camodeca & Alex Almici & Umberto Sagliaschi, 2018. "Sustainability Disclosure in Integrated Reporting: Does It Matter to Investors? A Cheap Talk Approach," Sustainability, MDPI, vol. 10(12), pages 1-34, November.
    15. Anastasiou, Dimitrios & Katsafados, Apostolos G., 2020. "Bank Deposits Flows and Textual Sentiment: When an ECB President's speech is not just a speech," MPRA Paper 99729, University Library of Munich, Germany.
    16. Ingrid E. Fisher & Margaret R. Garnsey & Mark E. Hughes, 2016. "Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 23(3), pages 157-214, July.
    17. Loughran, Tim & McDonald, Bill & Pragidis, Ioannis, 2019. "Assimilation of oil news into prices," International Review of Financial Analysis, Elsevier, vol. 63(C), pages 105-118.
    18. Anand, Abhinav & Basu, Sankarshan & Pathak, Jalaj & Thampy, Ashok, 2021. "The impact of sentiment on emerging stock markets," International Review of Economics & Finance, Elsevier, vol. 75(C), pages 161-177.
    19. Massa, Massimo & von Beschwitz, Bastian & Keim, Donald B, 2015. "First to 'Read' the News: News Analytics and Institutional Trading," CEPR Discussion Papers 10534, C.E.P.R. Discussion Papers.
    20. Ioanna Kountouri & Eleftherios Manousakis & Andrianos E. Tsekrekos, 2019. "Latent semantic analysis of corporate social responsibility reports (with an application to Hellenic firms)," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 16(1), pages 1-19, March.
