Printed from https://ideas.repec.org/p/arx/papers/2308.08031.html

Company Similarity using Large Language Models

Author

Listed:
  • Dimitrios Vamvourellis
  • Máté Toth
  • Snigdha Bhagat
  • Dhruv Desai
  • Dhagash Mehta
  • Stefano Pasquali

Abstract

Identifying companies with similar profiles is a core task in finance, with a wide range of applications in portfolio construction, asset pricing, and risk attribution. When a rigorous definition of similarity is lacking, financial analysts usually resort to 'traditional' industry classifications such as the Global Industry Classification Standard (GICS), which assigns a unique category to each company at different levels of granularity. Because they are discrete, though, GICS classifications do not allow companies to be ranked by similarity. In this paper, we explore the ability of pre-trained and fine-tuned large language models (LLMs) to learn company embeddings based on the business descriptions reported in SEC filings. We show that we can reproduce GICS classifications using the embeddings as features. We also benchmark these embeddings on various machine learning and financial metrics and conclude that companies that are similar according to the embeddings are also similar in terms of financial performance metrics, including return correlation.
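
The contrast between discrete categories and a continuous similarity ranking can be illustrated with a minimal sketch. It assumes cosine similarity as the distance measure, which the abstract does not specify, and uses hypothetical company names with hand-written three-dimensional vectors standing in for real LLM embeddings of business descriptions; it is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_peers(target, embeddings):
    """Return (name, score) pairs for every other company, most similar first."""
    scores = {
        name: cosine_similarity(embeddings[target], vec)
        for name, vec in embeddings.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy vectors; real embeddings would come from an LLM encoder
# applied to the business description section of each company's SEC filing.
toy_embeddings = {
    "BankA": [0.9, 0.1, 0.0],
    "BankB": [0.8, 0.2, 0.1],
    "RetailC": [0.1, 0.9, 0.2],
    "SoftwareD": [0.0, 0.2, 0.9],
}

for name, score in rank_peers("BankA", toy_embeddings):
    print(f"{name}: {score:.3f}")
```

Unlike a single industry label, the scores order every other company relative to the target, so peers inside and outside the target's category can be compared on one scale.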

Suggested Citation

  • Dimitrios Vamvourellis & Máté Toth & Snigdha Bhagat & Dhruv Desai & Dhagash Mehta & Stefano Pasquali, 2023. "Company Similarity using Large Language Models," Papers 2308.08031, arXiv.org.
  • Handle: RePEc:arx:papers:2308.08031

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2308.08031
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Sanjeev Bhojraj & Charles M. C. Lee, 2002. "Who Is My Peer? A Valuation‐Based Approach to the Selection of Comparable Firms," Journal of Accounting Research, Wiley Blackwell, vol. 40(2), pages 407-439, May.
    2. Gerard Hoberg & Gordon Phillips, 2016. "Text-Based Network Industries and Endogenous Product Differentiation," Journal of Political Economy, University of Chicago Press, vol. 124(5), pages 1423-1465.
    3. Paul Geertsema & Helen Lu, 2023. "Relative Valuation with Machine Learning," Journal of Accounting Research, Wiley Blackwell, vol. 61(1), pages 329-376, March.
    4. Rian Dolphin & Barry Smyth & Ruihai Dong, 2022. "Stock Embeddings: Learning Distributed Representations for Financial Assets," Papers 2202.08968, arXiv.org.
    5. Tim Loughran & Bill McDonald, 2020. "Textual Analysis in Finance," Annual Review of Financial Economics, Annual Reviews, vol. 12(1), pages 357-375, December.
    6. Rhodes-Kropf, Matthew & Robinson, David T. & Viswanathan, S., 2005. "Valuation waves and merger activity: The empirical evidence," Journal of Financial Economics, Elsevier, vol. 77(3), pages 561-603, September.
    7. Kaustia, Markku & Rantala, Ville, 2015. "Social learning and corporate peer effects," Journal of Financial Economics, Elsevier, vol. 117(3), pages 653-669.
    8. Guenther, David A. & Rosman, Andrew J., 1994. "Differences between COMPUSTAT and CRSP SIC codes and related effects on research," Journal of Accounting and Economics, Elsevier, vol. 18(1), pages 115-128, July.
    9. Bhaskarjit Sarmah & Nayana Nair & Dhagash Mehta & Stefano Pasquali, 2022. "Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning," Papers 2207.07183, arXiv.org.
    10. Fama, Eugene F. & French, Kenneth R., 1997. "Industry costs of equity," Journal of Financial Economics, Elsevier, vol. 43(2), pages 153-193, February.
    11. Rian Dolphin & Barry Smyth & Ruihai Dong, 2023. "Industry Classification Using a Novel Financial Time-Series Case Representation," Papers 2305.00245, arXiv.org.
    12. Lee, Charles M.C. & Ma, Paul & Wang, Charles C.Y., 2015. "Search-based peer firms: Aggregating investor perceptions through internet co-searches," Journal of Financial Economics, Elsevier, vol. 116(2), pages 410-431.
    13. Bartram, Söhnke M. & Grinblatt, Mark, 2018. "Agnostic fundamental analysis works," Journal of Financial Economics, Elsevier, vol. 128(1), pages 125-147.
    14. Jing Liu & Doron Nissim & Jacob Thomas, 2002. "Equity Valuation Using Multiples," Journal of Accounting Research, Wiley Blackwell, vol. 40(1), pages 135-172, March.

    Citations

    Cited by:

    1. Marco Molinari & Victor Shao & Vladimir Tregubiak & Abhimanyu Pandey & Mateusz Mikolajczak & Sebastian Kuznetsov Ryder Torres Pereira, 2024. "Interpretable Company Similarity with Sparse Autoencoders," Papers 2412.02605, arXiv.org, revised Dec 2024.
    2. Alexander Bakumenko & Kateřina Hlaváčková-Schindler & Claudia Plant & Nina C. Hubig, 2024. "Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs," Papers 2406.03614, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paul Geertsema & Helen Lu, 2023. "Relative Valuation with Machine Learning," Journal of Accounting Research, Wiley Blackwell, vol. 61(1), pages 329-376, March.
    2. Preetha Saha & Jingrao Lyu & Dhruv Desai & Rishab Chauhan & Jerinsh Jeyapaulraj & Philip Sommer & Dhagash Mehta, 2024. "Machine Learning-based Relative Valuation of Municipal Bonds," Papers 2408.02273, arXiv.org.
    3. Sanjeev Bhojraj & Charles M. C. Lee & Derek K. Oler, 2003. "What's My Line? A Comparison of Industry Classification Schemes for Capital Market Research," Journal of Accounting Research, Wiley Blackwell, vol. 41(5), pages 745-774, December.
    4. Karel Janda, 2019. "Earnings Stability and Peer Company Selection for Multiple Based Indirect Valuation," Czech Journal of Economics and Finance (Finance a uver), Charles University Prague, Faculty of Social Sciences, vol. 69(1), pages 37-75, February.
    5. Lee, Charles M.C. & Ma, Paul & Wang, Charles C.Y., 2015. "Search-based peer firms: Aggregating investor perceptions through internet co-searches," Journal of Financial Economics, Elsevier, vol. 116(2), pages 410-431.
    6. Francis, Bill & Hasan, Iftekhar & Mani, Sureshbabu & Ye, Pengfei, 2016. "Relative peer quality and firm performance," Journal of Financial Economics, Elsevier, vol. 122(1), pages 196-219.
    7. Bartram, Söhnke M. & Grinblatt, Mark, 2018. "Agnostic fundamental analysis works," Journal of Financial Economics, Elsevier, vol. 128(1), pages 125-147.
    8. Cong, Lin William & George, Nathan Darden & Wang, Guojun, 2023. "RIM-based value premium and factor pricing using value-price divergence," Journal of Banking & Finance, Elsevier, vol. 149(C).
    9. Andreou, Christoforos K. & Lambertides, Neophytos & Panayides, Photis M., 2021. "Distress risk anomaly and misvaluation," The British Accounting Review, Elsevier, vol. 53(5).
    10. Rian Dolphin & Barry Smyth & Ruihai Dong, 2022. "A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets," Papers 2211.06378, arXiv.org.
    11. Skočir, Matevž & Lončarski, Igor, 2024. "On the importance of asset pricing factors in the relative valuation," Research in International Business and Finance, Elsevier, vol. 70(PB).
    12. Bonaime, Alice & Gulen, Huseyin & Ion, Mihai, 2018. "Does policy uncertainty affect mergers and acquisitions?," Journal of Financial Economics, Elsevier, vol. 129(3), pages 531-558.
    13. Eaton, Gregory W. & Guo, Feng & Liu, Tingting & Officer, Micah S., 2022. "Peer selection and valuation in mergers and acquisitions," Journal of Financial Economics, Elsevier, vol. 146(1), pages 230-255.
    14. Xia, Jingjing, 2023. "Redrawing the line: Narrowly beating analyst forecasts and journalists’ co-coverage choices in earnings-related news articles," Journal of Contemporary Accounting and Economics, Elsevier, vol. 19(3).
    15. Lee, Charles M.C. & Sun, Stephen Teng & Wang, Rongfei & Zhang, Ran, 2019. "Technological links and predictable returns," Journal of Financial Economics, Elsevier, vol. 132(3), pages 76-96.
    16. Weiner, Christian, 2005. "The impact of industry classification schemes on financial research," SFB 649 Discussion Papers 2005-062, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    17. Gordon Richardson & Surjit Tinaikar, 2004. "Accounting based valuation models: what have we learned?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 44(2), pages 223-255, July.
    18. Zura Kakushadze & Willie Yu, 2017. "Open Source Fundamental Industry Classification," Papers 1706.04210, arXiv.org, revised Dec 2017.
    19. Hugh M. J. Colaco & Amedeo De Cesari & Shantaram P. Hegde, 2017. "Retail Investor Attention and IPO Valuation," European Financial Management, European Financial Management Association, vol. 23(4), pages 691-727, September.
    20. Gao, Ning & Peng, Ni & Strong, Norman, 2017. "What determines horizontal merger antitrust case selection?," Journal of Corporate Finance, Elsevier, vol. 46(C), pages 51-76.
