IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2207.04959.html

Learning Mutual Fund Categorization using Natural Language Processing

Author

Listed:
  • Dimitrios Vamvourellis
  • Mate Attila Toth
  • Dhruv Desai
  • Dhagash Mehta
  • Stefano Pasquali

Abstract

Categorizations of mutual funds and exchange-traded funds (ETFs) have long helped financial analysts perform peer analysis for purposes ranging from competitor analysis to quantifying portfolio diversification. The categorization methodology usually relies on structured fund composition data extracted from Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data in the forms using natural language processing (NLP). Posing this as a multi-class classification problem, in which the only input is the investment strategy description reported in the form and the target variable is the Lipper Global category, we show with various NLP models that the categorization system can indeed be learned with high accuracy. We discuss the implications and applications of our findings, as well as the limitations of applying existing pre-trained architectures to learn fund categorization.
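The classification setup described in the abstract (free-text strategy description in, category label out) can be illustrated with a minimal sketch. This is not the authors' method: the abstract names no specific model, so a multinomial naive Bayes classifier with Laplace smoothing is used here as a simple stand-in. The training sentences and the three category labels are invented for illustration and are not taken from Form N-1A filings or from the actual Lipper Global classification; the function names `tokenize`, `train_nb`, and `predict` are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Toy, invented strategy descriptions with illustrative labels (assumptions,
# not real Form N-1A text and not actual Lipper Global categories).
TRAIN = [
    ("The fund invests primarily in common stocks of large US companies", "Equity US"),
    ("The fund seeks capital growth by investing in US equity securities", "Equity US"),
    ("The fund invests in investment grade corporate and government bonds", "Bond USD"),
    ("The fund seeks income by investing in US dollar denominated debt securities and bonds", "Bond USD"),
    ("The fund invests in equity securities of companies in emerging markets", "Equity Emerging Mkts"),
    ("The fund seeks growth through stocks of issuers located in emerging market countries", "Equity Emerging Mkts"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(examples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under multinomial naive Bayes."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior from class frequencies in the training set.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed log likelihood of the token.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict(model, "The fund invests in government bonds and corporate debt securities"))
```

A bag-of-words baseline like this ignores word order and context, which is one reason a study of this kind would also consider pre-trained language models; the sketch only makes the input and output of the classification task concrete.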

Suggested Citation

  • Dimitrios Vamvourellis & Mate Attila Toth & Dhruv Desai & Dhagash Mehta & Stefano Pasquali, 2022. "Learning Mutual Fund Categorization using Natural Language Processing," Papers 2207.04959, arXiv.org.
  • Handle: RePEc:arx:papers:2207.04959

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2207.04959
    File Function: Latest version
    Download Restriction: no


    Citations



    Cited by:

    1. Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.
    2. Vipul Satone & Dhruv Desai & Dhagash Mehta, 2021. "Fund2Vec: Mutual Funds Similarity using Graph Learning," Papers 2106.12987, arXiv.org.
    3. Dhagash Mehta & Dhruv Desai & Jithin Pradeep, 2020. "Machine Learning Fund Categorizations," Papers 2006.00123, arXiv.org.
    4. Javier Vidal-García & Marta Vidal & Sabri Boubaker & Riadh Manita, 2019. "Idiosyncratic risk and mutual fund performance," Annals of Operations Research, Springer, vol. 281(1), pages 349-372, October.
    5. Agapova, Anna & Kaprielyan, Margarita, 2023. "Diversification measures: Mutual fund family case," International Review of Financial Analysis, Elsevier, vol. 90(C).
    6. repec:dgr:rugsom:01e17 is not listed on IDEAS
    7. Jerinsh Jeyapaulraj & Dhruv Desai & Peter Chu & Dhagash Mehta & Stefano Pasquali & Philip Sommer, 2022. "Supervised similarity learning for corporate bonds using Random Forest proximities," Papers 2207.04368, arXiv.org, revised Oct 2022.
    8. Zheng-Guo, Maiko & Hernández-Ramírez, Manrique & Solís, Martín, 2023. "How to choose investments that match your needs? A proposal for the categorization of mutual funds for Latin American emerging markets, case of Costa Rica," Revista de Ciencias Económicas, Instituto de Investigaciones en Ciencias Económicas, Universidad de Costa Rica, vol. 41(1), January.
    9. DeMiguel, Victor & Gil-Bazo, Javier & Nogales, Francisco J. & Santos, André A.P., 2023. "Machine learning and fund characteristics help to select mutual funds with positive alpha," Journal of Financial Economics, Elsevier, vol. 150(3).
    10. Laura Fabregat-Aibar & Maria-Teresa Sorrosal-Forradellas & Glòria Barberà-Mariné & Antonio Terceño, 2021. "Can Artificial Neural Networks Predict the Survival Capacity of Mutual Funds? Evidence from Spain," Mathematics, MDPI, vol. 9(6), pages 1-10, March.
    11. Francesco Lisi, 2011. "Dicing with the market: randomized procedures for evaluation of mutual funds," Quantitative Finance, Taylor & Francis Journals, vol. 11(2), pages 163-172.
    12. Emmanuel Jurczenko & Bertrand Maillet & Paul Merlin, 2008. "Efficient Frontier for Robust Higher-order Moment Portfolio Selection," Post-Print halshs-00336475, HAL.
    13. repec:onb:oenbwp:y:2005:i:9:b:1 is not listed on IDEAS
    14. Rian Dolphin & Barry Smyth & Ruihai Dong, 2022. "A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets," Papers 2211.06378, arXiv.org.
    15. Laurens Swinkels & Pieter Van Der Sluis, 2006. "Return-based style analysis with time-varying exposures," The European Journal of Finance, Taylor & Francis Journals, vol. 12(6-7), pages 529-552.
    16. Yi, Li & Xiao, Li & Liao, Yinkai, 2024. "Network centrality, style drift, and mutual fund performance," Research in International Business and Finance, Elsevier, vol. 70(PA).
    17. Bhaskarjit Sarmah & Nayana Nair & Dhagash Mehta & Stefano Pasquali, 2022. "Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning," Papers 2207.07183, arXiv.org.
    18. Konstantina Pendaraki & Michael Doumpos & Constantin Zopounidis, 2003. "Assessing Equity Mutual Funds' Performance Using a Multicriteria Methodology: A Comparative Analysis," South-Eastern Europe Journal of Economics, Association of Economic Universities of South and Eastern Europe and the Black Sea Region, vol. 1(1), pages 85-104.
    19. Mostafa, Mohamed M. & Nataraajan, Rajan, 2009. "A neuro-computational intelligence analysis of the ecological footprint of nations," Computational Statistics & Data Analysis, Elsevier, vol. 53(9), pages 3516-3531, July.
    20. Yunmi Kim & Douglas Stone & Tae-Hwan Kim, 2021. "Testing for structural breaks in return-based style regression models," Financial Markets and Portfolio Management, Springer;Swiss Society for Financial Market Research, vol. 35(1), pages 61-76, March.
    21. Stavrou, Eleni T. & Charalambous, Christakis & Spiliotis, Stelios, 2007. "Human resource management and performance: A neural network analysis," European Journal of Operational Research, Elsevier, vol. 181(1), pages 453-467, August.
    22. Nathalia Castellanos & Dhruv Desai & Sebastian Frank & Stefano Pasquali & Dhagash Mehta, 2024. "Can an unsupervised clustering algorithm reproduce a categorization system?," Papers 2408.10340, arXiv.org.
