Learning Mutual Fund Categorization using Natural Language Processing

Author

Listed:

Dimitrios Vamvourellis
Mate Attila Toth
Dhruv Desai
Dhagash Mehta
Stefano Pasquali

Abstract

Categorization of mutual funds or exchange-traded funds (ETFs) has long helped financial analysts perform peer analysis for purposes ranging from competitor analysis to quantifying portfolio diversification. The categorization methodology usually relies on fund composition data in structured format extracted from Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data in the forms using natural language processing (NLP). Posing this as a multi-class classification problem in which the only input is the investment strategy description as reported in the form and the target variable is the Lipper Global category, and using various NLP models, we show that the categorization system can indeed be learned with high accuracy. We discuss implications and applications of our findings, as well as limitations of applying existing pre-trained architectures to learn fund categorization.
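The multi-class setup described in the abstract (strategy text in, category label out) can be illustrated with a minimal sketch. This is not the authors' method: it swaps in a plain bag-of-words multinomial naive Bayes classifier as a stand-in for the NLP models the paper evaluates. The strategy descriptions and the three category names below are invented for illustration and are not real Form N-1A text or actual Lipper Global categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: invented descriptions and invented labels.
train_texts = [
    "The fund invests primarily in common stocks of large US companies.",
    "The fund seeks capital growth by investing in equity securities of US issuers.",
    "The fund invests in investment grade corporate bonds and US Treasury notes.",
    "The fund seeks income by investing in fixed income debt securities.",
    "The fund invests in equity securities of emerging market companies.",
    "The fund seeks growth through stocks of issuers in emerging market countries.",
]
train_labels = [
    "US Equity", "US Equity",
    "Bond", "Bond",
    "Emerging Markets Equity", "Emerging Markets Equity",
]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("The fund invests in corporate bonds and other debt securities."))
```

A real experiment would need thousands of labeled descriptions, a held-out test set, and a stronger text representation; the sketch only shows the shape of the problem, where the description text alone is the model input.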

Suggested Citation

Dimitrios Vamvourellis & Mate Attila Toth & Dhruv Desai & Dhagash Mehta & Stefano Pasquali, 2022. "Learning Mutual Fund Categorization using Natural Language Processing," Papers 2207.04959, arXiv.org.

Handle: RePEc:arx:papers:2207.04959

References listed on IDEAS

Vipul Satone & Dhruv Desai & Dhagash Mehta, 2021. "Fund2Vec: Mutual Funds Similarity using Graph Learning," Papers 2106.12987, arXiv.org.
Moreno, David & Marco, Paulina & Olmeda, Ignacio, 2006. "Self-organizing maps could improve the classification of Spanish mutual funds," European Journal of Operational Research, Elsevier, vol. 174(2), pages 1039-1054, October.
Kim, Moon & Shukla, Ravi & Tomas, Michael, 2000. "Mutual fund objective misclassification," Journal of Economics and Business, Elsevier, vol. 52(4), pages 309-323.

Citations

Cited by:

Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Dhruv Desai & Ashmita Dhiman & Tushar Sharma & Deepika Sharma & Dhagash Mehta & Stefano Pasquali, 2023. "Quantifying Outlierness of Funds from their Categories using Supervised Similarity," Papers 2308.06882, arXiv.org.
Vipul Satone & Dhruv Desai & Dhagash Mehta, 2021. "Fund2Vec: Mutual Funds Similarity using Graph Learning," Papers 2106.12987, arXiv.org.
Dhagash Mehta & Dhruv Desai & Jithin Pradeep, 2020. "Machine Learning Fund Categorizations," Papers 2006.00123, arXiv.org.
Javier Vidal-García & Marta Vidal & Sabri Boubaker & Riadh Manita, 2019. "Idiosyncratic risk and mutual fund performance," Annals of Operations Research, Springer, vol. 281(1), pages 349-372, October.
Agapova, Anna & Kaprielyan, Margarita, 2023. "Diversification measures: Mutual fund family case," International Review of Financial Analysis, Elsevier, vol. 90(C).
Jerinsh Jeyapaulraj & Dhruv Desai & Peter Chu & Dhagash Mehta & Stefano Pasquali & Philip Sommer, 2022. "Supervised similarity learning for corporate bonds using Random Forest proximities," Papers 2207.04368, arXiv.org, revised Oct 2022.
Zheng-Guo, Maiko & Hernández-Ramírez, Manrique & Solís, Martín, 2023. "How to choose investments that match your needs? A proposal for the categorization of mutual funds for Latin American emerging markets, case of Costa Rica," Revista de Ciencias Económicas, Instituto de Investigaciones en Ciencias Económicas, Universidad de Costa Rica, vol. 41(1), December.
DeMiguel, Victor & Gil-Bazo, Javier & Nogales, Francisco J. & Santos, André A.P., 2023. "Machine learning and fund characteristics help to select mutual funds with positive alpha," Journal of Financial Economics, Elsevier, vol. 150(3).
Laura Fabregat-Aibar & Maria-Teresa Sorrosal-Forradellas & Glòria Barberà-Mariné & Antonio Terceño, 2021. "Can Artificial Neural Networks Predict the Survival Capacity of Mutual Funds? Evidence from Spain," Mathematics, MDPI, vol. 9(6), pages 1-10, March.
Francesco Lisi, 2011. "Dicing with the market: randomized procedures for evaluation of mutual funds," Quantitative Finance, Taylor & Francis Journals, vol. 11(2), pages 163-172.
Emmanuel Jurczenko & Bertrand Maillet & Paul Merlin, 2008. "Efficient Frontier for Robust Higher-order Moment Portfolio Selection," Post-Print halshs-00336475, HAL.
Rian Dolphin & Barry Smyth & Ruihai Dong, 2022. "A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets," Papers 2211.06378, arXiv.org.
Laurens Swinkels & Pieter Van Der Sluis, 2006. "Return-based style analysis with time-varying exposures," The European Journal of Finance, Taylor & Francis Journals, vol. 12(6-7), pages 529-552.
- Laurens Swinkels, Pieter Jelle VanDerSluis, 2001. "Return-based Style Analysis with Time-varying Exposures," Computing in Economics and Finance 2001 125, Society for Computational Economics.
- Swinkels, L.A.P. & van der Sluis, P.J., 2001. "Return-Based Style Analysis with Time-Varying Exposures," Discussion Paper 2001-96, Tilburg University, Center for Economic Research.
- Swinkels, L.A.P. & van der Sluis, P.J., 2001. "Return-Based Style Analysis with Time-Varying Exposures," Other publications TiSEM f2c16530-4d18-4f43-bb6d-f, Tilburg University, School of Economics and Management.
Yi, Li & Xiao, Li & Liao, Yinkai, 2024. "Network centrality, style drift, and mutual fund performance," Research in International Business and Finance, Elsevier, vol. 70(PA).
Bhaskarjit Sarmah & Nayana Nair & Dhagash Mehta & Stefano Pasquali, 2022. "Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning," Papers 2207.07183, arXiv.org.
Konstantina Pendaraki & Michael Doumpos & Constantin Zopounidis, 2003. "Assessing Equity Mutual Funds' Performance Using a Multicriteria Methodology: A Comparative Analysis," South-Eastern Europe Journal of Economics, Association of Economic Universities of South and Eastern Europe and the Black Sea Region, vol. 1(1), pages 85-104.
Mostafa, Mohamed M. & Nataraajan, Rajan, 2009. "A neuro-computational intelligence analysis of the ecological footprint of nations," Computational Statistics & Data Analysis, Elsevier, vol. 53(9), pages 3516-3531, July.
Yunmi Kim & Douglas Stone & Tae-Hwan Kim, 2021. "Testing for structural breaks in return-based style regression models," Financial Markets and Portfolio Management, Springer;Swiss Society for Financial Market Research, vol. 35(1), pages 61-76, March.
- Yunmi Kim & Douglas Stone & Tae-Hwan Kim, 2020. "Testing for Structural Breaks in Return-Based Style Regression Models," Working papers 2020rwp-165, Yonsei University, Yonsei Economics Research Institute.
Stavrou, Eleni T. & Charalambous, Christakis & Spiliotis, Stelios, 2007. "Human resource management and performance: A neural network analysis," European Journal of Operational Research, Elsevier, vol. 181(1), pages 453-467, August.
Nathalia Castellanos & Dhruv Desai & Sebastian Frank & Stefano Pasquali & Dhagash Mehta, 2024. "Can an unsupervised clustering algorithm reproduce a categorization system?," Papers 2408.10340, arXiv.org.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2022-08-22 (Big Data)
NEP-CMP-2022-08-22 (Computational Economics)
NEP-FMK-2022-08-22 (Financial Markets)
