Printed from https://ideas.repec.org/p/arx/papers/2112.07985.html

Solving the Data Sparsity Problem in Predicting the Success of the Startups with Machine Learning Methods

Author

Listed:
  • Dafei Yin
  • Jing Li
  • Gaosheng Wu

Abstract

Predicting the success of startup companies is of great importance to both startups and investors, but it is difficult because available data are scarce and appropriate general methods are lacking. Data platforms such as Crunchbase, which aggregate information on startup companies, make prediction with machine learning algorithms possible. Existing research suffers from the data sparsity problem, as most early-stage startups have little publicly available data. We leverage recent algorithms to address this problem, investigating several machine learning algorithms on a large dataset from Crunchbase. The results suggest that LightGBM and XGBoost perform best, achieving F1 scores of 53.03% and 52.96%, respectively. We interpret the predictions from the perspective of feature contribution. We also construct portfolios based on the models and achieve high success rates. These findings have substantial implications for how machine learning methods can help startup companies and investors.
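For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how an F1 score and a portfolio success rate could be computed for a binary success label. It is not the authors' evaluation code: the function names, the toy labels, and the top-k portfolio rule are assumptions made purely for illustration, and the paper's actual protocol may differ.

```python
def f1_score(true_labels, predicted_labels):
    """F1 score for binary labels (1 = successful startup, 0 = not)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def portfolio_success_rate(true_labels, scores, k):
    """Share of actual successes among the k highest-scored companies.

    The top-k selection rule is a hypothetical stand-in for whatever
    portfolio construction the paper uses.
    """
    ranked = sorted(zip(scores, true_labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy data, invented for illustration only.
true_labels = [1, 0, 1, 0, 0, 1]
predicted_labels = [1, 1, 0, 0, 0, 1]
model_scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.7]

f1 = f1_score(true_labels, predicted_labels)
rate = portfolio_success_rate(true_labels, model_scores, k=3)
print(f"F1 = {f1:.4f}, top-3 success rate = {rate:.4f}")
```

Because F1 balances precision against recall, it is a common choice when successful startups are a small minority of the dataset and plain accuracy would be misleading.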

Suggested Citation

  • Dafei Yin & Jing Li & Gaosheng Wu, 2021. "Solving the Data Sparsity Problem in Predicting the Success of the Startups with Machine Learning Methods," Papers 2112.07985, arXiv.org.
  • Handle: RePEc:arx:papers:2112.07985

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2112.07985
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jenkins, Anna S. & Wiklund, Johan & Brundin, Ethel, 2014. "Individual responses to firm failure: Appraisals, grief, and the influence of prior failure experience," Journal of Business Venturing, Elsevier, vol. 29(1), pages 17-33.
    2. Kaiser, Ulrich & Kuhn, Johan M., 2020. "The value of publicly available, textual and non-textual information for startup performance prediction," Journal of Business Venturing Insights, Elsevier, vol. 14(C).
    3. McKenzie, David J. & Sansone, Dario, 2017. "Man vs. machine in predicting successful entrepreneurs: evidence from a business plan competition in Nigeria," Policy Research Working Paper Series 8271, The World Bank.
    4. Kaloyan Haralampiev & Boyan Yankov & Petko Ruskov, 2014. "Models and Tools for Technology Start-Up Companies Success Analysis," Economic Alternatives, University of National and World Economy, Sofia, Bulgaria, issue 3, pages 15-24, October.
    5. Nahata, Rajarishi, 2008. "Venture capital reputation and investment performance," Journal of Financial Economics, Elsevier, vol. 90(2), pages 127-151, November.
    6. Clarysse, Bart & Tartari, Valentina & Salter, Ammon, 2011. "The impact of entrepreneurial capacity, experience and organizational support on academic entrepreneurship," Research Policy, Elsevier, vol. 40(8), pages 1084-1093, October.
    7. Chandler, Gaylen N. & Hanks, Steven H., 1993. "Measuring the performance of emerging businesses: A validation study," Journal of Business Venturing, Elsevier, vol. 8(5), pages 391-408, September.
    8. Srinivasan Ragothaman & Bijayananda Naik & Kumoli Ramakrishnan, 2003. "Predicting Corporate Acquisitions: An Application of Uncertain Reasoning Using Rule Induction," Information Systems Frontiers, Springer, vol. 5(4), pages 401-412, December.
    9. Nanda, Ramana & Samila, Sampsa & Sorenson, Olav, 2020. "The persistent effect of initial success: Evidence from venture capital," Journal of Financial Economics, Elsevier, vol. 137(1), pages 231-248.
    10. Kaiser, Ulrich & Kuhn, Johan Moritz, 2020. "The Value of Publicly Available, Textual and Non-textual Information for Startup Performance Prediction," IZA Discussion Papers 13029, Institute of Labor Economics (IZA).
    11. Charles E. Eesley & David H. Hsu & Edward B. Roberts, 2014. "The contingent effects of top management teams on venture performance: Aligning founding team composition with innovation strategy and commercialization environment," Strategic Management Journal, Wiley Blackwell, vol. 35(12), pages 1798-1817, December.
    12. P. Holmes & A. Hunt & I. Stone, 2010. "An analysis of new firm survival using a hazard function," Applied Economics, Taylor & Francis Journals, vol. 42(2), pages 185-195.

    Citations



    Cited by:

    1. Lele Cao & Vilhelm von Ehrenheim & Sebastian Krakowski & Xiaoxue Li & Alexandra Lutz, 2022. "Using Deep Learning to Find the Next Unicorn: A Practical Synthesis," Papers 2210.14195, arXiv.org, revised Jun 2024.
    2. Lele Cao & Gustaf Halvardsson & Andrew McCornack & Vilhelm von Ehrenheim & Pawel Herman, 2023. "Beyond Gut Feel: Using Time Series Transformers to Find Investment Gems," Papers 2309.16888, arXiv.org, revised Jun 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicola Breugst & Holger Patzelt & Dean A. Shepherd, 2020. "When is Effort Contagious in New Venture Management Teams? Understanding the Contingencies of Social Motivation Theory," Journal of Management Studies, Wiley Blackwell, vol. 57(8), pages 1556-1588, December.
    2. Ronald Setty & Yuval Elovici & Dafna Schwartz, 2024. "Cost-sensitive machine learning to support startup investment decisions," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(1), March.
    3. Kaiser, Ulrich & Kuhn, Johan M., 2020. "The value of publicly available, textual and non-textual information for startup performance prediction," Journal of Business Venturing Insights, Elsevier, vol. 14(C).
    4. Khelil, Nabil, 2016. "The many faces of entrepreneurial failure: Insights from an empirical taxonomy," Journal of Business Venturing, Elsevier, vol. 31(1), pages 72-94.
    5. Soetanto, Danny & van Geenhuizen, Marina, 2019. "Life after incubation: The impact of entrepreneurial universities on the long-term performance of their spin-offs," Technological Forecasting and Social Change, Elsevier, vol. 141(C), pages 263-276.
    6. Giang Nguyen & My Nguyen & Anh Viet Pham & Man Duy Marty Pham, 2023. "Navigating investment decisions with social connectedness : Implications for venture capital," Post-Print hal-04325756, HAL.
    7. Milosevic, Miona, 2018. "Skills or networks? Success and fundraising determinants in a low performing venture capital market," Research Policy, Elsevier, vol. 47(1), pages 49-60.
    8. Nguyen, Giang & Nguyen, My & Pham, Anh Viet & Pham, Man Duy (Marty), 2023. "Navigating investment decisions with social connectedness: Implications for venture capital," Journal of Banking & Finance, Elsevier, vol. 155(C).
    9. Thomas, V.J. & Bliemel, Martin & Shippam, Cynthia & Maine, Elicia, 2020. "Endowing university spin-offs pre-formation: Entrepreneurial capabilities for scientist-entrepreneurs," Technovation, Elsevier, vol. 96.
    10. Ge, Guoqing & Xue, Jian & Zhang, Qian, 2024. "Industrial policy and governmental venture capital: Evidence from China," Journal of Corporate Finance, Elsevier, vol. 84(C).
    11. Falco J. Bargagli-Stoffi & Jan Niederreiter & Massimo Riccaboni, 2020. "Supervised learning for the prediction of firm dynamics," Papers 2009.06413, arXiv.org.
    12. Kim, Jongwoo & Kim, Hongil & Geum, Youngjung, 2023. "How to succeed in the market? Predicting startup success using a machine learning approach," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    13. Lele Cao & Vilhelm von Ehrenheim & Sebastian Krakowski & Xiaoxue Li & Alexandra Lutz, 2022. "Using Deep Learning to Find the Next Unicorn: A Practical Synthesis," Papers 2210.14195, arXiv.org, revised Jun 2024.
    14. Hui Zhang & Yuan Mo & Dong Wang, 2021. "Why do some academic entrepreneurs experience less role conflict? The impact of prior academic experience and prior entrepreneurial experience," International Entrepreneurship and Management Journal, Springer, vol. 17(4), pages 1521-1539, December.
    15. Kaiser, Ulrich & Kuhn, Johan Moritz, 2020. "The Value of Publicly Available, Textual and Non-textual Information for Startup Performance Prediction," IZA Discussion Papers 13029, Institute of Labor Economics (IZA).
    16. Battaglia, Daniele & Paolucci, Emilio & Ughetto, Elisa, 2021. "Opening the black box of university Proof-of-Concept programs: Project and team-based determinants of research commercialization outcomes," Technovation, Elsevier, vol. 108(C).
    17. Cumming, Douglas J. & Nguyen, Giang & Nguyen, My, 2022. "Product market competition, venture capital, and the success of entrepreneurial firms," Journal of Banking & Finance, Elsevier, vol. 144(C).
    18. Nguyen, Giang & Vo, Vinh, 2021. "Asset liquidity and venture capital investment," Journal of Corporate Finance, Elsevier, vol. 69(C).
    19. Nguyen, Giang & Vu, Le, 2021. "Does venture capital syndication affect mergers and acquisitions?," Journal of Corporate Finance, Elsevier, vol. 67(C).
    20. Colombo, Massimo G. & Guerini, Massimiliano & Hoisl, Karin & Zeiner, Nico M., 2023. "The dark side of signals: Patents protecting radical inventions and venture capital investments," Research Policy, Elsevier, vol. 52(5).
