IDEAS (RePEc) item page: https://ideas.repec.org/a/eee/finana/v89y2023ics1057521923002715.html

A two-stage credit scoring model based on random forest: Evidence from Chinese small firms

Authors

  • Zhou, Ying
  • Shen, Long
  • Ballester, Laura

Abstract

Small firms are major contributors to most economies and are often supported by government policies. However, credit scoring for small firms is complicated and costly, which makes it a challenging field of research. Using loan data from 3,045 small firms in China, we design a two-stage expert system for default prediction that quantifies the variables and thresholds with a key impact on default. In the first stage, we use SMOTE (Synthetic Minority Over-sampling Technique) to handle the imbalanced data; in the second, we employ a random forest to build predictive credit features. Dominance analysis shows that default assessments of Chinese small firms should consider not only financial factors but also non-financial and macroeconomic ones. In particular, net cash profit, the firm's legal disputes, and the per capita disposable income of urban residents are key factors in credit scoring. Robustness tests show that the proposed methodology outperforms other machine learning models, and this result holds for observations from other countries.
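The abstract names SMOTE as the first stage but does not report its settings. As a minimal sketch of what that oversampling step does in general, the pure-Python code below creates synthetic minority-class (default) observations by interpolating between a minority sample and one of its nearest minority neighbours. The function names `smote` and `k_nearest`, the neighbour count `k=5`, and the toy feature values are assumptions for illustration only; this is not the authors' code, and the second stage (the random forest) is not reproduced here.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(minority: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest minority neighbours of minority[i], excluding itself."""
    dists = [(_euclidean(minority[i], minority[j]), j)
             for j in range(len(minority)) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper; k=5 is an assumed default, not from the paper.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # random neighbour of that sample
        gap = rng.random()                # interpolation weight in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 8 non-defaulting vs 3 defaulting firms, 2 features each.
    majority = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.7, 0.3],
                [0.95, 0.05], [0.75, 0.2], [0.8, 0.1], [0.9, 0.25]]
    defaults = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7]]
    new_defaults = smote(defaults, n_synthetic=len(majority) - len(defaults), k=2)
    print(len(majority), len(defaults) + len(new_defaults))  # prints "8 8"
```

Choosing the base sample at random, rather than cycling through every minority sample as in the original SMOTE formulation, keeps the output size exactly `n_synthetic`. In a real pipeline the oversampling would be applied to the training folds only, so that synthetic defaults do not leak into the data used for evaluation; the balanced training set would then be passed to the random forest.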

Suggested Citation

  • Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
  • Handle: RePEc:eee:finana:v:89:y:2023:i:c:s1057521923002715
    DOI: 10.1016/j.irfa.2023.102755

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1057521923002715
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Cathcart, Lara & Dufour, Alfonso & Rossi, Ludovico & Varotto, Simone, 2020. "The differential impact of leverage on the default risk of small and large firms," Journal of Corporate Finance, Elsevier, vol. 60(C).
    2. Tristan Boyer & Régis Blazy, 2014. "Born to be alive? The survival of innovative and non-innovative French micro-start-ups," Small Business Economics, Springer, vol. 42(4), pages 669-683, April.
    3. Rajkamal Iyer & Asim Ijaz Khwaja & Erzo F. P. Luttmer & Kelly Shue, 2016. "Screening Peers Softly: Inferring the Quality of Small Borrowers," Management Science, INFORMS, vol. 62(6), pages 1554-1577, June.
    4. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    5. Voulgaris, Fotini & Doumpos, Michael & Zopounidis, Constantin, 2000. "On the Evaluation of Greek Industrial SMEs' Performance via Multicriteria Analysis of Financial Ratios," Small Business Economics, Springer, vol. 15(2), pages 127-136, September.
    6. Nikolaos Sariannidis & Stelios Papadakis & Alexandros Garefalakis & Christos Lemonakis & Tsioptsia Kyriaki-Argyro, 2020. "Default avoidance on credit card portfolios using accounting, demographical and exploratory factors: decision making based on machine learning (ML) techniques," Annals of Operations Research, Springer, vol. 294(1), pages 715-739, November.
    7. Valentina Bruno & Jess Cornaggia & Kimberly J. Cornaggia, 2016. "Does Regulatory Certification Affect the Information Content of Credit Ratings?," Management Science, INFORMS, vol. 62(6), pages 1578-1597, June.
    8. Chiungfeng Ko & Picheng Lee & Asokan Anandarajan, 2019. "The impact of operational risk incidents and moderating influence of corporate governance on credit risk and firm performance," International Journal of Accounting & Information Management, Emerald Group Publishing Limited, vol. 27(1), pages 96-110, March.
    9. T Bellotti & J Crook, 2009. "Credit scoring with macroeconomic variables using survival analysis," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(12), pages 1699-1707, December.
    10. Carling, Kenneth & Jacobson, Tor & Linde, Jesper & Roszbach, Kasper, 2007. "Corporate credit risk modeling and the macroeconomy," Journal of Banking & Finance, Elsevier, vol. 31(3), pages 845-868, March.
    11. Zhu, Weidong & Zhang, Tianjiao & Wu, Yong & Li, Shaorong & Li, Zhimin, 2022. "Research on optimization of an enterprise financial risk early warning method based on the DS-RF model," International Review of Financial Analysis, Elsevier, vol. 81(C).
    12. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    13. Raffaella Calabrese & Giampiero Marra & Silvia Angela Osmetti, 2016. "Bankruptcy prediction of small and medium enterprises using a flexible binary generalized extreme value model," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 67(4), pages 604-615, April.
    14. Alonso-Robisco, Andrés & Carbó, José Manuel, 2022. "Can machine learning models save capital for banks? Evidence from a Spanish credit portfolio," International Review of Financial Analysis, Elsevier, vol. 84(C).
    15. Mohammad S. Uddin & Guotai Chi & Mazin A. M. Al Janabi & Tabassum Habib, 2022. "Leveraging random forest in micro‐enterprises credit risk modelling for accuracy and interpretability," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 27(3), pages 3713-3729, July.
    16. Su, Liangjun & Hoshino, Tadao, 2016. "Sieve instrumental variable quantile regression estimation of functional coefficient models," Journal of Econometrics, Elsevier, vol. 191(1), pages 231-254.
    17. Yu, Lean & Yao, Xiao & Zhang, Xiaoming & Yin, Hang & Liu, Jia, 2020. "A novel dual-weighted fuzzy proximal support vector machine with application to credit risk analysis," International Review of Financial Analysis, Elsevier, vol. 71(C).
    18. Bart Baesens & Rudy Setiono & Christophe Mues & Jan Vanthienen, 2003. "Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation," Management Science, INFORMS, vol. 49(3), pages 312-329, March.
    19. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    20. Edward I. Altman, 1968. "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy," Journal of Finance, American Finance Association, vol. 23(4), pages 589-609, September.
    21. Edward I. Altman & Gabriele Sabato, 2013. "Modeling Credit Risk for SMEs: Evidence from the US Market," World Scientific Book Chapters, in: Oliviero Roggi & Edward I Altman (ed.), Managing and Measuring Risk: Emerging Global Standards and Regulations After the Financial Crisis, chapter 9, pages 251-279, World Scientific Publishing Co. Pte. Ltd.
    22. Vicente Salas & Jesús Saurina, 2002. "Credit Risk in Two Institutional Regimes: Spanish Commercial and Savings Banks," Journal of Financial Services Research, Springer;Western Finance Association, vol. 22(3), pages 203-224, December.
    23. Salima Smiti & Makram Soui, 2020. "Bankruptcy Prediction Using Deep Learning Approach Based on Borderline SMOTE," Information Systems Frontiers, Springer, vol. 22(5), pages 1067-1083, October.
    24. Liu, Yi & Yang, Menglong & Wang, Yudong & Li, Yongshan & Xiong, Tiancheng & Li, Anzhe, 2022. "Applying machine learning algorithms to predict default probability in the online credit market: Evidence from China," International Review of Financial Analysis, Elsevier, vol. 79(C).
    25. Ebrahimi Shahabadi, Mohammad Saleh & Tabrizchi, Hamed & Kuchaki Rafsanjani, Marjan & Gupta, B.B. & Palmieri, Francesco, 2021. "A combination of clustering-based under-sampling with ensemble methods for solving imbalanced class problem in intelligent systems," Technological Forecasting and Social Change, Elsevier, vol. 169(C).
    26. Audrino, Francesco & Kostrov, Alexander & Ortega, Juan-Pablo, 2019. "Predicting U.S. Bank Failures with MIDAS Logit Models," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 54(6), pages 2575-2603, December.
    27. Tian, Shaonan & Yu, Yan & Guo, Hui, 2015. "Variable selection and corporate bankruptcy forecasts," Journal of Banking & Finance, Elsevier, vol. 52(C), pages 89-100.
    28. Francesco Ciampi & Valentina Cillo & Fabio Fiano, 2020. "Combining Kohonen maps and prior payment behavior for small enterprise default prediction," Small Business Economics, Springer, vol. 54(4), pages 1007-1039, April.
    29. Witold J. Henisz & James McGlinch, 2019. "ESG, Material Credit Events, and Credit Risk," Journal of Applied Corporate Finance, Morgan Stanley, vol. 31(2), pages 105-117, June.

    Citations

    Cited by:

    1. Indu Singh & D. P. Kothari & S. Aditya & Mihir Rajora & Charu Agarwal & Vibhor Gautam, 2024. "A hybrid metaheuristic optimised ensemble classifier with self organizing map clustering for credit scoring," Operational Research, Springer, vol. 24(4), pages 1-42, December.
    2. He, Yinan & Wu, Chao & Fan, Yuanyuan, 2024. "Exploring the drivers of local government budget coordination: A random forest regression analysis," International Review of Economics & Finance, Elsevier, vol. 93(PA), pages 1104-1113.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ángel Beade & Manuel Rodríguez & José Santos, 2024. "Multiperiod Bankruptcy Prediction Models with Interpretable Single Models," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1357-1390, September.
    2. Enrique Batiz‐Zuk & Fabrizio López‐Gallo & Abdulkadir Mohamed & Fátima Sánchez‐Cajal, 2022. "Determinants of loan survival rates for small and medium‐sized enterprises: Evidence from an emerging economy," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 27(4), pages 4741-4755, October.
    3. Sigrist, Fabio & Leuenberger, Nicola, 2023. "Machine learning for corporate default risk: Multi-period prediction, frailty correlation, loan portfolios, and tail probabilities," European Journal of Operational Research, Elsevier, vol. 305(3), pages 1390-1406.
    4. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    5. Li, Zhe & Liang, Shuguang & Pan, Xianyou & Pang, Meng, 2024. "Credit risk prediction based on loan profit: Evidence from Chinese SMEs," Research in International Business and Finance, Elsevier, vol. 67(PA).
    6. Medina-Olivares, Victor & Calabrese, Raffaella & Dong, Yizhe & Shi, Baofeng, 2022. "Spatial dependence in microfinance credit default," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1071-1085.
    7. Alessandro Bitetto & Stefano Filomeni & Michele Modina, 2021. "Understanding corporate default using Random Forest: The role of accounting and market information," DEM Working Papers Series 205, University of Pavia, Department of Economics and Management.
    8. Yu Zhao & Huaming Du & Qing Li & Fuzhen Zhuang & Ji Liu & Gang Kou, 2022. "A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data Perspective," Papers 2211.14997, arXiv.org, revised May 2023.
    9. Sigrist, Fabio & Hirnschall, Christoph, 2019. "Grabit: Gradient tree-boosted Tobit models for default prediction," Journal of Banking & Finance, Elsevier, vol. 102(C), pages 177-192.
    10. Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    11. Modina, Michele & Pietrovito, Filomena & Gallucci, Carmen & Formisano, Vincenzo, 2023. "Predicting SMEs’ default risk: Evidence from bank-firm relationship data," The Quarterly Review of Economics and Finance, Elsevier, vol. 89(C), pages 254-268.
    12. Serrano-Cinca, Carlos & Gutiérrez-Nieto, Begoña & Bernate-Valbuena, Martha, 2019. "The use of accounting anomalies indicators to predict business failure," European Management Journal, Elsevier, vol. 37(3), pages 353-375.
    13. Filipe, Sara Ferreira & Grammatikos, Theoharry & Michala, Dimitra, 2016. "Forecasting distress in European SME portfolios," Journal of Banking & Finance, Elsevier, vol. 64(C), pages 112-135.
    14. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    15. Goldmann, Leonie & Crook, Jonathan & Calabrese, Raffaella, 2024. "A new ordinal mixed-data sampling model with an application to corporate credit rating levels," European Journal of Operational Research, Elsevier, vol. 314(3), pages 1111-1126.
    16. Cao Son Tran & Dan Nicolau & Richi Nayak & Peter Verhoeven, 2021. "Modeling Credit Risk: A Category Theory Perspective," JRFM, MDPI, vol. 14(7), pages 1-21, July.
    17. Tang, Lingxiao & Cai, Fei & Ouyang, Yao, 2019. "Applying a nonparametric random forest algorithm to assess the credit risk of the energy industry in China," Technological Forecasting and Social Change, Elsevier, vol. 144(C), pages 563-572.
    18. Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
    19. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    20. Edward I. Altman & Marco Balzano & Alessandro Giannozzi & Stjepan Srhoj, 2023. "Revisiting SME default predictors: The Omega Score," Journal of Small Business Management, Taylor & Francis Journals, vol. 61(6), pages 2383-2417, November.

    More about this item

    Keywords

    Credit scoring; Small firms; Expert system; Dominance analysis

    JEL classification:

    • C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General
    • C61 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Optimization Techniques; Programming Models; Dynamic Analysis
    • G15 - Financial Economics - - General Financial Markets - - - International Financial Markets
    • G32 - Financial Economics - - Corporate Finance and Governance - - - Financing Policy; Financial Risk and Risk Management; Capital and Ownership Structure; Value of Firms; Goodwill
