
Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction

Authors

  • Abisola Akinjole

    (School of Computing and Digital Technologies, Sheffield Hallam University, Sheffield S1 2NU, UK)

  • Olamilekan Shobayo

    (School of Computing and Digital Technologies, Sheffield Hallam University, Sheffield S1 2NU, UK)

  • Jumoke Popoola

    (School of Computing and Digital Technologies, Sheffield Hallam University, Sheffield S1 2NU, UK)

  • Obinna Okoyeigbo

    (Department of Engineering, Edge Hill University, Ormskirk L39 4QP, UK)

  • Bayode Ogunleye

    (Department of Computing & Mathematics, University of Brighton, Brighton BN2 4GJ, UK)

Abstract

Predicting credit default risk is important to financial institutions: accurately estimating the likelihood that a borrower will default on a loan helps reduce financial losses and thereby maintain profitability and stability. Although machine learning models have been used to assess large volumes of loan applications with complex attributes, there is still a need to identify the most effective techniques for model development, including techniques for addressing data imbalance. In this research, we conducted a comparative analysis of random forest, decision tree, SVM (Support Vector Machine), XGBoost (Extreme Gradient Boosting), AdaBoost (Adaptive Boosting) and multilayer perceptron models for predicting credit defaults using loan data from LendingClub. XGBoost was then used as a framework for testing and evaluating further techniques; in particular, we used it to address the observed class imbalance by testing several resampling methods: Random Over-Sampling (ROS), the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Under-Sampling (RUS), and hybrid approaches such as SMOTE with Tomek links and SMOTE with Edited Nearest Neighbours (SMOTE + ENN). The results showed that models trained on the balanced datasets significantly outperformed the model trained on the imbalanced dataset, with SMOTE + ENN delivering the best overall performance: an accuracy of 90.49%, a precision of 94.61% and a recall of 92.02%. Ensemble methods such as voting and stacking were then employed to improve performance further. Our proposed model achieved an accuracy of 93.7%, a precision of 95.6% and a recall of 95.5%, which shows the potential of ensemble methods to improve credit default prediction and to give lending platforms a tool for reducing default rates and financial losses.
In conclusion, the findings from this study have broader implications for financial institutions, offering a robust approach to risk assessment beyond the LendingClub dataset.
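The abstract reports that combining SMOTE with Edited Nearest Neighbours gave the best results among the resampling methods tested. As a rough illustration of what that hybrid does, the sketch below oversamples the minority (default) class by SMOTE-style interpolation and then cleans the combined set with Wilson's Edited Nearest Neighbours rule. It is a minimal pure-Python sketch under stated assumptions, not the authors' pipeline: the function names (`nearest`, `smote`, `enn`, `smote_enn`), the defaults k = 5 for SMOTE and k = 3 for ENN, the binary 0/1 labels and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def nearest(points, query, k, skip_index=None):
    """Indices of the k points closest to `query` by Euclidean distance."""
    candidates = (i for i in range(len(points)) if i != skip_index)
    return sorted(candidates, key=lambda i: math.dist(points[i], query))[:k]


def smote(minority, n_new, rng, k=5):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (needs at least two minority samples)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbour = minority[rng.choice(nearest(minority, x, k, skip_index=i))]
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def enn(X, y, k=3):
    """Wilson's Edited Nearest Neighbours: drop every sample whose label
    differs from the majority label of its k nearest neighbours."""
    kept_X, kept_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in nearest(X, x, k, skip_index=i))
        if votes.most_common(1)[0][0] == label:
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y


def smote_enn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class to parity with SMOTE, then clean with ENN.
    Assumes a binary target (hypothetical helper, not a library API)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    synthetic = smote(minority, max(0, n_majority - len(minority)), rng, k_smote)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return enn(X_bal, y_bal, k_enn)


# Toy data (hypothetical): label 1 = default (minority), 0 = repaid (majority).
majority = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.2),
            (0.2, 0.6), (0.6, 0.5), (0.4, 0.3), (0.7, 0.1)]
defaults = [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3)]
noisy = (5.05, 5.0)  # a "repaid" loan sitting among the defaults
X = majority + defaults + [noisy]
y = [0] * len(majority) + [1] * len(defaults) + [0]

X_res, y_res = smote_enn(X, y, minority_label=1)
print(sorted(Counter(y_res).items()))  # [(0, 8), (1, 9)]
```

In this toy run, SMOTE adds six synthetic defaults (three real plus six synthetic gives nine), and ENN then removes the mislabelled point inside the default cluster while keeping all eight well-separated majority points. A library implementation such as `SMOTEENN` in imbalanced-learn (`imblearn.combine`) would normally be used instead; its ENN defaults may differ from the majority-vote rule shown here. Resampling of this kind is usually applied to the training split only, so that synthetic points do not leak into evaluation.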

Suggested Citation

  • Abisola Akinjole & Olamilekan Shobayo & Jumoke Popoola & Obinna Okoyeigbo & Bayode Ogunleye, 2024. "Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction," Mathematics, MDPI, vol. 12(21), pages 1-32, October.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:21:p:3423-:d:1511857

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/21/3423/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/21/3423/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
    2. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    3. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    4. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    5. Jianhua Jiang & Xianqiu Meng & Yang Liu & Huan Wang, 2022. "An Enhanced TSA-MLP Model for Identifying Credit Default Problems," SAGE Open, , vol. 12(2), pages 21582440221, April.
    6. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    7. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
    8. Xiao, Jin & Zhong, Yu & Jia, Yanlin & Wang, Yadong & Li, Ruoyi & Jiang, Xiaoyi & Wang, Shouyang, 2024. "A novel deep ensemble model for imbalanced credit scoring in internet finance," International Journal of Forecasting, Elsevier, vol. 40(1), pages 348-372.
    9. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Luz López-Palacios, 2015. "Determinants of Default in P2P Lending," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    10. Tomáš Vaněk & David Hampel, 2017. "The Probability of Default Under IFRS 9: Multi-period Estimation and Macroeconomic Forecast," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 65(2), pages 759-776.
    11. Durand, Pierre & Le Quang, Gaëtan, 2022. "Banks to basics! Why banking regulation should focus on equity," European Journal of Operational Research, Elsevier, vol. 301(1), pages 349-372.
    12. Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    13. Jing Quan & Xuelian Sun, 2024. "Credit risk assessment using the factorization machine model with feature interactions," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-10, December.
    14. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    15. Kelly, Robert & O'Toole, Conor, 2016. "Lending Conditions and Loan Default: What Can We Learn From UK Buy-to-Let Loans?," Research Technical Papers 04/RT/16, Central Bank of Ireland.
    16. Dinh, K. & Kleimeier, S., 2006. "Credit scoring for Vietnam's retail banking market : implementation and implications for transactional versus relationship lending," Research Memorandum 012, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    17. Hollander, Hylton & Liu, Guangling, 2016. "Credit spread variability in the U.S. business cycle: The Great Moderation versus the Great Recession," Journal of Banking & Finance, Elsevier, vol. 67(C), pages 37-52.
    18. Xin Huang & Hao Zhou & Haibin Zhu, 2012. "Systemic Risk Contributions," Journal of Financial Services Research, Springer;Western Finance Association, vol. 42(1), pages 55-83, October.
    19. Merrill, Craig B. & Nadauld, Taylor D. & Stulz, Rene M. & Sherlund, Shane, 2012. "Did Capital Requirements and Fair Value Accounting Spark Fire Sales in Distressed Mortgage-Backed Securities?," Working Papers 13-01, University of Pennsylvania, Wharton School, Weiss Center.
    20. Goedde-Menke, Michael & Langer, Thomas & Pfingsten, Andreas, 2014. "Impact of the financial crisis on bank run risk – Danger of the days after," Journal of Banking & Finance, Elsevier, vol. 40(C), pages 522-533.
