
Credit scoring prediction leveraging interpretable ensemble learning

Authors

  • Yang Liu
  • Fei Huang
  • Lili Ma
  • Qingguo Zeng
  • Jiale Shi

Abstract

Credit scoring models based on machine learning often fall short on accuracy and interpretability in practical applications. The original KCDWU is notably adaptive but ignores intra‐class and inter‐class distances during clustering, so it can misidentify the class features and cluster structure of the data, which compromises the clustering result. We therefore improve the automatic K‐means clustering using the Calinski–Harabasz index, which yields better clustering output. We also examine five representative single classification models and six ensemble learning models for credit scoring prediction. We empirically confirm the superior performance of the ensemble learning models and, by comparing them on multiple evaluation indicators, identify CatBoost as the best model. The empirical results show that the SHAP method pairs well with CatBoost and delivers both global and local interpretations of its predictions. This work provides financial institutions with a promising candidate for interpretable credit scoring models.
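
As a rough illustration of the clustering idea mentioned in the abstract, the sketch below picks the number of clusters by maximizing the Calinski–Harabasz index, that is, the between-cluster dispersion divided by (k - 1) over the within-cluster dispersion divided by (n - k). It is not the authors' improved KCDWU procedure, whose details the abstract does not give. The function names (`kmeans`, `calinski_harabasz`, `choose_k`), the deterministic farthest-point initialization, and the toy data are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty set of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd iterations with a deterministic farthest-point start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = centroid(members)
    return labels


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n")
    overall = centroid(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points: Sequence[Point], k_max: int) -> Tuple[int, Dict[int, float]]:
    """Try k = 2..k_max and return the k with the highest index, plus all scores."""
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data (hypothetical): three well-separated groups of four points each.
    data = [(x + dx, y + dy) for x, y in [(0, 0), (10, 10), (20, 0)]
            for dx in (0, 1) for dy in (0, 1)]
    best_k, all_scores = choose_k(data, k_max=5)
    print(best_k, all_scores)
```

Because a higher index means tighter, better-separated clusters, scanning a small range of k and keeping the maximum gives a simple automatic choice. A real credit dataset would additionally need feature scaling and a more robust initialization than the one assumed here.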

Suggested Citation

  • Yang Liu & Fei Huang & Lili Ma & Qingguo Zeng & Jiale Shi, 2024. "Credit scoring prediction leveraging interpretable ensemble learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 286-308, March.
  • Handle: RePEc:wly:jforec:v:43:y:2024:i:2:p:286-308
    DOI: 10.1002/for.3033



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Dangxing & Ye, Jiahui & Ye, Weicheng, 2023. "Interpretable selective learning in credit risk," Research in International Business and Finance, Elsevier, vol. 65(C).
    2. Yang, Fan & Abedin, Mohammad Zoynul & Hajek, Petr, 2024. "An explainable federated learning and blockchain-based secure credit modeling method," European Journal of Operational Research, Elsevier, vol. 317(2), pages 449-467.
    3. Babaei, Golnoosh & Giudici, Paolo & Raffinetti, Emanuela, 2023. "Explainable FinTech lending," Journal of Economics and Business, Elsevier, vol. 125.
    4. Jiaming Liu & Xuemei Zhang & Haitao Xiong, 2024. "Credit risk prediction based on causal machine learning: Bayesian network learning, default inference, and interpretation," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(5), pages 1625-1660, August.
    5. Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
    6. Gero Szepannek, 2022. "An Overview on the Landscape of R Packages for Open Source Scorecard Modelling," Risks, MDPI, vol. 10(3), pages 1-33, March.
    7. Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    8. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    9. Al-Amin Abba Dabo & Amin Hosseinian-Far, 2023. "An Integrated Methodology for Enhancing Reverse Logistics Flows and Networks in Industry 5.0," Logistics, MDPI, vol. 7(4), pages 1-26, December.
    10. Zhang, Lifeng & Chao, Xiangrui & Qian, Qian & Jing, Fuying, 2022. "Credit evaluation solutions for social groups with poor services in financial inclusion: A technical forecasting method," Technological Forecasting and Social Change, Elsevier, vol. 183(C).
    11. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, Diciembre.
    12. Silvia Facchinetti & Paolo Giudici & Silvia Angela Osmetti, 2020. "Cyber risk measurement with ordinal data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(1), pages 173-185, March.
    13. Wang, Xiang & Yin, Jian & Yang, Yao & Muda, Iskandar & Abduvaxitovna, Shamansurova Zilola & AlWadi, Belal Mahmoud & Castillo-Picon, Jorge & Abdul-Samad, Zulkiflee, 2023. "Relationship between the resource curse, Forest management and sustainable development and the importance of R&D Projects," Resources Policy, Elsevier, vol. 85(PA).
    14. Yusheng Li & Mengyi Sha, 2024. "Two‐stage credit risk prediction framework based on three‐way decisions with automatic threshold learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(5), pages 1263-1277, August.
    15. Li, Shanshan & Long, Fang & Long, Litao, 2022. "Resources curse and sustainable development revisited: Evaluating the role of remittances for China," Resources Policy, Elsevier, vol. 79(C).
    16. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    17. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1-15, March.
    18. Anton Gerunov, 2023. "Modern Approaches To Forecasting Firm Default Rates Over The Short To Medium Term: An Application To A Panel Of Polish Companies," Yearbook of the Faculty of Economics and Business Administration, Sofia University, Faculty of Economics and Business Administration, Sofia University St Kliment Ohridski - Bulgaria, vol. 22(1), pages 5-15, October.
    19. Shi, Yong & Qu, Yi & Chen, Zhensong & Mi, Yunlong & Wang, Yunong, 2024. "Improved credit risk prediction based on an integrated graph representation learning approach with graph transformation," European Journal of Operational Research, Elsevier, vol. 315(2), pages 786-801.
    20. Zhang, Mingming & Wong, Wing-Keung & Kim Oanh, Thai Thi & Muda, Iskandar & Islam, Saiful & Hishan, Sanil S. & Abduvaxitovna, Shamansurova Zilola, 2023. "Regulating environmental pollution through natural resources and technology innovation: Revisiting the environment Kuznet curve in China through quantile-based ARDL estimations," Resources Policy, Elsevier, vol. 85(PA).
