Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i5p701-d1347551.html

Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction

Author

Listed:
  • Zixue Zhao

    (School of Statistics and Mathematics, Yunnan University of Finance and Economics, No. 237, LongQuan Rd., Kunming 650221, China)

  • Tianxiang Cui

    (School of Computer Science, University of Nottingham Ningbo China, Ningbo 315100, China)

  • Shusheng Ding

    (School of Business, Ningbo University, 818 Fenghua Road Ningbo, Ningbo 315211, China)

  • Jiawei Li

    (School of Computer Science, University of Nottingham Ningbo China, Ningbo 315100, China)

  • Anthony Graham Bellotti

    (School of Computer Science, University of Nottingham Ningbo China, Ningbo 315100, China)

Abstract

Credit risk prediction relies heavily on historical data provided by financial institutions. The goal is to identify commonalities among defaulting users based on existing information. However, data on defaulters are often limited, so credit datasets typically contain far fewer positive samples (defaults) than negative samples (nondefaults). This poses a serious challenge known as the class imbalance problem, which can substantially degrade data quality and the effectiveness of predictive models. To address the problem, various resampling techniques have been proposed and studied extensively. Despite this research, there is no consensus on the most effective technique: the choice of resampling technique is closely related to the dataset size and imbalance ratio, and its effectiveness varies across classifiers. Moreover, there is a notable gap in research on suitable techniques for extremely imbalanced datasets. Therefore, this study compares popular resampling techniques across different datasets and classifiers and proposes a novel hybrid sampling method tailored for extremely imbalanced datasets. Our experimental results demonstrate that this new technique significantly enhances classifier predictive performance, shedding light on effective strategies for managing the class imbalance problem in credit risk prediction.
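
To make the terms in the abstract concrete, the following is a minimal sketch of the imbalance ratio and of a generic hybrid resampling step that combines random undersampling of the majority class with random oversampling of the minority class. It is not the hybrid method proposed in the paper, whose details are not given in this record. The function names, the `target_per_class` parameter, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def hybrid_resample(rows, labels, target_per_class, seed=0):
    """Resample so that every class ends up with target_per_class rows.

    Hypothetical helper: classes at or above the target are randomly
    undersampled without replacement; classes below the target keep all
    their rows and are topped up by random oversampling with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            chosen = rng.sample(members, target_per_class)
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Toy data (assumed): 95 nondefaults (label 0) and 5 defaults (label 1).
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5

print(imbalance_ratio(labels))  # 19.0
new_rows, new_labels = hybrid_resample(rows, labels, target_per_class=30)
print(Counter(new_labels))
```

In practice, resampling of this kind is applied only to the training split, so that the test data keep the original class distribution.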

Suggested Citation

  • Zixue Zhao & Tianxiang Cui & Shusheng Ding & Jiawei Li & Anthony Graham Bellotti, 2024. "Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction," Mathematics, MDPI, vol. 12(5), pages 1-27, February.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:5:p:701-:d:1347551

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/5/701/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/5/701/
    Download Restriction: no

    References listed on IDEAS

    1. Andrés Alonso & José Manuel Carbó, 2020. "Machine learning in credit risk: measuring the dilemma between prediction and supervisory cost," Working Papers 2032, Banco de España.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wosnitza, Jan Henrik, 2022. "Calibration alternatives to logistic regression and their potential for transferring the dispersion of discriminatory power into uncertainties of probabilities of default," Discussion Papers 04/2022, Deutsche Bundesbank.
    2. Faraz Ahmed & Kehkashan Nizam & Zubair Sajid & Sunain Qamar & Ahsan, 2024. "Striking a Balance: Evaluating Credit Risk with Traditional and Machine Learning Models," Bulletin of Business and Economics (BBE), Research Foundation for Humanity (RFH), vol. 13(3), pages 30-35.
    3. Pedro Guerra & Mauro Castelli & Nadine Côrte-Real, 2022. "Approaching European Supervisory Risk Assessment with SupTech: A Proposal of an Early Warning System," Risks, MDPI, vol. 10(4), pages 1-23, March.
    4. Andrés Alonso & José Manuel Carbó, 2021. "Understanding the performance of machine learning models to predict credit default: a novel approach for supervisory evaluation," Working Papers 2105, Banco de España.
    5. Valter T. Yoshida Jr & Alan de Genaro & Rafael Schiozer & Toni R. E. dos Santos, 2023. "A Novel Credit Model Risk Measure: does more data lead to lower model risk in credit scoring models?," Working Papers Series 582, Central Bank of Brazil, Research Department.
    6. Dimitrios Nikolaidis & Michalis Doumpos, 2022. "Credit Scoring with Drift Adaptation Using Local Regions of Competence," SN Operations Research Forum, Springer, vol. 3(4), pages 1-28, December.
    7. Pedro Guerra & Mauro Castelli, 2021. "Machine Learning Applied to Banking Supervision a Literature Review," Risks, MDPI, vol. 9(7), pages 1-24, July.
    8. Giuseppe Cascarino & Mirko Moscatelli & Fabio Parlapiano, 2022. "Explainable Artificial Intelligence: interpreting default forecasting models based on Machine Learning," Questioni di Economia e Finanza (Occasional Papers) 674, Bank of Italy, Economic Research and International Relations Area.
    9. Lisa Crosato & Caterina Liberati & Marco Repetto, 2021. "Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs Credit Default," Papers 2108.13914, arXiv.org, revised Sep 2021.
    10. Citterio, Alberto, 2024. "Bank failure prediction models: Review and outlook," Socio-Economic Planning Sciences, Elsevier, vol. 92(C).
    11. Antonietta di Salvatore & Mirko Moscatelli, 2024. "Improving survey information on household debt using granular credit databases," Questioni di Economia e Finanza (Occasional Papers) 839, Bank of Italy, Economic Research and International Relations Area.
    12. Andrés Alonso & José Manuel Carbó, 2022. "Accuracy of explanations of machine learning models for credit decisions," Working Papers 2222, Banco de España.
