Printed from https://ideas.repec.org/a/gam/jrisks/v10y2022i9p169-d895806.html

Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?

Authors

  • Ahmed Almustfa Hussin Adam Khatir

    (Department of Economics and Management, University of Trento, Via Inama 5, 38122 Trento, Italy)

  • Marco Bee

    (Department of Economics and Management, University of Trento, Via Inama 5, 38122 Trento, Italy)

Abstract

Forecasting the creditworthiness of customers is a central issue in banking. This task requires the analysis of large datasets with many variables, for which machine learning algorithms and feature selection techniques are crucial tools. Moreover, the proportions of “good” and “bad” customers are typically imbalanced, so over- and undersampling techniques should be employed. In the literature, most investigations tackle these three issues individually. Since there is little evidence about their joint performance, this paper tries to fill this gap. We use five machine learning classifiers and combine each of them with different feature selection techniques and various data-balancing approaches. In an empirical analysis of a retail credit bank dataset, we find that the best combination is random forests with random forest recursive feature elimination and random oversampling.
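The abstract does not say how the resampling is implemented. As a minimal sketch, assuming the plain random scheme in which minority rows are duplicated with replacement (random oversampling) or majority rows are dropped without replacement (random undersampling), the two data-balancing approaches can be written in pure Python as below. The function names `random_oversample` and `random_undersample` and the toy data are hypothetical illustrations, not the authors' code or dataset.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (sampling with replacement)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each larger class so that every class has
    as many rows as the smallest one (sampling without replacement)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 8 "good" customers (0) and 2 "bad" customers (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 rows
print(Counter(y_under))  # both classes now have 2 rows
```

In practice, resampling is normally applied only to the training split, so that the test set keeps the original class proportions. The imbalanced-learn library offers `RandomOverSampler` and `RandomUnderSampler` for the same purpose; whether the paper relied on them is not stated here.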

Suggested Citation

  • Ahmed Almustfa Hussin Adam Khatir & Marco Bee, 2022. "Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?," Risks, MDPI, vol. 10(9), pages 1-22, August.
  • Handle: RePEc:gam:jrisks:v:10:y:2022:i:9:p:169-:d:895806

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-9091/10/9/169/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-9091/10/9/169/
    Download Restriction: no

    References listed on IDEAS

    1. B Baesens & T Van Gestel & S Viaene & M Stepanova & J Suykens & J Vanthienen, 2003. "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(6), pages 627-635, June.
    2. Desai, Vijay S. & Crook, Jonathan N. & Overstreet, George A., 1996. "A comparison of neural networks and linear scoring models in the credit union environment," European Journal of Operational Research, Elsevier, vol. 95(1), pages 24-37, November.
    3. D. J. Hand & W. E. Henley, 1997. "Statistical Classification Methods in Consumer Credit Scoring: a Review," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 160(3), pages 523-541, September.
    4. Martin Leo & Suneel Sharma & K. Maddulety, 2019. "Machine Learning in Banking Risk Management: A Literature Review," Risks, MDPI, vol. 7(1), pages 1-22, March.
    5. Reichert, Alan K & Cho, Chien-Ching & Wagner, George M, 1983. "An Examination of the Conceptual Issues Involved in Developing Credit-scoring Models," Journal of Business & Economic Statistics, American Statistical Association, vol. 1(2), pages 101-114, April.
    6. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    7. K B Schebesch & R Stecking, 2005. "Support vector machines for classifying and describing credit applicants: detecting typical and critical regions," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(9), pages 1082-1088, September.
    8. Trivedi, Shrawan Kumar, 2020. "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, Elsevier, vol. 63(C).

    Citations


    Cited by:

    1. Abdussalam Aljadani & Bshair Alharthi & Mohammed A. Farsi & Hossam Magdy Balaha & Mahmoud Badawy & Mostafa A. Elhosseini, 2023. "Mathematical Modeling and Analysis of Credit Scoring Using the LIME Explainer: A Comprehensive Approach," Mathematics, MDPI, vol. 11(19), pages 1-28, September.
    2. Flavio Bazzana & Marco Bee & Ahmed Almustfa Hussin Adam Khatir, 2024. "Machine learning techniques for default prediction: an application to small Italian companies," Risk Management, Palgrave Macmillan, vol. 26(1), pages 1-23, February.
    3. Faraz Ahmed & Kehkashan Nizam & Zubair Sajid & Sunain Qamar & Ahsan, 2024. "Striking a Balance: Evaluating Credit Risk with Traditional and Machine Learning Models," Bulletin of Business and Economics (BBE), Research Foundation for Humanity (RFH), vol. 13(3), pages 30-35.
    4. Luis J. Mena & Vicente García & Vanessa G. Félix & Rodolfo Ostos & Rafael Martínez-Peláez & Alberto Ochoa-Brust & Pablo Velarde-Alvarado, 2024. "Enhancing financial risk prediction with symbolic classifiers: addressing class imbalance and the accuracy–interpretability trade–off," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Linhui Wang & Jianping Zhu & Chenlu Zheng & Zhiyuan Zhang, 2024. "Incorporating Digital Footprints into Credit-Scoring Models through Model Averaging," Mathematics, MDPI, vol. 12(18), pages 1-15, September.
    2. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    3. G Verstraeten & D Van den Poel, 2005. "The impact of sample bias on consumer credit scoring performance and profitability," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(8), pages 981-992, August.
    4. Pérez-Martín, A. & Pérez-Torregrosa, A. & Vaca, M., 2018. "Big Data techniques to measure credit banking risk in home equity loans," Journal of Business Research, Elsevier, vol. 89(C), pages 448-454.
    5. Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
    6. Hong Wang & Qingsong Xu & Lifeng Zhou, 2015. "Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-20, February.
    7. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    8. Dinh, Thi Huyen Thanh & Kleimeier, Stefanie, 2007. "A credit scoring model for Vietnam's retail banking market," International Review of Financial Analysis, Elsevier, vol. 16(5), pages 471-495.
    9. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    10. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Nydia M. Reyes, 2013. "A Social Approach to Microfinance Credit Scoring," Working Papers CEB 13-013, ULB -- Universite Libre de Bruxelles.
    11. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    12. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    13. Rais Ahmad Itoo & A. Selvarasu & José António Filipe, 2015. "Loan Products and Credit Scoring by Commercial Banks (India)," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 5(1), pages 851-851.
    14. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    15. Crook, Jonathan N. & Edelman, David B. & Thomas, Lyn C., 2007. "Recent developments in consumer credit risk assessment," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1447-1465, December.
    16. Andreea Costea, 2017. "A Quantitative Approach to Credit Risk Management in the Underwriting Process for the Retail Portfolio," Romanian Economic Journal, Department of International Business and Economics from the Academy of Economic Studies Bucharest, vol. 20(63), pages 157-186, March.
    17. Brad S. Trinkle & Amelia A. Baldwin, 2007. "Interpretable credit model development via artificial neural networks," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 15(3‐4), pages 123-147, July.
    18. L C Thomas, 2010. "Consumer finance: challenges for operational research," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(1), pages 41-52, January.
    19. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    20. Robert Till & David Hand, 2003. "Behavioural models of credit card usage," Journal of Applied Statistics, Taylor & Francis Journals, vol. 30(10), pages 1201-1220.
