
An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Authors
  • Lkhagvadorj Munkhdalai

    (Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea)

  • Tsendsuren Munkhdalai

    (Microsoft Research, Montreal, QC H3A 3H3, Canada)

  • Oyun-Erdene Namsrai

    (Department of Information and Computer Sciences, National University of Mongolia, Sukhbaatar District, Building#3 Room#212, Ulaanbaatar 14201, Mongolia)

  • Jong Yun Lee

    (Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea)

  • Keun Ho Ryu

    (Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam)

Abstract

Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Our main goal in this study is therefore to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. Because the SCF data are non-synthetic and consist of a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling, and compared them: the first used hypothesis tests, correlation and random forest-based feature importance measures, and the second was a new approach (NAP) based only on random forests. We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrated that if lending institutions in the 2001s had used their own credit scoring models constructed with the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural networks and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.
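
The abstract reports two different winners, one by AUC and one by accuracy. The following is a minimal sketch of how these two metrics are computed and why they can disagree. It is not the authors' evaluation code: the labels, the scores, the 0.5 threshold, and the names `model_a` and `model_b` are all hypothetical illustrations and do not come from the SCF data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of examples classified correctly when scores at or above the
    threshold are predicted as class 1 (the threshold is an assumption)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 1 = bad credit outcome, 0 = good credit outcome.
labels = [1, 0, 1, 0, 0, 1]
# model_a ranks every positive above every negative, but its scores sit low.
model_a = [0.9, 0.2, 0.45, 0.3, 0.1, 0.4]
# model_b misorders one pair, but its scores fall on the right side of 0.5 more often.
model_b = [0.8, 0.3, 0.7, 0.6, 0.2, 0.55]

for name, scores in (("model_a", model_a), ("model_b", model_b)):
    print(name, "AUC:", auc(labels, scores), "accuracy:", accuracy(labels, scores))
```

On this toy data `model_a` has the higher AUC while `model_b` has the higher accuracy at the assumed threshold. AUC depends only on how the scores rank the examples, whereas accuracy also depends on where the decision threshold falls, so the two metrics need not select the same model.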

Suggested Citation

  • Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
  • Handle: RePEc:gam:jsusta:v:11:y:2019:i:3:p:699-:d:201610

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/11/3/699/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/11/3/699/
    Download Restriction: no

    Citations

    Cited by:

    1. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    2. Xin Xu & Feng Xiong & Zhe An, 2023. "Using Machine Learning to Predict Corporate Fraud: Evidence Based on the GONE Framework," Journal of Business Ethics, Springer, vol. 186(1), pages 137-158, August.
    3. Baikulakov Shalkar & Belgibayev Zanggar, 2021. "Consumer credit risk analysis via machine learning algorithms," Working Papers #2021-4, National Bank of Kazakhstan.
    4. Anil Kumar & Suneel Sharma & Mehregan Mahdavi, 2021. "Machine Learning (ML) Technologies for Digital Credit Scoring in Rural Finance: A Literature Review," Risks, MDPI, vol. 9(11), pages 1-15, October.
    5. Victor Flores & Brian Keith, 2019. "Gradient Boosted Trees Predictive Models for Surface Roughness in High-Speed Milling in the Steel and Aluminum Metalworking Industry," Complexity, Hindawi, vol. 2019, pages 1-15, July.
    6. Guoquan Zhang & Guohao Li & Jing Peng, 2020. "Risk Assessment and Monitoring of Green Logistics for Fresh Produce Based on a Support Vector Machine," Sustainability, MDPI, vol. 12(18), pages 1-20, September.
    7. Oguz Koc & Omur Ugur & A. Sevtap Kestel, 2023. "The Impact of Feature Selection and Transformation on Machine Learning Methods in Determining the Credit Scoring," Papers 2303.05427, arXiv.org.
    8. Sunghyon Kyeong & Daehee Kim & Jinho Shin, 2021. "Can System Log Data Enhance the Performance of Credit Scoring?—Evidence from an Internet Bank in Korea," Sustainability, MDPI, vol. 14(1), pages 1-12, December.
    9. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    10. Pejman Peykani & Mostafa Sargolzaei & Mohammad Hashem Botshekan & Camelia Oprean-Stan & Amir Takaloo, 2023. "Optimization of Asset and Liability Management of Banks with Minimum Possible Changes," Mathematics, MDPI, vol. 11(12), pages 1-24, June.
    11. Ivan Tikshaev & Roman Kulshin & Gennadii Volokitin & Pavel Senchenko & Anatoly Sidorov, 2022. "The Possibilities of Using Scoring to Determine the Relevance of Software Development Tenders," Mathematics, MDPI, vol. 10(24), pages 1-13, December.
    12. Raad Khraishi & Ramin Okhrati, 2022. "Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit," Papers 2203.03003, arXiv.org.
    13. Dmytro Krukovets, 2020. "Data Science Opportunities at Central Banks: Overview," Visnyk of the National Bank of Ukraine, National Bank of Ukraine, issue 249, pages 13-24.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    2. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    3. David Pla-Santamaria & Mila Bravo & Javier Reig-Mullor & Francisco Salas-Molina, 2021. "A multicriteria approach to manage credit risk under strict uncertainty," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(2), pages 494-523, July.
    4. Ha-Thu Nguyen, 2015. "How is credit scoring used to predict default in China?," EconomiX Working Papers 2015-1, University of Paris Nanterre, EconomiX.
    5. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    6. Elisa Ughetto & Andrea Vezzulli, 2011. "What role can mutual guarantee consortia play for financing innovation? A firm-level study for Italy," International Journal of Banking, Accounting and Finance, Inderscience Enterprises Ltd, vol. 3(4), pages 294-319.
    7. Lobna Abid & Afif Masmoudi & Sonia Zouari-Ghorbel, 2018. "The Consumer Loan’s Payment Default Predictive Model: an Application of the Logistic Regression and the Discriminant Analysis in a Tunisian Commercial Bank," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 9(3), pages 948-962, September.
    8. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    9. Yiheng Li & Weidong Chen, 2020. "A Comparative Performance Assessment of Ensemble Learning for Credit Scoring," Mathematics, MDPI, vol. 8(10), pages 1-19, October.
    10. Lee, Tian-Shyug & Chiu, Chih-Chou & Chou, Yu-Chao & Lu, Chi-Jie, 2006. "Mining the customer credit using classification and regression tree and multivariate adaptive regression splines," Computational Statistics & Data Analysis, Elsevier, vol. 50(4), pages 1113-1130, February.
    11. Ha Thu Nguyen, 2015. "How is credit scoring used to predict default in China?," Working Papers hal-04133309, HAL.
    12. Marshall, Andrew & Tang, Leilei & Milne, Alistair, 2010. "Variable reduction, sample selection bias and bank retail credit scoring," Journal of Empirical Finance, Elsevier, vol. 17(3), pages 501-512, June.
    13. Hazar Altinbas & Goktug Cenk Akkaya, 2017. "Improving the performance of statistical learning methods with a combined meta-heuristic for consumer credit risk assessment," Risk Management, Palgrave Macmillan, vol. 19(4), pages 255-280, November.
    14. Dinh, K. & Kleimeier, S., 2006. "Credit scoring for Vietnam's retail banking market : implementation and implications for transactional versus relationship lending," Research Memorandum 012, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    15. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    16. Cao Son Tran & Dan Nicolau & Richi Nayak & Peter Verhoeven, 2021. "Modeling Credit Risk: A Category Theory Perspective," JRFM, MDPI, vol. 14(7), pages 1-21, July.
    17. Sigrist, Fabio & Leuenberger, Nicola, 2023. "Machine learning for corporate default risk: Multi-period prediction, frailty correlation, loan portfolios, and tail probabilities," European Journal of Operational Research, Elsevier, vol. 305(3), pages 1390-1406.
    18. Aggarwal, Nidhi & Singh, Manish K. & Thomas, Susan, 2023. "Do decreases in Distance-to-Default predict rating downgrades?," Economic Modelling, Elsevier, vol. 129(C).
    19. Kroot, Jan & Giouvris, Evangelos, 2016. "Dutch mortgages: Impact of the crisis on probability of default," Finance Research Letters, Elsevier, vol. 18(C), pages 205-217.
    20. Bonfim, Diana, 2009. "Credit risk drivers: Evaluating the contribution of firm level information and of macroeconomic dynamics," Journal of Banking & Finance, Elsevier, vol. 33(2), pages 281-299, February.
