
A study on credit scoring modeling with different feature selection and machine learning approaches

Author

Listed:
  • Trivedi, Shrawan Kumar

Abstract

A big hurdle for financial institutions is deciding which applicants should receive a line of credit, that is, identifying the right people while avoiding credit risk. For such a crucial decision, past demographic and financial data on debtors are important for building an automated artificial intelligence credit score prediction model based on a machine learning classifier. In addition, to build robust and accurate machine learning models, the important input predictors (the debtor's information) must be selected. The present computational work focuses on building a credit scoring prediction model using the publicly available German credit data set. An improvement in credit scoring prediction is shown with the use of different feature selection techniques (such as Information Gain, Gain Ratio and Chi-Square) and machine learning classifiers (Bayesian, Naïve Bayes, Random Forest, Decision Tree (C5.0) and SVM (Support Vector Machine)). Further, a comparative analysis is performed between the machine learning classifiers and between the feature selection techniques. Several evaluation metrics are used to analyze model performance (such as accuracy, F-measure, false positive rate, false negative rate and training time). From this analysis, the best combination of machine learning classifier and feature selection technique is identified. In this study, the combination of Random Forest (RF) and Chi-Square (CS) is found to perform well relative to the other combinations, with high accuracy and F-measure and low false positive and false negative rates. However, the training time for this combination was found to be slightly higher. The result of C5.0 with Chi-Square was comparable to the best one. This study gives financial institutions an opportunity to build an automated model for better credit scoring.
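The abstract names three feature-scoring criteria and four error-based metrics without giving formulas. The sketch below assumes the standard textbook definitions and is not taken from the paper. The function names, the toy `checking` feature, and the choice of "bad" as the positive class are all hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals, label_totals = Counter(feature), Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

def evaluation_metrics(y_true, y_pred, positive="bad"):
    """Accuracy, F-measure, false positive rate and false negative rate.

    Assumption: the "bad" credit class is treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    ratio = lambda num, den: num / den if den else 0.0
    precision, recall = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }

# Hypothetical toy data, not drawn from the German credit data set.
checking = ["low", "low", "high", "high", "low", "high"]
outcome = ["bad", "bad", "good", "good", "good", "good"]
print("information gain:", information_gain(checking, outcome))
print("gain ratio:", gain_ratio(checking, outcome))
print("chi-square:", chi_square(checking, outcome))
print(evaluation_metrics(outcome, ["bad", "good", "good", "good", "bad", "good"]))
```

Under these definitions, a higher score from any of the three criteria means the feature is more strongly associated with the credit outcome, so features can be ranked by score and the top ones kept before a classifier is trained.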

Suggested Citation

  • Trivedi, Shrawan Kumar, 2020. "A study on credit scoring modeling with different feature selection and machine learning approaches," Technology in Society, Elsevier, vol. 63(C).
  • Handle: RePEc:eee:teinso:v:63:y:2020:i:c:s0160791x17302324
    DOI: 10.1016/j.techsoc.2020.101413

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0160791X17302324
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.techsoc.2020.101413?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations

    Cited by:

    1. Davidescu Adriana AnaMaria & Agafiței Marina-Diana & Strat Vasile Alecsandru & Dima Alina Mihaela, 2024. "Mapping the Landscape: A Bibliometric Analysis of Rating Agencies in the Era of Artificial Intelligence and Machine Learning," Proceedings of the International Conference on Business Excellence, Sciendo, vol. 18(1), pages 67-85.
    2. Babaei, Golnoosh & Giudici, Paolo & Raffinetti, Emanuela, 2023. "Explainable FinTech lending," Journal of Economics and Business, Elsevier, vol. 125.
    3. Wei Li & Florentina Paraschiv & Georgios Sermpinis, 2022. "A data-driven explainable case-based reasoning approach for financial risk detection," Quantitative Finance, Taylor & Francis Journals, vol. 22(12), pages 2257-2274, December.
    4. Osama Wagdi & Yasmeen Tarek, 2022. "The Integration of Big Data and Artificial Neural Networks for Enhancing Credit Risk Scoring in Emerging Markets: Evidence from Egypt," International Journal of Economics and Finance, Canadian Center of Science and Education, vol. 14(2), pages 1-32, February.
    5. Polyzos, Efstathios & Fotiadis, Anestis & Huan, Tzung-Cheng, 2023. "From Heroes to Scoundrels: Exploring the effects of online campaigns celebrating frontline workers on COVID-19 outcomes," Technology in Society, Elsevier, vol. 72(C).
    6. Dong-Her Shih & Ting-Wei Wu & Po-Yuan Shih & Nai-An Lu & Ming-Hung Shih, 2022. "A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform," Mathematics, MDPI, vol. 10(13), pages 1-13, June.
    7. Sun, Yue & Chai, Nana & Dong, Yizhe & Shi, Baofeng, 2022. "Assessing and predicting small industrial enterprises’ credit ratings: A fuzzy decision-making approach," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1158-1172.
    8. Weidong Guo & Zach Zhizhong Zhou, 2022. "A comparative study of combining tree‐based feature selection methods and classifiers in personal loan default prediction," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1248-1313, September.
    9. Ahmed Almustfa Hussin Adam Khatir & Marco Bee, 2022. "Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?," Risks, MDPI, vol. 10(9), pages 1-22, August.
    10. Daniel Ramos & Mahsa Khorram & Pedro Faria & Zita Vale, 2021. "Load Forecasting in an Office Building with Different Data Structure and Learning Parameters," Forecasting, MDPI, vol. 3(1), pages 1-14, March.
    11. Liu, Yiting & Baals, Lennart John & Osterrieder, Jörg & Hadji-Misheva, Branka, 2024. "Network centrality and credit risk: A comprehensive analysis of peer-to-peer lending dynamics," Finance Research Letters, Elsevier, vol. 63(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    2. Crone, Sven F. & Finlay, Steven, 2012. "Instance sampling in credit scoring: An empirical study of sample size and balancing," International Journal of Forecasting, Elsevier, vol. 28(1), pages 224-238.
    3. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    4. Fang, Fang & Chen, Yuanyuan, 2019. "A new approach for credit scoring by directly maximizing the Kolmogorov–Smirnov statistic," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 180-194.
    5. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    6. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    7. Ostheimer, Julia & Chowdhury, Soumitra & Iqbal, Sarfraz, 2021. "An alliance of humans and machines for machine learning: Hybrid intelligent systems and their design principles," Technology in Society, Elsevier, vol. 66(C).
    8. Richard Chamboko & Jorge Miguel Bravo, 2020. "A Multi-State Approach to Modelling Intermediate Events and Multiple Mortgage Loan Outcomes," Risks, MDPI, vol. 8(2), pages 1-29, June.
    9. Kaposty, Florian & Kriebel, Johannes & Löderbusch, Matthias, 2020. "Predicting loss given default in leasing: A closer look at models and variable selection," International Journal of Forecasting, Elsevier, vol. 36(2), pages 248-266.
    10. Finlay, Steven, 2010. "Credit scoring for profitability objectives," European Journal of Operational Research, Elsevier, vol. 202(2), pages 528-537, April.
    11. L C Thomas, 2010. "Consumer finance: challenges for operational research," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(1), pages 41-52, January.
    12. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    13. Richard Chamboko & Jorge M. Bravo, 2016. "On the modelling of prognosis from delinquency to normal performance on retail consumer loans," Risk Management, Palgrave Macmillan, vol. 18(4), pages 264-287, December.
    14. Gao, Zheming & Fang, Shu-Cherng & Luo, Jian & Medhin, Negash, 2021. "A kernel-free double well potential support vector machine with applications," European Journal of Operational Research, Elsevier, vol. 290(1), pages 248-262.
    15. Teply, Petr & Polena, Michal, 2020. "Best classification algorithms in peer-to-peer lending," The North American Journal of Economics and Finance, Elsevier, vol. 51(C).
    16. Juan Laborda & Seyong Ryoo, 2021. "Feature Selection in a Credit Scoring Model," Mathematics, MDPI, vol. 9(7), pages 1-22, March.
    17. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    18. Yu Xia & Ta Xu & Ming-Xia Wei & Zhen-Ke Wei & Lian-Jie Tang, 2023. "Predicting Chain’s Manufacturing SME Credit Risk in Supply Chain Finance Based on Machine Learning Methods," Sustainability, MDPI, vol. 15(2), pages 1-18, January.
    19. Rais Ahmad Itoo & A. Selvarasu & José António Filipe, 2015. "Loan Products and Credit Scoring by Commercial Banks (India)," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 5(1), pages 851-851.
    20. Lobna Abid & Afif Masmoudi & Sonia Zouari-Ghorbel, 2018. "The Consumer Loan’s Payment Default Predictive Model: an Application of the Logistic Regression and the Discriminant Analysis in a Tunisian Commercial Bank," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 9(3), pages 948-962, September.
