IDEAS home Printed from https://ideas.repec.org/p/hal/journl/hal-03331114.html
   My bibliography  Save this paper

Machine Learning for Credit Scoring: Improving Logistic Regression with Non Linear Decision Tree Effects

Author

Listed:
  • Elena Ivona Dumitrescu

    (EconomiX - EconomiX - UPN - Université Paris Nanterre - CNRS - Centre National de la Recherche Scientifique)

  • Sullivan Hué

    (LEO - Laboratoire d'Économie d'Orleans - UO - Université d'Orléans - UT - Université de Tours)

  • Christophe Hurlin

    (LEO - Laboratoire d'Économie d'Orleans - UO - Université d'Orléans - UT - Université de Tours)

  • Sessi Tokpavi

    (LEO - Laboratoire d'Économie d'Orleans - UO - Université d'Orléans - UT - Université de Tours)

Abstract

In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with original predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method

Suggested Citation

  • Elena Ivona Dumitrescu & Sullivan Hué & Christophe Hurlin & Sessi Tokpavi, 2022. "Machine Learning for Credit Scoring: Improving Logistic Regression with Non Linear Decision Tree Effects," Post-Print hal-03331114, HAL.
  • Handle: RePEc:hal:journl:hal-03331114
    DOI: 10.1016/j.ejor.2021.06.053
    Note: View the original document on HAL open archive server: https://hal.science/hal-03331114
    as

    Download full text from publisher

    File URL: https://hal.science/hal-03331114/document
    Download Restriction: no

    File URL: https://libkey.io/10.1016/j.ejor.2021.06.053?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Other versions of this item:

    References listed on IDEAS

    as
    1. Christophe Hurlin & Christophe Pérignon, 2019. "Machine learning et nouvelles sources de données pour le scoring de crédit," Revue d'économie financière, Association d'économie financière, vol. 0(3), pages 21-50.
    2. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    3. Verbraken, Thomas & Bravo, Cristián & Weber, Richard & Baesens, Bart, 2014. "Development and application of consumer credit scoring models using profit-based classification measures," European Journal of Operational Research, Elsevier, vol. 238(2), pages 505-513.
    4. B Baesens & T Van Gestel & S Viaene & M Stepanova & J Suykens & J Vanthienen, 2003. "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(6), pages 627-635, June.
    5. Desai, Vijay S. & Crook, Jonathan N. & Overstreet, George A., 1996. "A comparison of neural networks and linear scoring models in the credit union environment," European Journal of Operational Research, Elsevier, vol. 95(1), pages 24-37, November.
    6. Bertrand Candelon & Elena-Ivona Dumitrescu & Christophe Hurlin, 2012. "How to Evaluate an Early-Warning System: Toward a Unified Statistical Framework for Assessing Financial Crises Forecasting Methods," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 60(1), pages 75-113, April.
    7. Bracke, Philippe & Datta, Anupam & Jung, Carsten & Sen, Shayak, 2019. "Machine learning explainability in finance: an application to default risk analysis," Bank of England working papers 816, Bank of England.
    8. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    9. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    10. Jon Frost & Leonardo Gambacorta & Yi Huang & Hyun Song Shin & Pablo Zbinden, 2019. "BigTech and the changing structure of financial intermediation," Economic Policy, CEPR, CESifo, Sciences Po;CES;MSH, vol. 34(100), pages 761-799.
    11. Peter R. Hansen & Asger Lunde & James M. Nason, 2011. "The Model Confidence Set," Econometrica, Econometric Society, vol. 79(2), pages 453-497, March.
    12. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    13. Edward I. Altman, 1968. "Financial Ratios, Discriminant Analysis And The Prediction Of Corporate Bankruptcy," Journal of Finance, American Finance Association, vol. 23(4), pages 589-609, September.
    14. Mee Young Park & Trevor Hastie, 2007. "L1‐regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(4), pages 659-677, September.
    15. Edward I. Altman, 1968. "The Prediction Of Corporate Bankruptcy: A Discriminant Analysis," Journal of Finance, American Finance Association, vol. 23(1), pages 193-194, March.
    16. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    17. Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.
    18. Hurlin, Christophe & Leymarie, Jérémy & Patin, Antoine, 2018. "Loss functions for Loss Given Default model comparison," European Journal of Operational Research, Elsevier, vol. 268(1), pages 348-360.
    19. Srinivasan, Venkat & Kim, Yong H, 1987. "Credit Granting: A Comparative Analysis of Classification Procedures," Journal of Finance, American Finance Association, vol. 42(3), pages 665-681, July.
    20. Wanting Wang, 2012. "How the small and medium-sized enterprises’ owners’ credit features affect the enterprises’credit default behavior?," E3 Journal of Business Management and Economics., E3 Journals, vol. 3(2), pages 090-095.
    21. D J Hand, 2005. "Good practice in retail credit scorecard assessment," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(9), pages 1109-1117, September.
    22. Paleologo, Giuseppe & Elisseeff, André & Antonini, Gianluca, 2010. "Subagging for credit scoring models," European Journal of Operational Research, Elsevier, vol. 201(2), pages 490-499, March.
    23. Bertrand Candelon & Elena-Ivona Dumitrescu & Christophe Hurlin, 2012. "How to evaluate an Early Warning System ?," Working Papers halshs-00450050, HAL.
    24. Kar Yan Tam & Melody Y. Kiang, 1992. "Managerial Applications of Neural Networks: The Case of Bank Failure Predictions," Management Science, INFORMS, vol. 38(7), pages 926-947, July.
    25. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    26. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    27. Arno de Caigny & Kristof Coussement & Koen W. de Bock, 2018. "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," Post-Print hal-01741661, HAL.
    28. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    29. McIlhagga, William, 2016. "penalized: A MATLAB Toolbox for Fitting Generalized Linear Models with Penalties," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 72(i06).
    30. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W., 2018. "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," European Journal of Operational Research, Elsevier, vol. 269(2), pages 760-772.
    31. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    2. Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.
    3. Emmanuel Flachaire & Gilles Hacheme & Sullivan Hu'e & S'ebastien Laurent, 2022. "GAM(L)A: An econometric model for interpretable Machine Learning," Papers 2203.11691, arXiv.org.
    4. Koen W. de Bock, 2017. "The best of two worlds: Balancing model strength and comprehensibility in business failure prediction using spline-rule ensembles," Post-Print hal-01588059, HAL.
    5. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    6. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    7. Chuliá, Helena & Garrón, Ignacio & Uribe, Jorge M., 2024. "Daily growth at risk: Financial or real drivers? The answer is not always the same," International Journal of Forecasting, Elsevier, vol. 40(2), pages 762-776.
    8. Bartosz Uniejewski, 2024. "Regularization for electricity price forecasting," Papers 2404.03968, arXiv.org.
    9. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    10. repec:hum:wpaper:sfb649dp2013-037 is not listed on IDEAS
    11. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    12. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    13. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    14. Qin, Yichen & Wang, Linna & Li, Yang & Li, Rong, 2023. "Visualization and assessment of model selection uncertainty," Computational Statistics & Data Analysis, Elsevier, vol. 178(C).
    15. Härdle, Wolfgang Karl & Prastyo, Dedy Dwi, 2013. "Default risk calculation based on predictor selection for the Southeast Asian industry," SFB 649 Discussion Papers 2013-037, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    16. Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.
    17. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    18. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    19. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    20. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    21. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.

    More about this item

    Keywords

    Risk management; Credit scoring; Machine learning; Interpretability; Econometrics;
    All these keywords.

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hal:journl:hal-03331114. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: CCSD (email available below). General contact details of provider: https://hal.archives-ouvertes.fr/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.