Printed from https://ideas.repec.org/p/hal/wpaper/hal-02507499.html

Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds

Authors
  • Elena Dumitrescu

    (EconomiX, UPN - Université Paris Nanterre, CNRS - Centre National de la Recherche Scientifique)

  • Sullivan Hué

    (LEO - Laboratoire d'Économie d'Orléans, UO - Université d'Orléans, UT - Université de Tours)

  • Christophe Hurlin

    (LEO - Laboratoire d'Économie d'Orléans, UO - Université d'Orléans, UT - Université de Tours)

  • Sessi Tokpavi

    (LEO - Laboratoire d'Économie d'Orléans, UO - Université d'Orléans, UT - Université de Tours)

Abstract

In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry, mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose to obtain the best of both worlds by introducing a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with pairs of predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method.

JEL Classification: G10, C25, C53
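The abstract summarises PLTR in one sentence: threshold rules extracted from short-depth trees built on pairs of variables become predictors in a penalised logistic regression. The standard-library Python sketch below illustrates that pipeline under simplifying assumptions that are ours, not the authors'. Each pair gets a hand-rolled depth-2 tree (Gini splits, with the second split searched only inside the root's left child) that yields one univariate and one bivariate rule. The penalty is a plain L1 (lasso) fitted by proximal gradient descent; the paper's own rule-extraction scheme and penalty (its reference list includes the adaptive lasso of Zou, 2006) may differ. All function names, the toy data-generating process, and the values of `lam`, `lr` and `n_iter` are hypothetical.

```python
import math
import random
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Best threshold t for the split {x <= t} / {x > t}, scored by weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right child empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


def pair_rules(X, y, j, k):
    """Simplified depth-2 tree on the pair (j, k): one univariate and one bivariate rule.

    A rule (a, ta, b, tb) encodes 1{x_a <= ta and x_b <= tb}; b is None for a univariate rule.
    """
    fits = {v: best_split([row[v] for row in X], y) for v in (j, k)}
    a = min((j, k), key=lambda v: fits[v][1])  # root: the variable with lower impurity
    b = k if a == j else j                       # the other variable splits the left child
    ta = fits[a][0]
    if ta is None:
        return []
    rules = [(a, ta, None, None)]
    left = [i for i, row in enumerate(X) if row[a] <= ta]
    tb, _ = best_split([X[i][b] for i in left], [y[i] for i in left])
    if tb is not None:
        rules.append((a, ta, b, tb))
    return rules


def apply_rule(rule, row):
    """Evaluate a rule as a 0/1 indicator on one observation."""
    a, ta, b, tb = rule
    if row[a] > ta:
        return 0.0
    return 1.0 if b is None or row[b] <= tb else 0.0


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def soft_threshold(v, t):
    """Proximal operator of t * |v|, i.e. the L1 shrinkage step."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logit(Z, y, lam=0.01, lr=0.2, n_iter=500):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA)."""
    n, p = len(Z), len(Z[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals p_i - y_i give the gradient of the mean negative log-likelihood.
        resid = [sigmoid(beta0 + sum(b * z for b, z in zip(beta, row))) - yi
                 for row, yi in zip(Z, y)]
        grad = [sum(r * row[m] for r, row in zip(resid, Z)) / n for m in range(p)]
        beta0 -= lr * sum(resid) / n  # the intercept is not penalised
        beta = [soft_threshold(b - lr * g, lr * lam) for b, g in zip(beta, grad)]
    return beta0, beta


def pltr_design(X, rules):
    """Design matrix: original variables followed by the binary rule indicators."""
    return [list(row) + [apply_rule(r, row) for r in rules] for row in X]


# Hypothetical toy data: default risk jumps only when x0 and x1 are both large.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [1 if random.random() < (0.8 if r[0] > 0.5 and r[1] > 0.5 else 0.1) else 0 for r in X]

rules = [r for j, k in combinations(range(3), 2) for r in pair_rules(X, y, j, k)]
Z = pltr_design(X, rules)
beta0, beta = fit_l1_logit(Z, y)
probs = [sigmoid(beta0 + sum(b * z for b, z in zip(beta, row))) for row in Z]
print(len(rules), "rules,", sum(b != 0.0 for b in beta), "non-zero coefficients")
```

Because every rule is a 0/1 indicator such as 1{x_a <= t_a and x_b <= t_b}, each surviving coefficient reads as a fixed shift in the log-odds of default for the borrowers the rule selects, while the L1 step sets the coefficients of uninformative rules to exactly zero. This is one way to see how threshold and interaction effects can enter a model that remains a logistic regression.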

Suggested Citation

  • Elena Dumitrescu & Sullivan Hué & Christophe Hurlin & Sessi Tokpavi, 2021. "Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds," Working Papers hal-02507499, HAL.
  • Handle: RePEc:hal:wpaper:hal-02507499
    Note: View the original document on HAL open archive server: https://hal.science/hal-02507499v3

    Download full text from publisher

    File URL: https://hal.science/hal-02507499v3/document
    Download Restriction: no


    References listed on IDEAS

    1. Christophe Hurlin & Christophe Pérignon, 2019. "Machine learning et nouvelles sources de données pour le scoring de crédit," Revue d'économie financière, Association d'économie financière, vol. 0(3), pages 21-50.
    2. Viral V. Acharya & Lasse H. Pedersen & Thomas Philippon & Matthew Richardson, 2017. "Measuring Systemic Risk," The Review of Financial Studies, Society for Financial Studies, vol. 30(1), pages 2-47.
    3. Viaene, Stijn & Dedene, Guido, 2005. "Cost-sensitive learning and decision making revisited," European Journal of Operational Research, Elsevier, vol. 166(1), pages 212-220, October.
    4. Desai, Vijay S. & Crook, Jonathan N. & Overstreet, George A., 1996. "A comparison of neural networks and linear scoring models in the credit union environment," European Journal of Operational Research, Elsevier, vol. 95(1), pages 24-37, November.
    5. Robert Engle & Eric Jondeau & Michael Rockinger, 2015. "Systemic Risk in Europe," Review of Finance, European Finance Association, vol. 19(1), pages 145-190.
    6. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2018. "Econometrics and Machine Learning," Economie et Statistique / Economics and Statistics, Institut National de la Statistique et des Etudes Economiques (INSEE), issue 505-506, pages 147-169.
    7. Jon Frost & Leonardo Gambacorta & Yi Huang & Hyun Song Shin & Pablo Zbinden, 2019. "BigTech and the changing structure of financial intermediation," Economic Policy, CEPR, CESifo, Sciences Po;CES;MSH, vol. 34(100), pages 761-799.
    8. Niklas Bussmann & Paolo Giudici & Dimitri Marinelli & Jochen Papenbrock, 2021. "Explainable Machine Learning in Credit Risk Management," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 203-216, January.
    9. Mee Young Park & Trevor Hastie, 2007. "L1‐regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(4), pages 659-677, September.
    10. Edward I. Altman, 1968. "The Prediction Of Corporate Bankruptcy: A Discriminant Analysis," Journal of Finance, American Finance Association, vol. 23(1), pages 193-194, March.
    11. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    12. Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.
    13. Srinivasan, Venkat & Kim, Yong H, 1987. "Credit Granting: A Comparative Analysis of Classification Procedures," Journal of Finance, American Finance Association, vol. 42(3), pages 665-681, July.
    14. Wanting Wang, 2012. "How the small and medium-sized enterprises’ owners’ credit features affect the enterprises’ credit default behavior?," E3 Journal of Business Management and Economics, E3 Journals, vol. 3(2), pages 090-095.
    15. Christophe Hurlin & Christophe Pérignon, 2019. "Machine Learning et nouvelles sources de données pour le scoring de crédit," Working Papers halshs-02377886, HAL.
    16. M Stepanova & L C Thomas, 2001. "PHAB scores: proportional hazards analysis behavioural scores," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 52(9), pages 1007-1016, September.
    17. Hao Helen Zhang & Wenbin Lu, 2007. "Adaptive Lasso for Cox's proportional hazards model," Biometrika, Biometrika Trust, vol. 94(3), pages 691-703.
    18. Kar Yan Tam & Melody Y. Kiang, 1992. "Managerial Applications of Neural Networks: The Case of Bank Failure Predictions," Management Science, INFORMS, vol. 38(7), pages 926-947, July.
    19. David Durand, 1941. "Risk Elements in Consumer Instalment Financing," NBER Books, National Bureau of Economic Research, Inc, number dura41-1.
    20. Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
    21. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    22. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    23. Arno de Caigny & Kristof Coussement & Koen W. de Bock, 2018. "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," Post-Print hal-01741661, HAL.
    24. Stein, Roger M., 2005. "The relationship between default prediction and lending profits: Integrating ROC analysis and loan pricing," Journal of Banking & Finance, Elsevier, vol. 29(5), pages 1213-1236, May.
    25. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    26. Berger, Allen N & Frame, W Scott & Miller, Nathan H, 2005. "Credit Scoring and the Availability, Price, and Risk of Small Business Credit," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 37(2), pages 191-222, April.
    27. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    28. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    29. Verbraken, Thomas & Bravo, Cristián & Weber, Richard & Baesens, Bart, 2014. "Development and application of consumer credit scoring models using profit-based classification measures," European Journal of Operational Research, Elsevier, vol. 238(2), pages 505-513.
    30. Steenackers, A. & Goovaerts, M. J., 1989. "A credit scoring model for personal loans," Insurance: Mathematics and Economics, Elsevier, vol. 8(1), pages 31-34, March.
    31. B Baesens & T Van Gestel & S Viaene & M Stepanova & J Suykens & J Vanthienen, 2003. "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(6), pages 627-635, June.
    32. Bracke, Philippe & Datta, Anupam & Jung, Carsten & Sen, Shayak, 2019. "Machine learning explainability in finance: an application to default risk analysis," Bank of England working papers 816, Bank of England.
    33. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    34. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    35. Thomas, Lyn C., 2000. "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," International Journal of Forecasting, Elsevier, vol. 16(2), pages 149-172.
    36. Edward I. Altman, 1968. "Financial Ratios, Discriminant Analysis And The Prediction Of Corporate Bankruptcy," Journal of Finance, American Finance Association, vol. 23(4), pages 589-609, September.
    37. Susan Athey, 2018. "The Impact of Machine Learning on Economics," NBER Chapters, in: The Economics of Artificial Intelligence: An Agenda, pages 507-547, National Bureau of Economic Research, Inc.
    38. Rochet, Jean-Charles, 1992. "Capital requirements and the behaviour of commercial banks," European Economic Review, Elsevier, vol. 36(5), pages 1137-1170, June.
    39. Hurlin, Christophe & Leymarie, Jérémy & Patin, Antoine, 2018. "Loss functions for Loss Given Default model comparison," European Journal of Operational Research, Elsevier, vol. 268(1), pages 348-360.
    40. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    41. D J Hand, 2005. "Good practice in retail credit scorecard assessment," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 56(9), pages 1109-1117, September.
    42. Paleologo, Giuseppe & Elisseeff, André & Antonini, Gianluca, 2010. "Subagging for credit scoring models," European Journal of Operational Research, Elsevier, vol. 201(2), pages 490-499, March.
    43. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    44. McIlhagga, William, 2016. "penalized: A MATLAB Toolbox for Fitting Generalized Linear Models with Penalties," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 72(i06).
    45. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W., 2018. "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," European Journal of Operational Research, Elsevier, vol. 269(2), pages 760-772.
    46. Blochlinger, Andreas & Leippold, Markus, 2006. "Economic benefit of powerful credit scoring," Journal of Banking & Finance, Elsevier, vol. 30(3), pages 851-873, March.
    47. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    48. David Durand, 1941. "Risk Elements in Consumer Instalment Financing, Technical Edition," NBER Books, National Bureau of Economic Research, Inc, number dura41-2.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Matthieu Garcin & Samuel Stéphan, 2023. "Credit scoring using neural networks and SURE posterior probability calibration," Working Papers hal-03286760, HAL.
    2. Bastien Lextrait, 2021. "Scaling up SME's credit scoring scope with LightGBM," EconomiX Working Papers 2021-25, University of Paris Nanterre, EconomiX.
    3. Giuseppe Cascarino & Mirko Moscatelli & Fabio Parlapiano, 2022. "Explainable Artificial Intelligence: interpreting default forecasting models based on Machine Learning," Questioni di Economia e Finanza (Occasional Papers) 674, Bank of Italy, Economic Research and International Relations Area.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    2. Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.
    3. Emmanuel Flachaire & Gilles Hacheme & Sullivan Hué & Sébastien Laurent, 2022. "GAM(L)A: An econometric model for interpretable Machine Learning," Papers 2203.11691, arXiv.org.
    4. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    5. Akkoç, Soner, 2012. "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish cred," European Journal of Operational Research, Elsevier, vol. 222(1), pages 168-178.
    6. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    7. James T. E. Chapman & Ajit Desai, 2023. "Macroeconomic Predictions Using Payments Data and Machine Learning," Forecasting, MDPI, vol. 5(4), pages 1-32, November.
    8. Byron Botha & Rulof Burger & Kevin Kotzé & Neil Rankin & Daan Steenkamp, 2023. "Big data forecasting of South African inflation," Empirical Economics, Springer, vol. 65(1), pages 149-188, July.
    9. Rais Ahmad Itoo & A. Selvarasu & José António Filipe, 2015. "Loan Products and Credit Scoring by Commercial Banks (India)," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 5(1), pages 851-851.
    10. Linhui Wang & Jianping Zhu & Chenlu Zheng & Zhiyuan Zhang, 2024. "Incorporating Digital Footprints into Credit-Scoring Models through Model Averaging," Mathematics, MDPI, vol. 12(18), pages 1-15, September.
    11. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    12. Koen W. de Bock, 2017. "The best of two worlds: Balancing model strength and comprehensibility in business failure prediction using spline-rule ensembles," Post-Print hal-01588059, HAL.
    13. Julien Chevallier & Dominique Guégan & Stéphane Goutte, 2021. "Is It Possible to Forecast the Price of Bitcoin?," Forecasting, MDPI, vol. 3(2), pages 1-44, May.
    14. Neuberg Richard & Hannah Lauren, 2017. "Loan pricing under estimation risk," Statistics & Risk Modeling, De Gruyter, vol. 34(1-2), pages 69-87, June.
    15. José Willer Prado & Valderí Castro Alcântara & Francisval Melo Carvalho & Kelly Carvalho Vieira & Luiz Kennedy Cruz Machado & Dany Flávio Tonelli, 2016. "Multivariate analysis of credit risk and bankruptcy research data: a bibliometric study involving different knowledge fields (1968–2014)," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1007-1029, March.
    16. Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
    17. Bartosz Uniejewski, 2024. "Regularization for electricity price forecasting," Papers 2404.03968, arXiv.org.
    18. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    19. repec:hum:wpaper:sfb649dp2013-037 is not listed on IDEAS
    20. Lily Davies & Mark Kattenberg & Benedikt Vogt, 2023. "Predicting Firm Exits with Machine Learning: Implications for Selection into COVID-19 Support and Productivity Growth," CPB Discussion Paper 444, CPB Netherlands Bureau for Economic Policy Analysis.
    21. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.

    More about this item

    Keywords

    Credit scoring; Machine learning; Risk management; Interpretability; Econometrics

    JEL classification:

    • G10 - Financial Economics - - General Financial Markets - - - General (includes Measurement and Data)
    • C25 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hal:wpaper:hal-02507499. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: CCSD. General contact details of provider: https://hal.archives-ouvertes.fr/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.