Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i6p855-d1357214.html

Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards

Author

Listed:
  • John Martin

    (School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia)

  • Sona Taheri

    (School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia)

  • Mali Abdollahian

    (School of Science, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia)

Abstract

Lending institutions use credit risk scorecard models to optimize credit approval decisions. In recent years, ensemble learning has often been deployed to reduce misclassification costs in credit risk scorecards. In this paper, we compared the risk estimation of 26 widely used machine learning algorithms using commonly used statistical metrics. The best-performing algorithms were then used for model selection in ensemble learning. For the first time, we proposed financial criteria that assess the losses associated with both false positive and false negative predictions in order to identify the optimal ensemble. The German Credit Dataset (GCD) was augmented with simulated financial information, based on a hypothetical mortgage portfolio of the kind observed in UK, European and Australian banks, so that losses arising from misclassification could be assessed. The experimental results on the simulated GCD show that the best-performing individual algorithm was the Generalized Additive Model (GAM), with an accuracy of 0.87, a Gini of 0.88 and an Area Under the Receiver Operating Characteristic Curve of 0.94. The ensemble with the lowest misclassification cost was the combination of Random Forest (RF) and K-Nearest Neighbors (KNN), with total costs of USD 417 million (USD 230 million in default costs and USD 187 million in opportunity costs), compared with USD 487 million for the GAM (USD 287 million and USD 200 million, respectively). Applying the proposed financial criteria thus reduced misclassification costs by USD 70 million on a small sample. Lending institutions' profit would therefore be expected to rise considerably as the number of credit applications submitted for approval increases.
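The cost comparison reported in the abstract can be retraced with a minimal sketch. It only re-adds the published figures (in USD million) and picks the candidate with the lowest total. The class name `MisclassificationCost` and its fields are hypothetical names chosen for illustration, and the simple sum of default and opportunity costs is an assumption about how the totals are formed, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MisclassificationCost:
    """Hypothetical container for one model's costs, in USD million."""
    model: str
    default_cost: float      # loss from approved applicants who default
    opportunity_cost: float  # profit forgone on rejected good applicants

    @property
    def total(self) -> float:
        # Assumption: total cost is the plain sum of the two components.
        return self.default_cost + self.opportunity_cost


# Figures taken from the abstract (USD million).
candidates = [
    MisclassificationCost("RF + KNN ensemble", 230.0, 187.0),
    MisclassificationCost("GAM", 287.0, 200.0),
]

# Select the model with the lowest total misclassification cost.
best = min(candidates, key=lambda c: c.total)
gam = next(c for c in candidates if c.model == "GAM")
saving = gam.total - best.total

print(f"{best.model}: USD {best.total:.0f} million")
print(f"Saving versus GAM: USD {saving:.0f} million")
```

Under these assumptions the totals are 417 and 487, and the difference is the USD 70 million reduction quoted above. Ranking by total cost rather than by accuracy or Gini is the point of the proposed financial criteria: the GAM leads on the statistical metrics but not on cost.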

Suggested Citation

  • John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1, March.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:6:p:855-:d:1357214

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/6/855/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/6/855/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    2. Al-Amin Abba Dabo & Amin Hosseinian-Far, 2023. "An Integrated Methodology for Enhancing Reverse Logistics Flows and Networks in Industry 5.0," Logistics, MDPI, vol. 7(4), pages 1-26, December.
    3. Chun-Xia Zhang & Jiang-She Zhang & Sang-Woon Kim, 2016. "PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection," Computational Statistics, Springer, vol. 31(4), pages 1237-1262, December.
    4. Kleiman, Rachel M. & Characklis, Gregory W. & Kern, Jordan D., 2022. "Managing weather- and market price-related financial risks in algal biofuel production," Renewable Energy, Elsevier, vol. 200(C), pages 111-124.
    5. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    6. Hayley Randall & Andreas Artemiou & Xingye Qiao, 2021. "Sufficient dimension reduction based on distance‐weighted discrimination," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(4), pages 1186-1211, December.
    7. Yang Liu & Fei Huang & Lili Ma & Qingguo Zeng & Jiale Shi, 2024. "Credit scoring prediction leveraging interpretable ensemble learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 286-308, March.
    8. John De Jesús González & Filiberto Enrique Valdés Medina & Maria Luisa Saavedra García, 2021. "Factores de éxito en el financiamiento para Pymes a través del Crowdfunding en México," Remef - Revista Mexicana de Economía y Finanzas Nueva Época REMEF (The Mexican Journal of Economics and Finance), Instituto Mexicano de Ejecutivos de Finanzas, IMEF, vol. 16(2), pages 1-23, Abril - J.
    9. Li, Zhe & Liang, Shuguang & Pan, Xianyou & Pang, Meng, 2024. "Credit risk prediction based on loan profit: Evidence from Chinese SMEs," Research in International Business and Finance, Elsevier, vol. 67(PA).
    10. Barrow, Devon K. & Crone, Sven F., 2016. "A comparison of AdaBoost algorithms for time series forecast combination," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1103-1119.
    11. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    12. Kellner, Ralf & Nagl, Maximilian & Rösch, Daniel, 2022. "Opening the black box – Quantile neural networks for loss given default prediction," Journal of Banking & Finance, Elsevier, vol. 134(C).
    13. Dangxing Chen & Weicheng Ye, 2022. "Generalized Gloves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance," Papers 2209.10082, arXiv.org.
    14. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "$L_1$ splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    15. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
    16. Chi, Guotai & Dong, Bingjie & Zhou, Ying & Jin, Peng, 2024. "Long-horizon predictions of credit default with inconsistent customers," Technological Forecasting and Social Change, Elsevier, vol. 198(C).
    17. Chen, Dangxing & Ye, Jiahui & Ye, Weicheng, 2023. "Interpretable selective learning in credit risk," Research in International Business and Finance, Elsevier, vol. 65(C).
    18. Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.
    19. Jomark Pablo Noriega & Luis Antonio Rivera & José Alfredo Herrera, 2023. "Machine Learning for Credit Risk Prediction: A Systematic Literature Review," Data, MDPI, vol. 8(11), pages 1-17, November.
    20. Li, Aimin & Li, Zhiyong & Bellotti, Anthony, 2023. "Predicting loss given default of unsecured consumer loans with time-varying survival scores," Pacific-Basin Finance Journal, Elsevier, vol. 78(C).
