
Machine Learning for Credit Risk Prediction: A Systematic Literature Review

Author

Listed:
  • Jomark Pablo Noriega

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Financiera QAPAQ, Lima 150120, Peru
    These authors contributed equally to this work.)

  • Luis Antonio Rivera

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Centro de Ciências Exatas e Tecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes 28013-602, Brazil
    These authors contributed equally to this work.)

  • José Alfredo Herrera

    (Departamento Académico de Ciencia de la Computacion, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
    Programme in Biotechnology, Engineering and Chemical Technology, Universidad Pablo de Olavide, 41013 Sevilla, Spain
    These authors contributed equally to this work.)

Abstract

This systematic literature review examines the use of Machine Learning (ML) for credit risk prediction and highlights the need for financial institutions to adopt Artificial Intelligence (AI) and ML to assess credit risk by analyzing large volumes of information. We posed research questions about the algorithms, metrics, results, datasets, variables, and related limitations involved in predicting credit risk. To answer them, we searched well-known databases and identified 52 relevant studies within the microfinance credit industry. We identified challenges and approaches in credit risk prediction with ML models, including the black-box nature of the implemented models, the need for explainable artificial intelligence, the importance of selecting relevant features, the handling of multicollinearity, and the problem of imbalance in the input data. In answering the research questions, we found that the Boosted Category is the most researched family of ML models, and that the most commonly used evaluation metrics are Area Under Curve (AUC), Accuracy (ACC), Recall, F1 score (F1), and Precision. Research mainly uses public datasets to compare models, and private ones to generate new knowledge when applied to the real world. The most significant limitation identified is the representativeness of reality, and the variables primarily used in the microcredit industry relate to demographic, operational, and payment behavior data. This study aims to guide developers of credit risk management tools and software toward the existing capabilities of ML methods, metrics, and techniques for forecasting credit risk, thereby minimizing possible losses due to default and guiding risk appetite.
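
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal illustrative sketch, not taken from the article, of how Accuracy, Precision, Recall, F1, and AUC might be computed for a binary default classifier. The function names, the 0.5 decision threshold, and the convention that label 1 means "default" are assumptions made for this example only.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F1 from binary labels (1 = default, assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    only for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and model scores, invented for illustration.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(classification_metrics(labels, predictions))
    print("auc:", auc(labels, scores))
```

Because the abstract lists class imbalance among the recurring problems, note that Accuracy alone can look high when defaults are rare, which is one reason threshold-free measures such as AUC are commonly reported alongside it.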

Suggested Citation

  • Jomark Pablo Noriega & Luis Antonio Rivera & José Alfredo Herrera, 2023. "Machine Learning for Credit Risk Prediction: A Systematic Literature Review," Data, MDPI, vol. 8(11), pages 1-17, November.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:11:p:169-:d:1275568

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/11/169/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/11/169/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    2. Dangxing Chen & Luyao Zhang, 2023. "Monotonicity for AI ethics and society: An empirical study of the monotonic neural additive model in criminology, education, health care, and finance," Papers 2301.07060, arXiv.org.
    3. Sun, Weixin & Zhang, Xuantao & Li, Minghao & Wang, Yong, 2023. "Interpretable high-stakes decision support system for credit default forecasting," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    4. Al-Amin Abba Dabo & Amin Hosseinian-Far, 2023. "An Integrated Methodology for Enhancing Reverse Logistics Flows and Networks in Industry 5.0," Logistics, MDPI, vol. 7(4), pages 1-26, December.
    5. Simone Narizzano & Marco Orlandi & Antonio Scalia, 2024. "The Bank of Italy’s statistical model for the credit assessment of non-financial firms," Temi di discussione (Economic working papers) 53, Bank of Italy, Economic Research and International Relations Area.
    6. Miao Zhu & Ben-Chang Shia & Meng Su & Jialin Liu, 2024. "Consumer Default Risk Portrait: An Intelligent Management Framework of Online Consumer Credit Default Risk," Mathematics, MDPI, vol. 12(10), pages 1-19, May.
    7. Yusheng Li & Mengyi Sha, 2024. "Two‐stage credit risk prediction framework based on three‐way decisions with automatic threshold learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(5), pages 1263-1277, August.
    8. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    9. Nadia Ayed & Khemaies Bougatef, 2024. "Performance Assessment of Logistic Regression (LR), Artificial Neural Network (ANN), Fuzzy Inference System (FIS) and Adaptive Neuro-Fuzzy System (ANFIS) in Predicting Default Probability: The Case of," Computational Economics, Springer;Society for Computational Economics, vol. 64(3), pages 1803-1835, September.
    10. Ana Lorena Jiménez-Preciado & Francisco Venegas-Martínez & Abraham Ramírez-García, 2022. "Stock Portfolio Optimization with Competitive Advantages (MOAT): A Machine Learning Approach," Mathematics, MDPI, vol. 10(23), pages 1-16, November.
    11. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1-15, March.
    12. Yang Liu & Fei Huang & Lili Ma & Qingguo Zeng & Jiale Shi, 2024. "Credit scoring prediction leveraging interpretable ensemble learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(2), pages 286-308, March.
    13. Shi, Yong & Qu, Yi & Chen, Zhensong & Mi, Yunlong & Wang, Yunong, 2024. "Improved credit risk prediction based on an integrated graph representation learning approach with graph transformation," European Journal of Operational Research, Elsevier, vol. 315(2), pages 786-801.
    14. Li, Zhe & Liang, Shuguang & Pan, Xianyou & Pang, Meng, 2024. "Credit risk prediction based on loan profit: Evidence from Chinese SMEs," Research in International Business and Finance, Elsevier, vol. 67(PA).
    15. Li, Zhiyong & Li, Aimin & Bellotti, Anthony & Yao, Xiao, 2023. "The profitability of online loans: A competing risks analysis on default and prepayment," European Journal of Operational Research, Elsevier, vol. 306(2), pages 968-985.
    16. Piccialli, Veronica & Romero Morales, Dolores & Salvatore, Cecilia, 2024. "Supervised feature compression based on counterfactual analysis," European Journal of Operational Research, Elsevier, vol. 317(2), pages 273-285.
    17. Dangxing Chen, 2022. "Two-stage Modeling for Prediction with Confidence," Papers 2209.08848, arXiv.org.
    18. Kellner, Ralf & Nagl, Maximilian & Rösch, Daniel, 2022. "Opening the black box – Quantile neural networks for loss given default prediction," Journal of Banking & Finance, Elsevier, vol. 134(C).
    19. Dangxing Chen & Weicheng Ye, 2022. "Generalized Groves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance," Papers 2209.10082, arXiv.org, revised Jul 2024.
    20. Yang, Fan & Abedin, Mohammad Zoynul & Hajek, Petr, 2024. "An explainable federated learning and blockchain-based secure credit modeling method," European Journal of Operational Research, Elsevier, vol. 317(2), pages 449-467.
