
A machine learning workflow to address credit default prediction

Author

Listed:
  • Rambod Rahmani
  • Marco Parola
  • Mario G. C. A. Cimino

Abstract

Due to the recent increase in interest in Financial Technology (FinTech), applications like credit default prediction (CDP) are gaining significant industrial and academic attention. CDP plays a crucial role in assessing the creditworthiness of individuals and businesses, enabling lenders to make informed decisions regarding loan approvals and risk management. In this paper, we propose a workflow-based approach to improve CDP, which refers to the task of assessing the probability that a borrower will default on his or her credit obligations. The workflow consists of multiple steps, each designed to leverage the strengths of different techniques featured in machine learning pipelines and thus best solve the CDP task. We employ a comprehensive and systematic approach starting with data preprocessing using Weight of Evidence encoding, a technique that scales the data in a single shot by removing outliers, handling missing values, and making data uniform for models working with different data types. Next, we train several families of learning models, introducing ensemble techniques to build more robust models and hyperparameter optimization via multi-objective genetic algorithms to consider both predictive accuracy and financial aspects. Our research aims to contribute to the FinTech industry by providing a tool for more accurate and reliable credit risk assessment, benefiting both lenders and borrowers.
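
To make the preprocessing step concrete, the following is a minimal sketch of Weight of Evidence encoding for a single categorical feature. It is not the authors' implementation: the function names `woe_table` and `woe_encode`, the `smoothing` parameter, and the sign convention (log of the share of non-defaulters over the share of defaulters) are assumptions chosen for illustration. Missing values are treated as their own category (`None`), and numeric features would first have to be binned, which is not shown here.

```python
import math
from collections import Counter


def woe_table(values, defaults, smoothing=0.5):
    """Map each category (None = missing) to its Weight of Evidence.

    Assumed convention:
        WoE(c) = ln( P(category = c | non-default) / P(category = c | default) )
    The additive smoothing term is an assumption; it avoids log(0) and
    division by zero for categories seen in only one class.
    """
    good = Counter()  # counts per category among non-defaulters
    bad = Counter()   # counts per category among defaulters
    for value, defaulted in zip(values, defaults):
        if defaulted:
            bad[value] += 1
        else:
            good[value] += 1

    categories = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(categories)
    total_bad = sum(bad.values()) + smoothing * len(categories)

    return {
        c: math.log(
            ((good[c] + smoothing) / total_good)
            / ((bad[c] + smoothing) / total_bad)
        )
        for c in categories
    }


def woe_encode(values, table):
    """Replace every raw category with its WoE score."""
    return [table[v] for v in values]


# Toy data (hypothetical): housing status and a 0/1 default flag.
housing = ["rent", "rent", "rent", "own", "own", "own", None, None]
default = [1, 1, 0, 0, 0, 1, 0, 1]

table = woe_table(housing, default)
encoded = woe_encode(housing, table)
```

Under this convention, categories with a higher share of defaulters receive negative scores and categories with a higher share of non-defaulters receive positive ones, so every feature ends up on the same log-odds scale regardless of its original type.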

Suggested Citation

  • Rambod Rahmani & Marco Parola & Mario G. C. A. Cimino, 2024. "A machine learning workflow to address credit default prediction," Papers 2403.03785, arXiv.org.
  • Handle: RePEc:arx:papers:2403.03785

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2403.03785
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shao, Zhen & Zheng, Qingru & Yang, Shanlin & Gao, Fei & Cheng, Manli & Zhang, Qiang & Liu, Chen, 2020. "Modeling and forecasting the electricity clearing price: A novel BELM based pattern classification framework and a comparative analytic study on multi-layer BELM and LSTM," Energy Economics, Elsevier, vol. 86(C).
    2. Abedin, Mohammad Zoynul & Hajek, Petr & Sharif, Taimur & Satu, Md. Shahriare & Khan, Md. Imran, 2023. "Modelling bank customer behaviour using feature engineering and classification techniques," Research in International Business and Finance, Elsevier, vol. 65(C).
    3. Li, Yibei & Wang, Ximei & Djehiche, Boualem & Hu, Xiaoming, 2020. "Credit scoring by incorporating dynamic networked information," European Journal of Operational Research, Elsevier, vol. 286(3), pages 1103-1112.
    4. Erik Heilmann & Janosch Henze & Heike Wetzel, 2021. "Machine learning in energy forecasts with an application to high frequency electricity consumption data," MAGKS Papers on Economics 202135, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    5. Wang, Delu & Gan, Jun & Mao, Jinqi & Chen, Fan & Yu, Lan, 2023. "Forecasting power demand in China with a CNN-LSTM model including multimodal information," Energy, Elsevier, vol. 263(PE).
    6. Sun, Weixin & Zhang, Xuantao & Li, Minghao & Wang, Yong, 2023. "Interpretable high-stakes decision support system for credit default forecasting," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    7. Umut Ugurlu & Ilkay Oksuz & Oktay Tas, 2018. "Electricity Price Forecasting Using Recurrent Neural Networks," Energies, MDPI, vol. 11(5), pages 1-23, May.
    8. Doumpos, Michalis & Zopounidis, Constantin & Gounopoulos, Dimitrios & Platanakis, Emmanouil & Zhang, Wenke, 2023. "Operational research and artificial intelligence methods in banking," European Journal of Operational Research, Elsevier, vol. 306(1), pages 1-16.
    9. Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
    10. Höppner, Sebastiaan & Stripling, Eugen & Baesens, Bart & Broucke, Seppe vanden & Verdonck, Tim, 2020. "Profit driven decision trees for churn prediction," European Journal of Operational Research, Elsevier, vol. 284(3), pages 920-933.
    11. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    12. Tomasz Serafin & Bartosz Uniejewski & Rafał Weron, 2019. "Averaging Predictive Distributions Across Calibration Windows for Day-Ahead Electricity Price Forecasting," Energies, MDPI, vol. 12(13), pages 1-12, July.
    13. Namrye Son, 2021. "Comparison of the Deep Learning Performance for Short-Term Power Load Forecasting," Sustainability, MDPI, vol. 13(22), pages 1-25, November.
    14. Galarneau-Vincent, Rémi & Gauthier, Geneviève & Godin, Frédéric, 2023. "Foreseeing the worst: Forecasting electricity DART spikes," Energy Economics, Elsevier, vol. 119(C).
    15. Tsukahara, Fábio Yasuhiro & Kimura, Herbert & Sobreiro, Vinicius Amorim & Zambrano, Juan Carlos Arismendi, 2016. "Validation of default probability models: A stress testing approach," International Review of Financial Analysis, Elsevier, vol. 47(C), pages 70-85.
    16. Maldonado, Sebastián & Pérez, Juan & Bravo, Cristián, 2017. "Cost-based feature selection for Support Vector Machines: An application in credit scoring," European Journal of Operational Research, Elsevier, vol. 261(2), pages 656-665.
    17. Emil Kraft & Dogan Keles & Wolf Fichtner, 2020. "Modeling of frequency containment reserve prices with econometrics and artificial intelligence," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(8), pages 1179-1197, December.
    18. Díaz, Guzmán & Coto, José & Gómez-Aleixandre, Javier, 2019. "Prediction and explanation of the formation of the Spanish day-ahead electricity price through machine learning regression," Applied Energy, Elsevier, vol. 239(C), pages 610-625.
    19. Ahl, A. & Yarime, M. & Goto, M. & Chopra, Shauhrat S. & Kumar, Nallapaneni Manoj. & Tanaka, K. & Sagawa, D., 2020. "Exploring blockchain for the energy transition: Opportunities and challenges based on a case study in Japan," Renewable and Sustainable Energy Reviews, Elsevier, vol. 117(C).
    20. Wuyue An & Lin Wang & Dongfeng Zhang, 2023. "Comprehensive commodity price forecasting framework using text mining methods," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(7), pages 1865-1888, November.
