Printed from https://ideas.repec.org/a/eee/intfor/v40y2024i1p348-372.html

A novel deep ensemble model for imbalanced credit scoring in internet finance

Author

Listed:
  • Xiao, Jin
  • Zhong, Yu
  • Jia, Yanlin
  • Wang, Yadong
  • Li, Ruoyi
  • Jiang, Xiaoyi
  • Wang, Shouyang

Abstract

Most existing deep ensemble credit scoring models have considered deep neural networks, for which the structures are difficult to design and the modeling results are difficult to interpret. Moreover, the methods of dealing with the class-imbalance problem in these studies are still based on traditional resampling methods. To fill these gaps, we combine a new over-sampling method, the variational autoencoder (VAE), and a deep ensemble classifier, the deep forest (DF), and propose a novel deep ensemble model for credit scoring in internet finance, VAE–DF. We train and test our model using a number of credit scoring datasets in internet finance and find that our model exhibits good performance and can realize a self-adapting depth. The results show that VAE–DF is an effective credit scoring tool, especially for highly class-imbalanced and non-linear datasets in internet finance, due to its strong ability to learn the complex distributions of these datasets.
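The abstract does not say how the "self-adapting depth" is obtained. In the original deep forest formulation, cascade levels are added one at a time and growth stops once the validation score no longer improves. The sketch below illustrates only that stopping rule, plus the count of synthetic minority samples an over-sampler such as a VAE would need to generate to balance the classes. It is not the authors' implementation: `grow_cascade`, `score_at_depth`, `patience`, `min_gain`, and `n_synthetic` are hypothetical names, and the scores in the demo are made up.

```python
from typing import Callable, List, Tuple


def grow_cascade(
    score_at_depth: Callable[[int], float],
    max_depth: int = 20,
    patience: int = 1,
    min_gain: float = 0.0,
) -> Tuple[int, List[float]]:
    """Add cascade levels until the validation score stops improving.

    score_at_depth(d) is a hypothetical callback that trains level d on top
    of levels 1..d-1 and returns a validation score (higher is better).
    Returns the retained depth and the scores of every level that was tried.
    """
    best_score = float("-inf")
    best_depth = 0
    scores: List[float] = []
    for depth in range(1, max_depth + 1):
        score = score_at_depth(depth)
        scores.append(score)
        if score > best_score + min_gain:
            best_score, best_depth = score, depth
        elif depth - best_depth >= patience:
            break  # no gain for `patience` consecutive levels
    return best_depth, scores


def n_synthetic(n_majority: int, n_minority: int) -> int:
    """Minority samples a generator would have to add to balance the classes."""
    return max(n_majority - n_minority, 0)


if __name__ == "__main__":
    # Made-up validation scores per depth, for illustration only.
    fake_scores = {1: 0.71, 2: 0.74, 3: 0.76, 4: 0.755, 5: 0.75}
    depth, tried = grow_cascade(lambda d: fake_scores[d], max_depth=5)
    print(depth, tried)
    print(n_synthetic(n_majority=9500, n_minority=500))
```

In a real pipeline, `score_at_depth` would wrap the training of one cascade level on the over-sampled training set and its evaluation on held-out data; a larger `patience` tolerates a temporary dip before stopping.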

Suggested Citation

  • Xiao, Jin & Zhong, Yu & Jia, Yanlin & Wang, Yadong & Li, Ruoyi & Jiang, Xiaoyi & Wang, Shouyang, 2024. "A novel deep ensemble model for imbalanced credit scoring in internet finance," International Journal of Forecasting, Elsevier, vol. 40(1), pages 348-372.
  • Handle: RePEc:eee:intfor:v:40:y:2024:i:1:p:348-372
    DOI: 10.1016/j.ijforecast.2023.03.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0169207023000353
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ijforecast.2023.03.004?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xia, Yufei & Zhao, Junhao & He, Lingyun & Li, Yinguo & Yang, Xiaoli, 2021. "Forecasting loss given default for peer-to-peer loans via heterogeneous stacking ensemble approach," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1590-1613.
    2. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Luz López-Palacios, 2015. "Determinants of Default in P2P Lending," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    3. Li, Yibei & Wang, Ximei & Djehiche, Boualem & Hu, Xiaoming, 2020. "Credit scoring by incorporating dynamic networked information," European Journal of Operational Research, Elsevier, vol. 286(3), pages 1103-1112.
    4. Rasa Kanapickiene & Renatas Spicas, 2019. "Credit Risk Assessment Model for Small and Micro-Enterprises: The Case of Lithuania," Risks, MDPI, vol. 7(2), pages 1-23, June.
    5. Lkhagvadorj Munkhdalai & Tsendsuren Munkhdalai & Oyun-Erdene Namsrai & Jong Yun Lee & Keun Ho Ryu, 2019. "An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments," Sustainability, MDPI, vol. 11(3), pages 1-23, January.
    6. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(9), pages 1150-1172, September.
    7. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    8. Martin Leo & Suneel Sharma & K. Maddulety, 2019. "Machine Learning in Banking Risk Management: A Literature Review," Risks, MDPI, vol. 7(1), pages 1-22, March.
    9. Medina-Olivares, Victor & Calabrese, Raffaella & Dong, Yizhe & Shi, Baofeng, 2022. "Spatial dependence in microfinance credit default," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1071-1085.
    10. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    11. Jianhua Jiang & Xianqiu Meng & Yang Liu & Huan Wang, 2022. "An Enhanced TSA-MLP Model for Identifying Credit Default Problems," SAGE Open, vol. 12(2), pages 21582440221, April.
    12. Eccles, Peter & Grout, Paul & Siciliani, Paolo & Zalewska, Anna, 2021. "The impact of machine learning and big data on credit markets," Bank of England working papers 930, Bank of England.
    13. Huei-Wen Teng & Michael Lee, 2019. "Estimation Procedures of Using Five Alternative Machine Learning Methods for Predicting Credit Card Default," Review of Pacific Basin Financial Markets and Policies (RPBFMP), World Scientific Publishing Co. Pte. Ltd., vol. 22(03), pages 1-27, September.
    14. Liu, Wanan & Fan, Hong & Xia, Meng, 2023. "Tree-based heterogeneous cascade ensemble model for credit scoring," International Journal of Forecasting, Elsevier, vol. 39(4), pages 1593-1614.
    15. Zhou, Ying & Shen, Long & Ballester, Laura, 2023. "A two-stage credit scoring model based on random forest: Evidence from Chinese small firms," International Review of Financial Analysis, Elsevier, vol. 89(C).
    16. Dimitrios Nikolaidis & Michalis Doumpos, 2022. "Credit Scoring with Drift Adaptation Using Local Regions of Competence," SN Operations Research Forum, Springer, vol. 3(4), pages 1-28, December.
    17. Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
    18. Dagmar Camska & Jiri Klecka, 2020. "Comparison of Prediction Models Applied in Economic Recession and Expansion," JRFM, MDPI, vol. 13(3), pages 1-16, March.
    19. Tomáš Vaněk & David Hampel, 2017. "The Probability of Default Under IFRS 9: Multi-period Estimation and Macroeconomic Forecast," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 65(2), pages 759-776.
    20. Gero Szepannek, 2022. "An Overview on the Landscape of R Packages for Open Source Scorecard Modelling," Risks, MDPI, vol. 10(3), pages 1-33, March.
