Printed from https://ideas.repec.org/a/taf/oaefxx/v8y2020i1p1729569.html

Failure prediction of Indian Banks using SMOTE, Lasso regression, bagging and boosting

Authors

  • Santosh Shrivastava
  • P Mary Jeyanthi
  • Sarbjit Singh
  • David McMillan

Abstract

Banks play a vital role in the financial system, and their survival is crucial for the stability of the economy. This research paper attempts to build an efficient and appropriate predictive model, using a machine learning approach, for an early warning system of bank failure. The paper uses data collected for failed and surviving public and private sector banks located in India over the period 2000–2017. Bank-specific variables as well as macroeconomic and market structure variables are used to identify the stress level of banks. Because the number of failed banks in India is very small compared with the number of surviving banks, the data are imbalanced, and most machine learning algorithms do not perform well on such data. This paper uses a novel approach, the Synthetic Minority Oversampling Technique (SMOTE), to convert the imbalanced data into a balanced form. Lasso regression is used to remove redundant features from the failure prediction model. To reduce bias and over-fitting, random forest (bagging) and AdaBoost (boosting) techniques are applied and compared with logistic regression to obtain the best predictive model. The results of the study can be used by various stakeholders, such as shareholders, lenders and borrowers, to measure the financial stress of banks. This study offers an analytical approach that covers the selection of the most significant bank-failure indicators using lasso regression, the conversion of data from imbalanced to balanced form using SMOTE, and the choice of appropriate machine learning techniques to predict bank failure.
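The two preprocessing steps named in the abstract, SMOTE oversampling and lasso feature selection, can be illustrated with a small, dependency-free Python sketch. It is not the authors' code, and every function name here (`smote`, `balance_with_smote`, `soft_threshold`, `lasso_coordinate_descent`) is hypothetical. The details are assumptions: the SMOTE step picks a random minority sample (a failed bank), finds its k nearest minority neighbours by Euclidean distance, and places a synthetic point at a random position on the segment between them; the lasso step uses a squared-error loss on the 0/1 failure label with standardized features and a centred target (so no intercept), solved by cyclic coordinate descent with soft-thresholding, which is what sets the coefficients of redundant features to exactly zero.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (sketch): return n_new synthetic minority rows.

    Each synthetic row lies on the segment between a randomly chosen minority
    row and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority rows.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts.

    Assumes a binary label; synthetic rows are appended after the originals.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by coordinate descent.

    Assumes standardized columns of X and a centred y (no intercept term).
    A coefficient of exactly 0.0 marks a feature the lasso drops.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                w[j] = 0.0
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * w[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, alpha) / col_sq
    return w
```

In practice, tested implementations exist in imbalanced-learn (`imblearn.over_sampling.SMOTE`) and scikit-learn (`Lasso`, `RandomForestClassifier`, `AdaBoostClassifier`, `LogisticRegression`). A common precaution, assumed here rather than taken from the paper, is to oversample only the training split, so that synthetic failed banks do not leak into the data used to compare the models.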

Suggested Citation

  • Santosh Shrivastava & P Mary Jeyanthi & Sarbjit Singh & David McMillan, 2020. "Failure prediction of Indian Banks using SMOTE, Lasso regression, bagging and boosting," Cogent Economics & Finance, Taylor & Francis Journals, vol. 8(1), pages 1729569-172, January.
  • Handle: RePEc:taf:oaefxx:v:8:y:2020:i:1:p:1729569
    DOI: 10.1080/23322039.2020.1729569

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/23322039.2020.1729569
    Download Restriction: Access to full text is restricted to subscribers.


    Cited by:

    1. Mousa, Gehan A. & Elamir, Elsayed A.H. & Hussainey, Khaled, 2022. "The effect of annual report narratives on the cost of capital in the Middle East and North Africa: A machine learning approach," Research in International Business and Finance, Elsevier, vol. 62(C).
    2. Jorge E. Galán, 2021. "CREWS: a CAMELS-based early warning system of systemic risk in the banking sector," Occasional Papers 2132, Banco de España.
    3. Sarbjit Singh Oberoi & Sayan Banerjee, 2023. "Bankruptcy Prediction of Indian Banks Using Advanced Analytics," Economic Studies journal, Bulgarian Academy of Sciences - Economic Research Institute, issue 4, pages 22-41.
    4. Citterio, Alberto, 2024. "Bank failure prediction models: Review and outlook," Socio-Economic Planning Sciences, Elsevier, vol. 92(C).
    5. Turki, Aymen & Nahidi, Narmin, 2024. "Do European fintech benefit from bank-affiliated VCs?," International Review of Economics & Finance, Elsevier, vol. 93(PB), pages 167-188.
    6. Li Xian Liu & Shuangzhe Liu & Milind Sathye, 2021. "Predicting Bank Failures: A Synthesis of Literature and Directions for Future Research," JRFM, MDPI, vol. 14(10), pages 1-24, October.
    7. Zhiyong Li & Chen Feng & Ying Tang, 2022. "Bank efficiency and failure prediction: a nonparametric and dynamic model based on data envelopment analysis," Annals of Operations Research, Springer, vol. 315(1), pages 279-315, August.
    8. Kristóf, Tamás & Virág, Miklós, 2022. "EU-27 bank failure prediction with C5.0 decision trees and deep learning neural networks," Research in International Business and Finance, Elsevier, vol. 61(C).
