
Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets

In: Data science in central banking: applications and tools

Author

Listed:
  • Davide Nicola Continanza
  • Andrea del Monaco
  • Marco di Lucido
  • Daniele Figoli
  • Pasquale Maddaloni
  • Filippo Quarta
  • Giuseppe Turturiello

Abstract

This paper addresses the problem of assessing the quality of granular datasets reported by banks using machine learning models. In particular, it investigates how supervised and unsupervised learning algorithms can exploit patterns found in other data sources that describe similar phenomena (although at a different level of aggregation) in order to detect potential outliers to be submitted to banks for their own checks. These machine learning algorithms are then stacked in a semi-supervised fashion to enhance their individual outlier detection ability. The methodology is applied to compare the granular AnaCredit dataset first with the Balance Sheet Items (BSI) statistics and then with the harmonised supervisory Financial Reporting (FinRep) statistics, which are compiled for the Eurosystem and the Single Supervisory Mechanism, respectively. In both cases, we show that the stacking technique achieves a higher F1-score than each algorithm alone.
(This abstract was borrowed from another version of this item.)
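
The stacking idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not name the base algorithms, so the robust z-score detector, the grid-search meta-learner, the feature names (`gap`, `change`) and all figures below are assumptions made up for illustration. Two unsupervised base scores are computed per bank, and a simple supervised meta-learner then uses a handful of confirmed labels to choose how to blend and threshold them, with the F1-score as the criterion.

```python
from statistics import median


def robust_z(values):
    """Unsupervised base detector: distance from the median in MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # guard against MAD == 0
    return [abs(v - med) / mad for v in values]


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no true positives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(scores, labels):
    """Return (F1, cut-off) for the cut-off on `scores` that maximises F1."""
    best = (0.0, float("inf"))
    for t in sorted(set(scores)):
        f1 = f1_score(labels, [int(s >= t) for s in scores])
        if f1 > best[0]:
            best = (f1, t)
    return best


def fit_stack(score_a, score_b, labels, weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Meta-learner: choose the blend weight and cut-off that maximise F1."""
    best = (-1.0, None, None)
    for w in weights:
        blended = [w * a + (1 - w) * b for a, b in zip(score_a, score_b)]
        f1, t = best_threshold(blended, labels)
        if f1 > best[0]:
            best = (f1, w, t)
    return best


# Hypothetical per-bank features (made-up numbers, not from the paper):
# gap    = relative difference between aggregated granular data and the aggregate report
# change = period-on-period change in the granular total
gap = [0.01, 0.02, 0.00, 0.35, 0.01, 0.03, 0.02, 0.40, 0.01, 0.02]
change = [0.05, 0.04, 0.90, 0.06, 0.05, 0.03, 0.80, 0.07, 0.04, 0.05]
labels = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # 1 = anomaly confirmed by the bank (assumed)

z_gap, z_change = robust_z(gap), robust_z(change)
f1_a, _ = best_threshold(z_gap, labels)
f1_b, _ = best_threshold(z_change, labels)
stacked_f1, weight, cutoff = fit_stack(z_gap, z_change, labels)
print(f1_a, f1_b, stacked_f1, weight, cutoff)
```

Because the weight grid includes 0 and 1, the blended detector can never score below either base detector on the labelled records used for fitting. That guarantee is in-sample only; an evaluation like the one reported in the abstract would need held-out data.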

Suggested Citation

  • Davide Nicola Continanza & Andrea del Monaco & Marco di Lucido & Daniele Figoli & Pasquale Maddaloni & Filippo Quarta & Giuseppe Turturiello, 2023. "Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data science in central banking: applications and tools, volume 59, Bank for International Settlements.
  • Handle: RePEc:bis:bisifc:59-34

    Download full text from publisher

    File URL: https://www.bis.org/ifc/publ/ifcb59_34.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Farnè, Matteo & Vouldis, Angelos T., 2018. "A methodology for automised outlier detection in high-dimensional datasets: an application to euro area banks' supervisory data," Working Paper Series 2171, European Central Bank.
    2. Markus Goldstein & Seiichi Uchida, 2016. "A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-31, April.
    3. Tobias Cagala, 2017. "Improving data quality and closing data gaps with machine learning," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data needs and Statistics compilation for macroprudential analysis, volume 46, Bank for International Settlements.
    4. Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
    5. Cerioli, Andrea & Farcomeni, Alessio, 2011. "Error rates for multivariate outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 544-553, January.
    6. Granger, C. W. J., 1981. "Some properties of time series data and their use in econometric model specification," Journal of Econometrics, Elsevier, vol. 16(1), pages 121-130, May.
    7. Fabio Zambuto, 2021. "Quality checks on granular banking data: an experimental approach based on machine learning," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Micro data for the macro world, volume 53, Bank for International Settlements.
    8. Koller, Manuel & Stahel, Werner A., 2011. "Sharpening Wald-type inference in robust regression for small samples," Computational Statistics & Data Analysis, Elsevier, vol. 55(8), pages 2504-2515, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francesco Cusano & Giuseppe Marinelli & Stefano Piermattei, 2021. "Learning from revisions: a tool for detecting potential errors in banks' balance sheet statistical reporting," Questioni di Economia e Finanza (Occasional Papers) 611, Bank of Italy, Economic Research and International Relations Area.
    2. Francesco Cusano & Giuseppe Marinelli & Stefano Piermattei, 2022. "Learning from revisions: an algorithm to detect errors in banks’ balance sheet statistical reporting," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(6), pages 4025-4059, December.
    3. Fabio Zambuto & Simona Arcuti & Roberto Sabatini & Daniele Zambuto, 2021. "Application of classification algorithms for the assessment of confirmation to quality remarks," Questioni di Economia e Finanza (Occasional Papers) 631, Bank of Italy, Economic Research and International Relations Area.
    4. Fabio Zambuto, 2021. "Quality checks on granular banking data: an experimental approach based on machine learning," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Micro data for the macro world, volume 53, Bank for International Settlements.
    5. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    6. Torben G. Andersen & Tim Bollerslev & Francis X. Diebold & Paul Labys, 1999. "The Distribution of Exchange Rate Volatility," New York University, Leonard N. Stern School Finance Department Working Paper Series 99-059, New York University, Leonard N. Stern School of Business.
    7. Weisheng Lu & Meng Ye & K.W. Chau & Roger Flanagan, 2018. "The paradoxical nexus between corporate social responsibility and sustainable financial performance: Evidence from the international construction business," Corporate Social Responsibility and Environmental Management, John Wiley & Sons, vol. 25(5), pages 844-852, September.
    8. Olushina O Awe & Robert Mudida & Luis A. Gil‐Alana, 2021. "Comparative analysis of economic growth in Nigeria and Kenya: A fractional integration approach," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 26(1), pages 1197-1205, January.
    9. Durgesh Samariya & Amit Thakkar, 2023. "A Comprehensive Survey of Anomaly Detection Algorithms," Annals of Data Science, Springer, vol. 10(3), pages 829-850, June.
    10. Dayong Zhang & Marco R. Barassi & Jijun Tan, 2015. "Residual-Based Tests for Fractional Cointegration: Testing the Term Structure of Interest Rates," Econometric Reviews, Taylor & Francis Journals, vol. 34(6-10), pages 1118-1140, December.
    11. Wesam Salah Alaloul & Muhammad Ali Musarat & Muhammad Babar Ali Rabbani & Qaiser Iqbal & Ahsen Maqsoom & Waqas Farooq, 2021. "Construction Sector Contribution to Economic Stability: Malaysian GDP Distribution," Sustainability, MDPI, vol. 13(9), pages 1-26, April.
    12. Kaposty, Florian & Kriebel, Johannes & Löderbusch, Matthias, 2020. "Predicting loss given default in leasing: A closer look at models and variable selection," International Journal of Forecasting, Elsevier, vol. 36(2), pages 248-266.
    13. Luis Gil-Alana, 2004. "Forecasting the real output using fractionally integrated techniques," Applied Economics, Taylor & Francis Journals, vol. 36(14), pages 1583-1589.
    14. Hallin, M. & van den Akker, R. & Werker, B.J.M., 2012. "Rank-based Tests of the Cointegrating Rank in Semiparametric Error Correction Models," Other publications TiSEM bc68a2f2-3ca3-443c-b3ac-f, Tilburg University, School of Economics and Management.
    15. Biqing Cai & Jiti Gao & Dag Tjøstheim, 2017. "A New Class of Bivariate Threshold Cointegration Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 35(2), pages 288-305, April.
    16. repec:kap:iaecre:v:17:y:2011:i:2:p:157-168 is not listed on IDEAS
    17. Frankel, David M., 2010. "Shocks and Crises in the Long Run," Staff General Research Papers Archive 31687, Iowa State University, Department of Economics.
    18. Nielsen, Morten Orregaard & Shimotsu, Katsumi, 2007. "Determining the cointegrating rank in nonstationary fractional systems by the exact local Whittle approach," Journal of Econometrics, Elsevier, vol. 141(2), pages 574-596, December.
    19. Andreas Stephan, 1997. "The Impact of Road Infrastructure on Productivity and Growth: Some Preliminary Results for the German Manufacturing Sector," CIG Working Papers FS IV 97-47, Wissenschaftszentrum Berlin (WZB), Research Unit: Competition and Innovation (CIG).
    20. Christophe Hurlin & Christophe Perignon & Sébastien Saurin, 2021. "The Fairness of Credit Scoring Models," Working Papers hal-03501452, HAL.
    21. Anderson, Heather M, 1997. "Transaction Costs and Non-linear Adjustment towards Equilibrium in the US Treasury Bill Market," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 59(4), pages 465-484, November.

    More about this item

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • G21 - Financial Economics - - Financial Institutions and Services - - - Banks; Other Depository Institutions; Micro Finance Institutions; Mortgages
