
Quality checks on granular banking data: an experimental approach based on machine learning?

Authors

  • Fabio Zambuto (Bank of Italy)
  • Maria Rosaria Buzzi (Bank of Italy)
  • Giuseppe Costanzo (Bank of Italy)
  • Marco Di Lucido (Bank of Italy)
  • Barbara La Ganga (Bank of Italy)
  • Pasquale Maddaloni (Bank of Italy)
  • Fabio Papale (Bank of Italy)
  • Emiliano Svezia (Bank of Italy)

Abstract

We propose a new methodology, based on machine learning algorithms, for automatically detecting outliers in the data that banks report to the Bank of Italy. Our analysis focuses on granular data gathered in the statistical data collection on payment services, where the lack of strong ex ante deterministic relationships among the collected variables makes standard diagnostic approaches less powerful. Quantile regression forests are used to derive an acceptance region for the targeted information: for a given probability level, plausibility thresholds are obtained on the basis of individual bank characteristics and are automatically updated as new data are reported. The approach was applied to validate semi-annual data on debit card issuance received from reporting agents between December 2016 and June 2018. The algorithm was trained on data reported in previous periods and tested by cross-checking the identified outliers with the reporting agents. The method detected new outliers that the standard procedures had missed, with high precision (that is, a low share of false positives).
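The core step described in the abstract, deriving a bank-specific acceptance region from conditional quantiles, can be sketched in plain Python as a small quantile regression forest in the spirit of Meinshausen (2006). Each tree is grown on a bootstrap sample; for a new observation, the training responses that share its leaf are weighted to approximate its conditional distribution, and the outer quantiles of that distribution bound the plausible values. This is not the authors' implementation. The function names, the features (previous-period issuance, number of branches), the 98% probability level, the tree parameters and the synthetic data are all hypothetical, and the weights here use the in-bag observations stored in each leaf, a detail on which implementations differ.

```python
import random


def best_split(vals, ys, min_leaf):
    """Best (sse, threshold) split of responses ys ordered by ascending vals, or None."""
    n = len(ys)
    total, total_sq = sum(ys), sum(v * v for v in ys)
    s = sq = 0.0
    best = None
    for k in range(1, n):
        s += ys[k - 1]
        sq += ys[k - 1] ** 2
        # Skip splits that leave a leaf below min_leaf or separate tied values.
        if k < min_leaf or n - k < min_leaf or vals[k] == vals[k - 1]:
            continue
        sse = (sq - s * s / k) + (total_sq - sq) - (total - s) ** 2 / (n - k)
        if best is None or sse < best[0]:
            best = (sse, (vals[k - 1] + vals[k]) / 2)
    return best


def build_tree(X, y, idx, rng, mtry, min_leaf, depth, max_depth):
    """Grow a regression tree on rows idx; each leaf keeps its row indices."""
    if depth >= max_depth or len(idx) < 2 * min_leaf:
        return ("leaf", idx)
    best = None
    for f in rng.sample(range(len(X[0])), mtry):  # random feature subset per node
        order = sorted(idx, key=lambda i: X[i][f])
        cand = best_split([X[i][f] for i in order], [y[i] for i in order], min_leaf)
        if cand is not None and (best is None or cand[0] < best[0]):
            best = (cand[0], f, cand[1])
    if best is None:
        return ("leaf", idx)
    _, f, thr = best
    left = [i for i in idx if X[i][f] <= thr]
    right = [i for i in idx if X[i][f] > thr]
    return ("node", f, thr,
            build_tree(X, y, left, rng, mtry, min_leaf, depth + 1, max_depth),
            build_tree(X, y, right, rng, mtry, min_leaf, depth + 1, max_depth))


def leaf_of(tree, x):
    """Drop x down the tree and return the training rows stored in its leaf."""
    while tree[0] == "node":
        _, f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree[1]


def fit_forest(X, y, n_trees=30, mtry=1, min_leaf=5, max_depth=6, seed=0):
    """Fit each tree on a bootstrap sample of the rows of X."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(build_tree(X, y, boot, rng, mtry, min_leaf, 0, max_depth))
    return forest


def qrf_weights(forest, x):
    """Each tree spreads mass 1/n_trees evenly over the rows in x's leaf."""
    w = {}
    for tree in forest:
        leaf = leaf_of(tree, x)
        for i in leaf:
            w[i] = w.get(i, 0.0) + 1.0 / (len(leaf) * len(forest))
    return w


def conditional_quantile(forest, y, x, alpha):
    """Smallest training response whose weighted CDF at x reaches alpha."""
    w = qrf_weights(forest, x)
    cum = 0.0
    for i in sorted(w, key=lambda i: y[i]):
        cum += w[i]
        if cum >= alpha - 1e-12:
            return y[i]
    return max(y[i] for i in w)


def acceptance_region(forest, y, x, level=0.98):
    """Central interval holding `level` of the estimated conditional distribution."""
    tail = (1 - level) / 2
    return (conditional_quantile(forest, y, x, tail),
            conditional_quantile(forest, y, x, 1 - tail))


def is_outlier(forest, y, x, reported, level=0.98):
    """Flag a reported value that falls outside the acceptance region at x."""
    lo, hi = acceptance_region(forest, y, x, level)
    return not (lo <= reported <= hi)


# Synthetic, hypothetical data: [previous issuance (thousands), branches] -> issuance.
data_rng = random.Random(42)
X, y = [], []
for _ in range(300):
    prev = data_rng.uniform(10, 1000)
    X.append([prev, data_rng.randint(1, 200)])
    y.append(prev * data_rng.uniform(0.9, 1.1))  # +/-10% drift between periods
forest = fit_forest(X, y)
lo, hi = acceptance_region(forest, y, [500.0, 50])
print(f"acceptance region at prev=500: [{lo:.0f}, {hi:.0f}]")
print(is_outlier(forest, y, [500.0, 50], 5000.0))  # True: above every training response
```

In this sketch a reported value is flagged when it falls outside `[lo, hi]`; lowering `level` narrows the region and flags more values, trading fewer missed errors for more false positives. Refitting the forest each reporting period is one simple way to have the thresholds update as new data arrive.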

Suggested Citation

  • Fabio Zambuto & Maria Rosaria Buzzi & Giuseppe Costanzo & Marco Di Lucido & Barbara La Ganga & Pasquale Maddaloni & Fabio Papale & Emiliano Svezia, 2020. "Quality checks on granular banking data: an experimental approach based on machine learning?," Questioni di Economia e Finanza (Occasional Papers) 547, Bank of Italy, Economic Research and International Relations Area.
  • Handle: RePEc:bdi:opques:qef_547_20

    Download full text from publisher

    File URL: https://www.bancaditalia.it/pubblicazioni/qef/2020-0547/QEF_547_20.pdf
    Download Restriction: no


    References listed on IDEAS

    1. Roger Koenker, 2017. "Quantile Regression: 40 Years On," Annual Review of Economics, Annual Reviews, vol. 9(1), pages 155-176, September.
    2. Chakraborty, Chiranjit & Joseph, Andreas, 2017. "Machine learning at central banks," Bank of England working papers 674, Bank of England.
    3. Farnè, Matteo & Vouldis, Angelos T., 2018. "A methodology for automised outlier detection in high-dimensional datasets: an application to euro area banks' supervisory data," Working Paper Series 2171, European Central Bank.
    4. Roger Koenker, 2017. "Quantile regression 40 years on," CeMMAP working papers CWP36/17, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    5. Koenker, Roger, 2004. "Quantile regression for longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 91(1), pages 74-89, October.
    6. Tobias Cagala, 2017. "Improving data quality and closing data gaps with machine learning," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data needs and Statistics compilation for macroprudential analysis, volume 46, Bank for International Settlements.
    7. Roger Koenker & Kevin F. Hallock, 2001. "Quantile Regression," Journal of Economic Perspectives, American Economic Association, vol. 15(4), pages 143-156, Fall.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Davide Nicola Continanza & Andrea del Monaco & Marco di Lucido & Daniele Figoli & Pasquale Maddaloni & Filippo Quarta & Giuseppe Turturiello, 2023. "Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data science in central banking: applications and tools, volume 59, Bank for International Settlements.
    2. Vittoria La Serra & Emiliano Svezia, 2024. "A supervised record linkage approach for anomaly detection in insurance assets granular data," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(5), pages 4181-4205, October.
    3. Massimo Casa & Laura Graziani Palmieri & Laura Mellone & Francesca Monacelli, 2022. "The integrated approach adopted by Bank of Italy in the collection and production of credit and financial data," Questioni di Economia e Finanza (Occasional Papers) 667, Bank of Italy, Economic Research and International Relations Area.
    4. Francesco Cusano & Giuseppe Marinelli & Stefano Piermattei, 2021. "Learning from revisions: a tool for detecting potential errors in banks' balance sheet statistical reporting," Questioni di Economia e Finanza (Occasional Papers) 611, Bank of Italy, Economic Research and International Relations Area.
    5. Francesco Cusano & Giuseppe Marinelli & Stefano Piermattei, 2022. "Learning from revisions: an algorithm to detect errors in banks’ balance sheet statistical reporting," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(6), pages 4025-4059, December.
    6. Fabio Zambuto & Simona Arcuti & Roberto Sabatini & Daniele Zambuto, 2021. "Application of classification algorithms for the assessment of confirmation to quality remarks," Questioni di Economia e Finanza (Occasional Papers) 631, Bank of Italy, Economic Research and International Relations Area.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Damian Clarke & Manuel Llorca Jaña & Daniel Pailañir, 2023. "The use of quantile methods in economic history," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 56(2), pages 115-132, April.
    2. Holly Brannelly & Andrea Macrina & Gareth W. Peters, 2019. "Quantile Diffusions for Risk Analysis," Papers 1912.10866, arXiv.org, revised Sep 2021.
    3. Gareth W. Peters, 2018. "General Quantile Time Series Regressions for Applications in Population Demographics," Risks, MDPI, vol. 6(3), pages 1-47, September.
    4. Chen, Zhao & Cheng, Vivian Xinyi & Liu, Xu, 2024. "Reprint: Hypothesis testing on high dimensional quantile regression," Journal of Econometrics, Elsevier, vol. 239(2).
    5. Christian L. E. Franzke & Herminia Torelló i Sentelles, 2020. "Risk of extreme high fatalities due to weather and climate hazards and its connection to large-scale climate variability," Climatic Change, Springer, vol. 162(2), pages 507-525, September.
    6. Jayeeta Bhattacharya, 2020. "Quantile regression with generated dependent variable and covariates," Papers 2012.13614, arXiv.org.
    7. Franzke, Christian L.E., 2021. "Towards the development of economic damage functions for weather and climate extremes," Ecological Economics, Elsevier, vol. 189(C).
    8. D Barrera & S Crépey & E Gobet & Hoang-Dung Nguyen & B Saadeddine, 2024. "Statistical Learning of Value-at-Risk and Expected Shortfall," Working Papers hal-03775901, HAL.
    9. Francisco J. Delgado, 2021. "On the Determinants of Fiscal Decentralization: Evidence From the EU," The AMFITEATRU ECONOMIC journal, Academy of Economic Studies - Bucharest, Romania, vol. 23(56), pages 206-206, February.
    10. Ruofan Xu & Jiti Gao & Tatsushi Oka & Yoon-Jae Whang, 2022. "Quantile Random-Coefficient Regression with Interactive Fixed Effects: Heterogeneous Group-Level Policy Evaluation," Papers 2208.03632, arXiv.org, revised Nov 2024.
    11. Petrella, Lea & Raponi, Valentina, 2019. "Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 70-84.
    12. Lihua Lei & Emmanuel J. Candès, 2021. "Conformal inference of counterfactuals and individual treatment effects," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 911-938, November.
    13. Tao Hu & Baosheng Liang, 2021. "A New Class of Estimators Based on a General Relative Loss Function," Mathematics, MDPI, vol. 9(10), pages 1-19, May.
    14. Chen, Zhao & Cheng, Vivian Xinyi & Liu, Xu, 2024. "Hypothesis testing on high dimensional quantile regression," Journal of Econometrics, Elsevier, vol. 238(1).
    15. Ruofan Xu & Jiti Gao & Tatsushi Oka & Yoon-Jae Whang, 2022. "Estimation of Heterogeneous Treatment Effects Using Quantile Regression with Interactive Fixed Effects," Monash Econometrics and Business Statistics Working Papers 13/22, Monash University, Department of Econometrics and Business Statistics.
    16. D Barrera & S Crépey & E Gobet & Hoang-Dung Nguyen & B Saadeddine, 2022. "Statistical Learning of Value-at-Risk and Expected Shortfall," Papers 2209.06476, arXiv.org, revised Sep 2024.
    17. Maximilian Buchholz & Harald Bathelt & John A. Cantwell, 2020. "Income divergence and global connectivity of U.S. urban regions," Journal of International Business Policy, Palgrave Macmillan, vol. 3(3), pages 229-248, September.
    18. Maximilian Buchholz & Harald Bathelt & John A. Cantwell, 0. "Income divergence and global connectivity of U.S. urban regions," Journal of International Business Policy, Palgrave Macmillan, vol. 0, pages 1-20.
    19. Abhinava Tripathi, 2021. "The Arrival of Information and Price Adjustment Across Extreme Quantiles: Global Evidence," IIM Kozhikode Society & Management Review, , vol. 10(1), pages 7-19, January.
    20. Davide Nicola Continanza & Andrea del Monaco & Marco di Lucido & Daniele Figoli & Pasquale Maddaloni & Filippo Quarta & Giuseppe Turturiello, 2023. "Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data science in central banking: applications and tools, volume 59, Bank for International Settlements.

    More about this item

    Keywords

    banking data; data quality management; outlier detection; machine learning; quantile regression; random forests

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • G21 - Financial Economics - - Financial Institutions and Services - - - Banks; Other Depository Institutions; Micro Finance Institutions; Mortgages



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.