IDEAS home Printed from https://ideas.repec.org/a/kap/compec/v54y2019i3d10.1007_s10614-018-9864-z.html

Machine Learning and Sampling Scheme: An Empirical Study of Money Laundering Detection

Author

Listed:
  • Yan Zhang

    (Office of the Comptroller of the Currency)

  • Peter Trubey

    (University of California Santa Cruz)

Abstract

This paper studies the interplay of machine learning and sampling scheme in an empirical analysis of money laundering detection algorithms. Using actual transaction data provided by a U.S. financial institution, we study five major machine learning algorithms including Bayes logistic regression, decision tree, random forest, support vector machine, and artificial neural network. As the incidence of money laundering events is rare, we apply and compare two sampling techniques that increase the relative presence of the events. Our analysis reveals potential advantages of machine learning algorithms in modeling money laundering events. This paper provides insights into the use of machine learning and sampling schemes in money laundering detection specifically, and classification of rare events in general.
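The abstract does not name the two sampling techniques, so the following is only a minimal sketch of two common ways to raise the relative presence of a rare event class: bootstrap oversampling of the events and random undersampling of the non-events. The function names, the `target_rate` parameter, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import random


def _split_indices(labels):
    """Return (event indices, non-event indices) for 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return pos, neg


def oversample_events(rows, labels, target_rate, seed=0):
    """Bootstrap the rare events (draw with replacement) until they
    make up roughly `target_rate` of the resampled data.
    All non-events are kept. Hypothetical helper, not the paper's method."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be in (0, 1)")
    rng = random.Random(seed)
    pos, neg = _split_indices(labels)
    # Solve n_pos / (n_pos + len(neg)) = target_rate for n_pos.
    n_pos = round(target_rate * len(neg) / (1 - target_rate))
    idx = neg + [rng.choice(pos) for _ in range(n_pos)]
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def undersample_non_events(rows, labels, target_rate, seed=0):
    """Keep every event and draw non-events without replacement so that
    events make up roughly `target_rate` of the resampled data.
    Hypothetical helper, not the paper's method."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be in (0, 1)")
    rng = random.Random(seed)
    pos, neg = _split_indices(labels)
    # Solve len(pos) / (len(pos) + n_neg) = target_rate for n_neg.
    n_neg = min(len(neg), round(len(pos) * (1 - target_rate) / target_rate))
    idx = pos + rng.sample(neg, n_neg)
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy data: 10 events among 1,000 records (a 1% event rate).
toy_labels = [1] * 10 + [0] * 990
toy_rows = list(range(1000))

over_rows, over_labels = oversample_events(toy_rows, toy_labels, 0.5)
under_rows, under_labels = undersample_non_events(toy_rows, toy_labels, 0.5)
```

In a setup like this, resampling would normally be applied to the training split only, with evaluation done on data that keeps the original event rate, so that reported performance reflects how rare the events actually are.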

Suggested Citation

  • Yan Zhang & Peter Trubey, 2019. "Machine Learning and Sampling Scheme: An Empirical Study of Money Laundering Detection," Computational Economics, Springer;Society for Computational Economics, vol. 54(3), pages 1043-1063, October.
  • Handle: RePEc:kap:compec:v:54:y:2019:i:3:d:10.1007_s10614-018-9864-z
    DOI: 10.1007/s10614-018-9864-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10614-018-9864-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Kar Yan Tam & Melody Y. Kiang, 1992. "Managerial Applications of Neural Networks: The Case of Bank Failure Predictions," Management Science, INFORMS, vol. 38(7), pages 926-947, July.
    2. G. V. Kass, 1980. "An Exploratory Technique for Investigating Large Quantities of Categorical Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 29(2), pages 119-127, June.
    3. Mark Cecchini & Haldun Aytug & Gary J. Koehler & Praveen Pathak, 2010. "Detecting Management Fraud in Public Companies," Management Science, INFORMS, vol. 56(7), pages 1146-1160, July.
    4. Butaru, Florentin & Chen, Qingqing & Clark, Brian & Das, Sanmay & Lo, Andrew W. & Siddique, Akhtar, 2016. "Risk and risk management in the credit card industry," Journal of Banking & Finance, Elsevier, vol. 72(C), pages 218-239.
    5. Khandani, Amir E. & Kim, Adlar J. & Lo, Andrew W., 2010. "Consumer credit-risk models via machine-learning algorithms," Journal of Banking & Finance, Elsevier, vol. 34(11), pages 2767-2787, November.
    6. Altman, Edward I. & Marco, Giancarlo & Varetto, Franco, 1994. "Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience)," Journal of Banking & Finance, Elsevier, vol. 18(3), pages 505-529, May.

    Citations



    Cited by:

    1. Petra Posedel Šimović & Davor Horvatic & Edward W. Sun, 2021. "Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression," Papers 2105.07671, arXiv.org, revised Jul 2021.
    2. Petra P. Šimović & Claire Y. T. Chen & Edward W. Sun, 2023. "Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression," Computational Economics, Springer;Society for Computational Economics, vol. 61(1), pages 451-485, January.
    3. Chen, Jian & Katchova, Ani L. & Zhou, Chenxi, 2021. "Agricultural loan delinquency prediction using machine learning methods," International Food and Agribusiness Management Review, International Food and Agribusiness Management Association, vol. 24(5), May.
    4. Alonso-Robisco, Andrés & Carbó, José Manuel, 2022. "Can machine learning models save capital for banks? Evidence from a Spanish credit portfolio," International Review of Financial Analysis, Elsevier, vol. 84(C).
    5. Königstorfer, Florian & Thalmann, Stefan, 2020. "Applications of Artificial Intelligence in commercial banks – A research agenda for behavioral finance," Journal of Behavioral and Experimental Finance, Elsevier, vol. 27(C).
    6. Abbas Haider & Hui Wang & Bryan Scotney & Glenn Hawe, 2022. "Predictive Market Making via Machine Learning," SN Operations Research Forum, Springer, vol. 3(1), pages 1-21, March.
    7. Zanin, Luca, 2020. "Combining multiple probability predictions in the presence of class imbalance to discriminate between potential bad and good borrowers in the peer-to-peer lending market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 25(C).
    8. Ajitha Kumari Vijayappan Nair Biju & Ann Susan Thomas & J Thasneem, 2024. "Examining the research taxonomy of artificial intelligence, deep learning & machine learning in the financial sphere—a bibliometric analysis," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 849-878, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wosnitza, Jan Henrik, 2022. "Calibration alternatives to logistic regression and their potential for transferring the dispersion of discriminatory power into uncertainties of probabilities of default," Discussion Papers 04/2022, Deutsche Bundesbank.
    2. Zhou, Fanyin & Fu, Lijun & Li, Zhiyong & Xu, Jiawei, 2022. "The recurrence of financial distress: A survival analysis," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1100-1115.
    3. Beynon, Malcolm J. & Peel, Michael J., 2001. "Variable precision rough set theory and data discretisation: an application to corporate failure prediction," Omega, Elsevier, vol. 29(6), pages 561-576, December.
    4. Haider A. Khan, 2004. "General Conclusions: From Crisis to a Global Political Economy of Freedom," Palgrave Macmillan Books, in: Global Markets and Financial Crises in Asia, chapter 9, pages 193-211, Palgrave Macmillan.
    5. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    6. Tseng, Chih-Hsiung & Cheng, Sheng-Tzong & Wang, Yi-Hsien & Peng, Jin-Tang, 2008. "Artificial neural network model of the hybrid EGARCH volatility of the Taiwan stock index option prices," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(13), pages 3192-3200.
    7. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.
    8. du Jardin, Philippe & Séverin, Eric, 2011. "Predicting corporate bankruptcy using a self-organizing map: An empirical study to improve the forecasting horizon of a financial failure model," MPRA Paper 44262, University Library of Munich, Germany.
    9. repec:hum:wpaper:sfb649dp2013-037 is not listed on IDEAS
    10. Mark T. Leung & An-Sing Chen, 2005. "Performance evaluation of neural network architectures: the case of predicting foreign exchange correlations," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 24(6), pages 403-420.
    11. Philippe Paquet, 1997. "L'utilisation des réseaux de neurones artificiels en finance," Working Papers 1997-1, Laboratoire Orléanais de Gestion - université d'Orléans.
    12. Steven Heston & Nitish R. Sinha, 2016. "News versus Sentiment : Predicting Stock Returns from News Stories," Finance and Economics Discussion Series 2016-048, Board of Governors of the Federal Reserve System (U.S.).
    13. Wolfgang Härdle & Yuh-Jye Lee & Dorothea Schäfer & Yi-Ren Yeh, 2009. "Variable selection and oversampling in the use of smooth support vector machines for predicting the default risk of companies," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 28(6), pages 512-534.
    14. Jones, Stewart & Johnstone, David & Wilson, Roy, 2015. "An empirical evaluation of the performance of binary classifiers in the prediction of credit ratings changes," Journal of Banking & Finance, Elsevier, vol. 56(C), pages 72-85.
    15. Fayçal Mraihi, 2016. "Distressed Company Prediction Using Logistic Regression: Tunisian’s Case," Quarterly Journal of Business Studies, Research Academy of Social Sciences, vol. 2(1), pages 34-54.
    16. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    17. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Are post-crisis statistical initiatives completed?, volume 49, Bank for International Settlements.
    18. Nakagawa, Kei & Sakemoto, Ryuta, 2022. "Cryptocurrency network factors and gold," Finance Research Letters, Elsevier, vol. 46(PB).
    19. Greta Falavigna, 2008. "Nouveaux instruments d’évaluation pour le risque financier d’entreprise," CERIS Working Paper 200801, CNR-IRCrES Research Institute on Sustainable Economic Growth - Torino (TO) ITALY - former Institute for Economic Research on Firms and Growth - Moncalieri (TO) ITALY.
    20. Haskamp, Ulrich, 2017. "Improving the forecasts of European regional banks' profitability with machine learning algorithms," Ruhr Economic Papers 705, RWI - Leibniz-Institut für Wirtschaftsforschung, Ruhr-University Bochum, TU Dortmund University, University of Duisburg-Essen.
    21. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan-level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), The use of big data analytics and artificial intelligence in central banking, volume 50, Bank for International Settlements.

    More about this item

    Keywords

    Bootstrap; Machine learning; Money laundering; Rare event; Sampling scheme;

    JEL classification:

    • G21 - Financial Economics - - Financial Institutions and Services - - - Banks; Other Depository Institutions; Micro Finance Institutions; Mortgages
    • G28 - Financial Economics - - Financial Institutions and Services - - - Government Policy and Regulation
