
Estimating Financial Fraud through Transaction-Level Features and Machine Learning

Author

Listed:
  • Ayed Alwadain

    (Computer Science Department, Community College, King Saud University, Riyadh 145111, Saudi Arabia)

  • Rao Faizan Ali

    (Department of Software Engineering, School of Systems and Technology, University of Management and Technology, Lahore 54400, Pakistan)

  • Amgad Muneer

    (Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA;
    Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32160, Malaysia)

Abstract

In today’s world, financial institutions (FIs) play a pivotal role in any country’s economic growth and are vital for intermediation between the providers of investable funds, such as depositors and investors, and the users of those funds. FIs focus on developing effective policies for financial fraud risk mitigation; however, timely prediction of financial fraud risk helps overcome it effectively and efficiently. Thus, herein, we propose a novel approach for predicting financial fraud using machine learning. We used transaction-level features of 6,362,620 transactions from a synthetic dataset and fed them to various machine-learning classifiers. The correlation between the features was also analysed. Furthermore, around 5000 additional data samples were generated using a Conditional Generative Adversarial Network for Tabular Data (CTGAN). In evaluation, the proposed predictor achieved higher accuracy than previously existing machine-learning-based approaches. Among the 27 classifiers compared, XGBoost achieved the highest accuracy score, 0.999; when evaluated through exhaustive repeated 10-fold cross-validation, XGBoost still gave an average accuracy score of 0.998. The findings are particularly relevant to financial institutions and are important for regulators and policymakers who aim to develop new and effective policies for risk mitigation against financial fraud.
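The repeated 10-fold cross-validation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the toy two-feature data, the nearest-centroid classifier (a stand-in for XGBoost), and every function name below are assumptions made so the example runs with the Python standard library alone. It shows only how held-out accuracy is averaged over several reshuffled, stratified 10-fold splits.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a share of each class.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    return folds


def fit_centroids(rows, labels):
    """Toy stand-in classifier: store the mean feature vector per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in grouped.items()}


def predict_centroid(centroids, row):
    """Return the class whose centroid is nearest in squared distance."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist(centroids[label]))


def repeated_kfold_accuracy(rows, labels, k=10, repeats=3, seed=0):
    """Average held-out accuracy over `repeats` reshuffled k-fold runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(rows)) if i not in held_out]
            model = fit_centroids([rows[i] for i in train],
                                  [labels[i] for i in train])
            hits = sum(predict_centroid(model, rows[i]) == labels[i]
                       for i in test_idx)
            scores.append(hits / len(test_idx))
    return mean(scores), len(scores)


def make_toy_data(n=400, fraud_rate=0.1, seed=1):
    """Hypothetical imbalanced data: label 1 = fraud, 0 = legitimate."""
    rng = random.Random(seed)
    n_fraud = int(n * fraud_rate)
    rows, labels = [], []
    for i in range(n):
        label = 1 if i < n_fraud else 0
        center = 5.0 if label else 0.0
        rows.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
accuracy, n_scores = repeated_kfold_accuracy(rows, labels)
print(f"mean accuracy over {n_scores} held-out folds: {accuracy:.3f}")
```

Stratifying the folds matters for fraud data because the positive class is rare; without it, some folds could contain no fraudulent samples at all. On heavily imbalanced data, accuracy alone can also look high for a model that rarely flags fraud, so a similar loop is often run with precision, recall, or F1 in place of the hit rate.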

Suggested Citation

  • Ayed Alwadain & Rao Faizan Ali & Amgad Muneer, 2023. "Estimating Financial Fraud through Transaction-Level Features and Machine Learning," Mathematics, MDPI, vol. 11(5), pages 1-15, February.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:5:p:1184-:d:1082960

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/5/1184/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/5/1184/
    Download Restriction: no

    References listed on IDEAS

    1. Ahmed, Shamima & Alshater, Muneer M. & Ammari, Anis El & Hammami, Helmi, 2022. "Artificial intelligence and machine learning in finance: A bibliometric review," Research in International Business and Finance, Elsevier, vol. 61(C).
    2. Jiangang Hao & Tin Kam Ho, 2019. "Machine Learning Made Easy: A Review of Scikit-learn Package in Python Programming Language," Journal of Educational and Behavioral Statistics, vol. 44(3), pages 348-361, June.
    3. Ilya Archakov & Peter Reinhard Hansen, 2021. "A New Parametrization of Correlation Matrices," Econometrica, Econometric Society, vol. 89(4), pages 1699-1715, July.
    4. D’Amato, Valeria & Levantesi, Susanna & Piscopo, Gabriella, 2022. "Deep learning in predicting cryptocurrency volatility," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 596(C).
    5. Kim, Ji-Hyun, 2009. "Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3735-3745, September.

    Citations

    Cited by:

    1. Alexey Ruchay & Elena Feldman & Dmitriy Cherbadzhi & Alexander Sokolov, 2023. "The Imbalanced Classification of Fraudulent Bank Transactions Using Machine Learning," Mathematics, MDPI, vol. 11(13), pages 1-15, June.
    2. Hugo Núñez Delafuente & César A. Astudillo & David Díaz, 2024. "Ensemble Approach Using k-Partitioned Isolation Forests for the Detection of Stock Market Manipulation," Mathematics, MDPI, vol. 12(9), pages 1-18, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joshua C. C. Chan, 2024. "BVARs and stochastic volatility," Chapters, in: Michael P. Clements & Ana Beatriz Galvão (ed.), Handbook of Research Methods and Applications in Macroeconomic Forecasting, chapter 3, pages 43-67, Edward Elgar Publishing.
    2. Airola, Antti & Pahikkala, Tapio & Waegeman, Willem & De Baets, Bernard & Salakoski, Tapio, 2011. "An experimental comparison of cross-validation techniques for estimating the area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1828-1844, April.
    3. HUO, Peng & WANG, Luxin, 2022. "Digital economy and business investment efficiency: Inhibiting or facilitating?," Research in International Business and Finance, Elsevier, vol. 63(C).
    4. Dilip B. Madan & King Wang, 2022. "Two sided efficient frontiers at multiple time horizons," Annals of Finance, Springer, vol. 18(3), pages 327-353, September.
    5. Ilya Archakov & Peter Reinhard Hansen & Yiyao Luo, 2024. "A new method for generating random correlation matrices," The Econometrics Journal, Royal Economic Society, vol. 27(2), pages 188-212.
    6. John J Nay & Yevgeniy Vorobeychik, 2016. "Predicting Human Cooperation," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-19, May.
    7. Kazim Topuz & Behrooz Davazdahemami & Dursun Delen, 2024. "A Bayesian belief network-based analytics methodology for early-stage risk detection of novel diseases," Annals of Operations Research, Springer, vol. 341(1), pages 673-697, October.
    8. Eska, Fabian E. & Shi, Yanghua & Theissen, Erik & Uhrig-Homburg, Marliese, 2024. "Do design features explain the volatility of cryptocurrencies?," Finance Research Letters, Elsevier, vol. 66(C).
    9. Matthew Tuson & Berwin Turlach & Kevin Murray & Mei Ruu Kok & Alistair Vickery & David Whyatt, 2021. "Predicting Future Geographic Hotspots of Potentially Preventable Hospitalisations Using All Subset Model Selection and Repeated K-Fold Cross-Validation," IJERPH, MDPI, vol. 18(19), pages 1-21, September.
    10. K. B. Gubbels & J. Y. Ypma & C. W. Oosterlee, 2023. "Principal Component Copulas for Capital Modelling and Systemic Risk," Papers 2312.13195, arXiv.org, revised Dec 2024.
    11. Hafner, Christian M. & Wang, Linqi, 2023. "A dynamic conditional score model for the log correlation matrix," Journal of Econometrics, Elsevier, vol. 237(2).
    12. Harold Doran, 2023. "A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 37-69, February.
    13. Oliveira, Alexandre Silva de & Ceretta, Paulo Sergio & Albrecht, Peter, 2023. "Performance comparison of multifractal techniques and artificial neural networks in the construction of investment portfolios," Finance Research Letters, Elsevier, vol. 55(PA).
    14. Ehsan Harirchian & Tom Lahmer & Shahla Rasulzade, 2020. "Earthquake Hazard Safety Assessment of Existing Buildings Using Optimized Multi-Layer Perceptron Neural Network," Energies, MDPI, vol. 13(8), pages 1-16, April.
    15. Gonzalo Perez-de-la-Cruz & Guillermina Eslava-Gomez, 2019. "Discriminant analysis for discrete variables derived from a tree-structured graphical model," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 855-876, December.
    16. Ilya Archakov & Peter Reinhard Hansen, 2024. "A Canonical Representation of Block Matrices with Applications to Covariance and Correlation Matrices," The Review of Economics and Statistics, MIT Press, vol. 106(4), pages 1099-1113, July.
    17. I. Charvet & A. Suppasri & H. Kimura & D. Sugawara & F. Imamura, 2015. "A multivariate generalized linear tsunami fragility model for Kesennuma City based on maximum flow depths, velocities and debris impact, with evaluation of predictive accuracy," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 79(3), pages 2073-2099, December.
    18. Qianying Jin & Kristiaan Kerstens & Ignace Van de Woestyne, 2024. "Convex and nonconvex nonparametric frontier-based classification methods for anomaly detection," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 46(4), pages 1213-1239, December.
    19. Elie Bouri & Afees A. Salisu & Rangan Gupta, 2023. "The predictive power of Bitcoin prices for the realized volatility of US stock sector returns," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 9(1), pages 1-22, December.
    20. Khan, Jafar A. & Van Aelst, Stefan & Zamar, Ruben H., 2010. "Fast robust estimation of prediction error based on resampling," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3121-3130, December.
