
Estimating Financial Fraud through Transaction-Level Features and Machine Learning

Author

Listed:
  • Ayed Alwadain

    (Computer Science Department, Community College, King Saud University, Riyadh 145111, Saudi Arabia)

  • Rao Faizan Ali

    (Department of Software Engineering, School of Systems and Technology, University of Management and Technology, Lahore 54400, Pakistan)

  • Amgad Muneer

    (Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA;
    Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32160, Malaysia)

Abstract

In today’s world, financial institutions (FIs) play a pivotal role in any country’s economic growth and are vital for intermediation between the providers of investable funds, such as depositors and investors, and users. FIs focus on developing effective policies for financial fraud risk mitigation; however, timely prediction of financial fraud risk helps overcome it effectively and efficiently. Thus, we propose a novel approach for predicting financial fraud using machine learning. We used transaction-level features of 6,362,620 transactions from a synthetic dataset and fed them to various machine-learning classifiers. The correlation between features is also analysed. Furthermore, around 5000 additional data samples were generated using a Conditional Generative Adversarial Network for Tabular Data (CTGAN). In the evaluation, the proposed predictor achieved higher accuracy than previously existing machine-learning-based approaches. Among all 27 classifiers, XGBoost achieved the highest accuracy score, 0.999; when evaluated through exhaustive repeated 10-fold cross-validation, XGBoost still gave an average accuracy score of 0.998. The findings are particularly relevant to financial institutions and are important for regulators and policymakers who aim to develop new and effective policies for risk mitigation against financial fraud.
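
The repeated 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the toy threshold classifier stands in for XGBoost, the data are randomly generated, and the stratified splitting, the number of repeats (3), and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def repeated_stratified_kfold(labels, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of stratified k-fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio.
            for pos, i in enumerate(shuffled):
                folds[pos % k].append(i)
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def fit_threshold(xs, ys):
    """Toy stand-in classifier (assumption): midpoint between the class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2


def repeated_cv_accuracy(xs, ys, k=10, repeats=3, seed=0):
    """Return the mean accuracy over all folds and the per-fold accuracies."""
    scores = []
    for train, test in repeated_stratified_kfold(ys, k, repeats, seed):
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = sum((xs[i] > t) == bool(ys[i]) for i in test)
        scores.append(correct / len(test))
    return mean(scores), scores


# Hypothetical data: one numeric feature, 5% of rows labelled as fraud (1).
data_rng = random.Random(42)
ys = [1 if i < 50 else 0 for i in range(1000)]
xs = [data_rng.gauss(5.0 if y else 0.0, 1.0) for y in ys]

avg_accuracy, fold_scores = repeated_cv_accuracy(xs, ys)
print(f"folds evaluated: {len(fold_scores)}, mean accuracy: {avg_accuracy:.3f}")
```

Averaging over several reshuffled 10-fold rounds reduces the dependence of the reported score on any single partition of the data, which is why a repeated-CV average such as the 0.998 above is usually read as a more stable estimate than a single-split score.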

Suggested Citation

  • Ayed Alwadain & Rao Faizan Ali & Amgad Muneer, 2023. "Estimating Financial Fraud through Transaction-Level Features and Machine Learning," Mathematics, MDPI, vol. 11(5), pages 1-15, February.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:5:p:1184-:d:1082960

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/5/1184/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/5/1184/
    Download Restriction: no


    Citations



    Cited by:

    1. Alexey Ruchay & Elena Feldman & Dmitriy Cherbadzhi & Alexander Sokolov, 2023. "The Imbalanced Classification of Fraudulent Bank Transactions Using Machine Learning," Mathematics, MDPI, vol. 11(13), pages 1-15, June.
    2. Hugo Núñez Delafuente & César A. Astudillo & David Díaz, 2024. "Ensemble Approach Using k-Partitioned Isolation Forests for the Detection of Stock Market Manipulation," Mathematics, MDPI, vol. 12(9), pages 1-18, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen Tong & Peter Reinhard Hansen & Ilya Archakov, 2024. "Cluster GARCH," Papers 2406.06860, arXiv.org.
    2. González, Marta Ramos & Ureña, Antonio Partal & Fernández-Aguado, Pilar Gómez, 2023. "Forecasting for regulatory credit loss derived from the COVID-19 pandemic: A machine learning approach," Research in International Business and Finance, Elsevier, vol. 64(C).
    3. Mark G E White & Neil E Bezodis & Jonathon Neville & Huw Summers & Paul Rees, 2022. "Determining jumping performance from a single body-worn accelerometer using machine learning," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-25, February.
    4. Richard A. Johansen & Molly K. Reif & Christina L. Saltus & Kaytee L. Pokrzywinski, 2024. "A Broadscale Assessment of Sentinel-2 Imagery and the Google Earth Engine for the Nationwide Mapping of Chlorophyll a," Sustainability, MDPI, vol. 16(5), pages 1-17, March.
    5. Muideen Adegoke & Alaka Hafiz & Saheed Ajayi & Razak Olu-Ajayi, 2022. "Application of Multilayer Extreme Learning Machine for Efficient Building Energy Prediction," Energies, MDPI, vol. 15(24), pages 1-21, December.
    6. Joshua Chan, 2023. "BVARs and Stochastic Volatility," Papers 2310.14438, arXiv.org.
    7. Airola, Antti & Pahikkala, Tapio & Waegeman, Willem & De Baets, Bernard & Salakoski, Tapio, 2011. "An experimental comparison of cross-validation techniques for estimating the area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1828-1844, April.
    8. HUO, Peng & WANG, Luxin, 2022. "Digital economy and business investment efficiency: Inhibiting or facilitating?," Research in International Business and Finance, Elsevier, vol. 63(C).
    9. Dilip B. Madan & King Wang, 2022. "Two sided efficient frontiers at multiple time horizons," Annals of Finance, Springer, vol. 18(3), pages 327-353, September.
    10. Ilya Archakov & Peter Reinhard Hansen & Yiyao Luo, 2024. "A new method for generating random correlation matrices," The Econometrics Journal, Royal Economic Society, vol. 27(2), pages 188-212.
    11. Matthias Schmid & Thomas Hielscher & Thomas Augustin & Olaf Gefeller, 2011. "A Robust Alternative to the Schemper–Henderson Estimator of Prediction Error," Biometrics, The International Biometric Society, vol. 67(2), pages 524-535, June.
    12. Luts, Jan & Ormerod, John T., 2014. "Mean field variational Bayesian inference for support vector machine classification," Computational Statistics & Data Analysis, Elsevier, vol. 73(C), pages 163-176.
    13. David Rios Insua & Roi Naveiro & Victor Gallego, 2020. "Perspectives on Adversarial Classification," Mathematics, MDPI, vol. 8(11), pages 1-21, November.
    14. Elie Bouri & Afees A. Salisu & Rangan Gupta, 2022. "Bitcoin Prices and the Realized Volatility of US Sectoral Stock Returns," Working Papers 202224, University of Pretoria, Department of Economics.
    15. John J Nay & Yevgeniy Vorobeychik, 2016. "Predicting Human Cooperation," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-19, May.
    16. Jingyang Wu & Xinyi Zhang & Fangyixuan Huang & Haochen Zhou & Rohtiash Chandra, 2024. "Review of deep learning models for crypto price prediction: implementation and evaluation," Papers 2405.11431, arXiv.org, revised Jun 2024.
    17. Taleh Agasiev & Anatoly Karpenko, 2024. "Exploratory Landscape Validation for Bayesian Optimization Algorithms," Mathematics, MDPI, vol. 12(3), pages 1-21, January.
    18. Matthew Tuson & Berwin Turlach & Kevin Murray & Mei Ruu Kok & Alistair Vickery & David Whyatt, 2021. "Predicting Future Geographic Hotspots of Potentially Preventable Hospitalisations Using All Subset Model Selection and Repeated K-Fold Cross-Validation," IJERPH, MDPI, vol. 18(19), pages 1-21, September.
    19. K. B. Gubbels & J. Y. Ypma & C. W. Oosterlee, 2023. "Principal Component Copulas for Capital Modelling," Papers 2312.13195, arXiv.org.
    20. Hafner, Christian M. & Wang, Linqi, 2023. "A dynamic conditional score model for the log correlation matrix," Journal of Econometrics, Elsevier, vol. 237(2).
