Authors
- Masad A. Alrasheedi
(Department of Management Information Systems, College of Business Administration, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia)
- Samia Ijaz
(Department of Computer Science, HITEC University, Taxila 47080, Pakistan)
- Ayed M. Alrashdi
(Department of Electrical Engineering, College of Engineering, University of Ha’il, Ha’il 81441, Saudi Arabia)
- Seung-Won Lee
(Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea;
Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea;
Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea;
Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea)
Abstract
Authorized and fraudulent transactions coexist worldwide, which makes the two difficult to tell apart. Because fraudulent transactions make up only a small percentage of the total, they also give rise to the class imbalance problem. Tax systems therefore need an adequately robust fraud detection mechanism to avoid collapse. Datasets, and tax return datasets in particular, have become significantly harder to obtain as privacy gains importance and people grow reluctant to share personal information. We therefore synthesize our dataset from publicly available data and enhance it using Correlational Generative Adversarial Networks (CGANs) and the Synthetic Minority Oversampling Technique (SMOTE). The proposed method includes a preprocessing stage that denoises the data, identifies anomalies and outliers, and performs dimensionality reduction. The data are then enhanced using SMOTE and the proposed CGAN technique. A unique encoder design is proposed to expose the hidden patterns among legitimate and fraudulent records. This research found anomalous deductions, income inconsistencies, recurrent transaction manipulations, and irregular filing practices that distinguish fraudulent from valid tax records. These patterns are identified by encoder-based feature extraction and synthetic data augmentation. Several machine learning classifiers, along with a voting ensemble technique, were used both with and without data augmentation. Experimental results show that the proposed Soft-Voting technique outperformed the original approach without an ensemble method.
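For readers unfamiliar with two of the techniques named in the abstract, the following is a minimal sketch of SMOTE-style oversampling and soft voting. It is not the authors' implementation: the function names `smote_oversample` and `soft_vote`, the toy feature vectors, and the probability values are all hypothetical, and the CGAN and encoder stages are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def soft_vote(probabilities):
    """Average per-class probabilities from several classifiers and
    return (predicted class index, averaged probabilities)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


# Toy, made-up feature vectors for the minority (fraud) class.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
new_points = smote_oversample(fraud, n_new=4)

# Made-up [P(legitimate), P(fraud)] outputs from three hypothetical classifiers.
label, mean_probs = soft_vote([[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]])
```

In this toy input, two of the three classifiers lean slightly toward the legitimate class, yet the averaged probabilities favor fraud because the third classifier is far more confident. That sensitivity to confidence is what separates soft voting from a simple majority vote.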
Suggested Citation
Masad A. Alrasheedi & Samia Ijaz & Ayed M. Alrashdi & Seung-Won Lee, 2025.
"Advanced Tax Fraud Detection: A Soft-Voting Ensemble Based on GAN and Encoder Architecture,"
Mathematics, MDPI, vol. 13(4), pages 1-29, February.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:4:p:642-:d:1592301