
Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique

Author

Listed:
  • Gayeong Eom

    (Department of Statistics, Inje University, Gimhae 50834, Republic of Korea)

  • Haewon Byeon

    (Department of Digital Anti-Aging Healthcare (BK21), Inje University, Gimhae 50834, Republic of Korea)

Abstract

Classification problems caused by data imbalance occur in many fields and have long been studied in machine learning. Many real-world datasets suffer from class imbalance, which occurs when class sizes are not uniform; as a result, data belonging to the minority class are likely to be misclassified. Overcoming this issue is particularly important for medical data, where class imbalance inevitably arises from incidence rates. This study adjusted the imbalance ratio (IR) of the National Biobank of Korea dataset “Epidemiologic data of Parkinson’s disease dementia patients” to values of 6.8 (raw data), 9, and 19 and compared four traditional oversampling methods with techniques based on the conditional generative adversarial network (CGAN) and the conditional tabular generative adversarial network (CTGAN). Balancing the classes with CGAN and CTGAN yielded better classification performance, in terms of AUC and F1-score, than the traditional oversampling techniques. These results extend the application scope of GANs, which are widely used for unstructured data, to structured data. We also offer a better solution for the imbalanced data problem and suggest future research directions.
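
To make the imbalance ratio and the idea behind SMOTE-style oversampling concrete, the following is a minimal sketch and not the authors' implementation. It assumes that IR is the majority-class count divided by the minority-class count (so an IR of 19 corresponds to a 5% minority class), and it uses made-up two-dimensional toy data. The function names `imbalance_ratio` and `smote_like` are hypothetical, and the interpolation is a simplified version of SMOTE that ignores categorical features.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed IR definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy data (invented for illustration): 95 majority vs 5 minority rows.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(5)]
labels = [0] * len(majority) + [1] * len(minority)
ir_before = imbalance_ratio(labels)

# Generate enough synthetic minority rows to balance the two classes.
new_points = smote_like(minority, n_new=len(majority) - len(minority), k=3)
balanced_labels = labels + [1] * len(new_points)
ir_after = imbalance_ratio(balanced_labels)
```

Because each synthetic point lies on a segment between two real minority samples, the new rows stay inside the region already occupied by the minority class. GAN-based generators such as CGAN and CTGAN instead learn a generative model of the data and sample new rows from it.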

Suggested Citation

  • Gayeong Eom & Haewon Byeon, 2023. "Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique," Mathematics, MDPI, vol. 11(16), pages 1-14, August.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:16:p:3605-:d:1221274

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/16/3605/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/16/3605/
    Download Restriction: no
