Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i11p1709-d1405769.html

Imbalanced Data Classification Based on Improved Random-SMOTE and Feature Standard Deviation

Author

Listed:
  • Ying Zhang

    (School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China)

  • Li Deng

    (School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China)

  • Bo Wei

    (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China;
    Longgang Research Institute, Zhejiang Sci-Tech University, Longgang 325000, China)

Abstract

Oversampling techniques are widely used to rebalance imbalanced datasets. However, most oversampling methods can introduce noise and blur class boundaries, which leads to overfitting. To address this problem, we propose a new method, FSDR-SMOTE, based on Random-SMOTE and the feature standard deviation, for rebalancing imbalanced datasets. The method first removes noisy samples using the Tukey criterion. It then calculates the feature standard deviation, which reflects the degree of data dispersion, to detect each sample's location and classify the samples into boundary samples and safe samples. Next, the K-means clustering algorithm partitions the minority class samples into several sub-clusters. Within each sub-cluster, new samples are generated from random samples, boundary samples, and the corresponding sub-cluster center. Experimental results show that FSDR-SMOTE achieves an average F-measure of 93.31%, G-mean of 93.16%, and MCC of 86.53% on 20 benchmark datasets selected from the UCI Machine Learning Repository.
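
The abstract names its building blocks but does not give formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Tukey criterion means the usual 1.5 x IQR fences applied per feature, that a new sample is drawn inside the triangle formed by a random sample, a boundary sample, and a sub-cluster center (in the style of Random-SMOTE), and that the three reported metrics are computed from a binary confusion matrix. The function names `tukey_filter`, `interpolate_in_triangle`, and `binary_metrics` are hypothetical, and the feature-standard-deviation test and the K-means step are omitted.

```python
import math
import random
import statistics


def tukey_filter(rows, k=1.5):
    """Keep rows whose every feature lies inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the fences are computed independently for each feature.
    """
    fences = []
    for column in zip(*rows):
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [
        row for row in rows
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, fences))
    ]


def interpolate_in_triangle(sample, boundary, center, rng=random):
    """Draw a point inside the triangle (sample, boundary, center).

    Assumption: two-stage linear interpolation in the style of Random-SMOTE.
    """
    a = rng.random()
    b = rng.random()
    # Pick a temporary point on the segment boundary -> center.
    temp = [bv + a * (cv - bv) for bv, cv in zip(boundary, center)]
    # Pick the new sample on the segment sample -> temp.
    return [sv + b * (tv - sv) for sv, tv in zip(sample, temp)]


def binary_metrics(tp, fp, tn, fn):
    """Return (F-measure, G-mean, MCC); assumes all denominators are nonzero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return f_measure, g_mean, mcc


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the paper.
    minority = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
                [4.0, 40.0], [5.0, 50.0], [100.0, 30.0]]
    cleaned = tukey_filter(minority)
    center = [statistics.fmean(col) for col in zip(*cleaned)]
    print(interpolate_in_triangle(cleaned[0], cleaned[-1], center))
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

In a fuller pipeline, the sub-cluster centers would come from a K-means run over the cleaned minority samples rather than from a single mean, and the boundary sample would be chosen by the feature-standard-deviation rule that the paper defines.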

Suggested Citation

  • Ying Zhang & Li Deng & Bo Wei, 2024. "Imbalanced Data Classification Based on Improved Random-SMOTE and Feature Standard Deviation," Mathematics, MDPI, vol. 12(11), pages 1-17, May.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:11:p:1709-:d:1405769

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/11/1709/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/11/1709/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Türkoğlu, A. Selim & Erkmen, Burcu & Eren, Yavuz & Erdinç, Ozan & Küçükdemiral, İbrahim, 2024. "Integrated Approaches in Resilient Hierarchical Load Forecasting via TCN and Optimal Valley Filling Based Demand Response Application," Applied Energy, Elsevier, vol. 360(C).
    2. Luo, Tengqi & Xuan, Ang & Wang, Yafei & Li, Guanglei & Fang, Juan & Liu, Zhengguang, 2023. "Energy efficiency evaluation and optimization of active distribution networks with building integrated photovoltaic systems," Renewable Energy, Elsevier, vol. 219(P1).
    3. Jeong, Dongyeon & Park, Chiwoo & Ko, Young Myoung, 2021. "Missing data imputation using mixture factor analysis for building electric load data," Applied Energy, Elsevier, vol. 304(C).
    4. Hafeez, Ghulam & Alimgeer, Khurram Saleem & Khan, Imran, 2020. "Electric load forecasting based on deep learning and optimized by heuristic algorithm in smart grid," Applied Energy, Elsevier, vol. 269(C).
    5. Li, Chen, 2020. "Designing a short-term load forecasting model in the urban smart grid system," Applied Energy, Elsevier, vol. 266(C).
    6. Haben, Stephen & Arora, Siddharth & Giasemidis, Georgios & Voss, Marcus & Vukadinović Greetham, Danica, 2021. "Review of low voltage load forecasting: Methods, applications, and recommendations," Applied Energy, Elsevier, vol. 304(C).
    7. Laouafi, Abderrezak & Laouafi, Farida & Boukelia, Taqiy Eddine, 2022. "An adaptive hybrid ensemble with pattern similarity analysis and error correction for short-term load forecasting," Applied Energy, Elsevier, vol. 322(C).
