
Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model

Author

Listed:
  • Kumar Rahul

    (Department of Basic and Applied Science, NIFTEM, Sonipat 131028, India)

  • Rohitash Kumar Banyal

    (Department of Computer Science and Engineering, Rajasthan Technical University, Kota, 324010, India)

Abstract

Every business enterprise requires clean, noise-free data. As a data warehouse continuously loads and refreshes large quantities of data from various sources, the amount of dirty data is likely to increase. To avoid wrong conclusions, data cleaning therefore becomes vital in data-driven projects. This paper introduces a novel data cleaning technique for the effective removal of dirty data. The process involves two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection comprises data normalization, hashing, clustering, and finding the suspected data. In the clustering step, optimal selection of the centroids is crucial and is carried out by optimization. Once dirty data detection finishes, dirty data cleaning begins. Cleaning comprises a leveling process, Huffman coding, and cleaning of the suspected data, the last of which is also performed by optimization. To solve these optimization problems, a new hybrid algorithm is proposed, the Firefly Update Enabled Rider Optimization Algorithm (FU-ROA), which hybridizes the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm. Finally, the performance of the implemented data cleaning method is compared with that of traditional methods, namely Particle Swarm Optimization (PSO), FF, Grey Wolf Optimizer (GWO), and ROA, in terms of positive and negative measures. The results show that, at iteration 12, the proposed FU-ROA model for test case 1 was 0.013%, 0.7%, 0.64%, and 0.29% better than the existing PSO, FF, GWO, and ROA models, respectively.
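
The detection stage named in the abstract (normalization, clustering, finding the suspected data) can be illustrated with a minimal sketch. This is not the authors' implementation: plain Lloyd's k-means seeded with the first k records stands in for the FU-ROA centroid selection, the hashing step is omitted, and the function names and the distance-threshold rule (`factor`) are assumptions made only for illustration.

```python
import math


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def k_means(points, k, iterations=20):
    """Lloyd's k-means seeded with the first k points.

    Assumption: this replaces the optimized centroid selection of the paper.
    """
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def detect_suspects(rows, k=2, factor=2.0):
    """Return indices of records lying unusually far from their cluster centroid.

    Hypothetical rule: a record is suspected when its distance exceeds
    `factor` times the mean distance within its cluster.
    """
    points = min_max_normalize(rows)
    centroids, labels = k_means(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    suspects = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        mean_d = sum(dists[i] for i in idx) / len(idx)
        suspects += [i for i in idx if mean_d > 0 and dists[i] > factor * mean_d]
    return sorted(suspects)


# Two tight groups of records plus one record (the last) that fits neither well.
records = [
    (1.0, 1.0), (10.0, 10.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
    (10.2, 9.9), (9.9, 10.1), (10.1, 10.0), (1.0, 4.0),
]
print(detect_suspects(records))
```

In this sketch the suspected records are only flagged; the correction stage described in the abstract (leveling, Huffman coding, optimized cleaning) is not modeled.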

Suggested Citation

  • Kumar Rahul & Rohitash Kumar Banyal, 2021. "Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 20(02), pages 809-841, March.
  • Handle: RePEc:wsi:ijitdm:v:20:y:2021:i:02:n:s0219622021500188
    DOI: 10.1142/S0219622021500188

    Download full text from publisher

    File URL: http://www.worldscientific.com/doi/abs/10.1142/S0219622021500188
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://libkey.io/10.1142/S0219622021500188?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations



    Cited by:

    1. Yao, Qingtao & Zhu, Haowei & Xiang, Ling & Su, Hao & Hu, Aijun, 2023. "A novel composed method of cleaning anomy data for improving state prediction of wind turbine," Renewable Energy, Elsevier, vol. 204(C), pages 131-140.
    2. Li, Shuangqi & He, Hongwen & Zhao, Pengfei & Cheng, Shuang, 2022. "Data cleaning and restoring method for vehicle battery big data platform," Applied Energy, Elsevier, vol. 320(C).
