
DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining

Authors

  • Mohsin Shaikh

    (Department of Computer Science, The University of Larkano, Larkana 77062, Pakistan)

  • Sabina Akram

    (Department of Computer Science and Engineering, Fast National University, Islamabad 44000, Pakistan)

  • Jawad Khan

    (School of Computing, Gachon University, Seongnam 13120, Republic of Korea)

  • Shah Khalid

    (School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan)

  • Youngmoon Lee

    (Department of Robotics, Hanyang University, Ansan 15588, Republic of Korea)

Abstract

Traditional data mining approaches are generally designed for small, centralized, and static datasets. When a dataset grows rapidly, these algorithms become infeasible because they consume excessive computational and I/O resources. Frequent itemset mining (FIM) is one of the key tasks in data mining and has applications in a variety of domains; however, traditional FIM algorithms struggle to process large and dynamic datasets efficiently. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that addresses these challenges using shard-based approximation within the MapReduce framework. DIAFM minimizes computational overhead by reducing dataset scans, bypassing exact support checks, and incorporating shard-level error thresholds to balance efficiency against accuracy. Extensive experiments demonstrate that DIAFM reduces runtime by 40–60% compared to traditional methods while keeping accuracy losses within 1–5%, even for datasets of over 500,000 transactions. Because it is incremental, DIAFM handles new data increments without reprocessing the entire dataset, making it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and establish it as a competitive and efficient solution for mining frequent itemsets in distributed, dynamic environments.
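
The abstract does not spell out DIAFM's procedure, so the following is only a minimal single-process sketch of the general idea it names: shard-level counting with a relaxed threshold, no exact second-pass support check, and incremental updates. It is not the authors' implementation. The names `local_candidates`, `IncrementalMiner`, `epsilon`, and `max_len`, and the choice to reuse the relaxed threshold `min_support - epsilon` globally, are all assumptions made for illustration. Each call to `add_shard` plays the role of a map task over one shard, and the running `Counter` plays the role of the reduce step.

```python
from collections import Counter
from itertools import combinations


def local_candidates(shard, min_support, epsilon, max_len=2):
    """Map-like step (hypothetical): count itemsets in one shard and keep
    those whose shard-level support is at least min_support - epsilon."""
    counts = Counter()
    for transaction in shard:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    threshold = (min_support - epsilon) * len(shard)
    return Counter({s: c for s, c in counts.items() if c >= threshold})


class IncrementalMiner:
    """Reduce-like accumulator (hypothetical): sums shard-level counts and
    never rescans earlier shards, so results are approximate."""

    def __init__(self, min_support, epsilon, max_len=2):
        self.min_support = min_support
        self.epsilon = epsilon
        self.max_len = max_len
        self.totals = Counter()
        self.n_transactions = 0

    def add_shard(self, shard):
        # Only the new shard is scanned; previous shards are not reprocessed.
        self.totals.update(
            local_candidates(shard, self.min_support, self.epsilon, self.max_len)
        )
        self.n_transactions += len(shard)

    def frequent(self):
        """Return {itemset: estimated support} without an exact support check.
        Assumption: the relaxed threshold is also applied globally."""
        if self.n_transactions == 0:
            return {}
        threshold = (self.min_support - self.epsilon) * self.n_transactions
        return {
            s: c / self.n_transactions
            for s, c in self.totals.items()
            if c >= threshold
        }
```

In this sketch an itemset that falls below the relaxed threshold in some shard contributes nothing from that shard, so its estimated support can undercount the true value. That is the efficiency-for-accuracy trade the abstract describes, and `epsilon` controls how much is traded.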

Suggested Citation

  • Mohsin Shaikh & Sabina Akram & Jawad Khan & Shah Khalid & Youngmoon Lee, 2024. "DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining," Mathematics, MDPI, vol. 12(24), pages 1-29, December.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:24:p:3930-:d:1543309

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/24/3930/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/24/3930/
    Download Restriction: no