
Parallel extraction of association rules from genomics data

Author

Listed:
  • Agapito, Giuseppe
  • Guzzi, Pietro Hiram
  • Cannataro, Mario

Abstract

High-throughput experimental platforms such as microarrays produce massive amounts of omics data for each analyzed sample. For example, the Affymetrix DMET (Drug Metabolizing Enzymes and Transporters) microarray platform can detect Single Nucleotide Polymorphisms (SNPs) in 225 human genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs, enabling large pharmacogenomics studies. Moreover, applying such platforms to large populations of subjects further increases the size of the experimental datasets produced in clinical studies. The production of big omics datasets is thus one primary reason to use parallel computing in bioinformatics. Such datasets are usually analyzed with classical statistical methods and, more recently, with data mining methods that can extract knowledge hidden in the data, e.g. by highlighting multiple associations among features of the data. However, applying standard off-the-shelf data mining algorithms to large omics datasets, especially for association rule mining, poses two main issues: (i) very large main-memory requirements, which may prevent data mining software from running on personal/desktop computers; and (ii) very long response times, which may increase the time required to complete extensive pharmacogenomics studies. To overcome the limits of standard association rule mining algorithms when applied to omics datasets, we propose PARES (Parallel Association Rules Extractor from SNPs), a novel parallel algorithm for the efficient extraction of association rules from omics datasets. PARES is implemented as a multi-threaded version of an optimized Frequent Pattern Growth (FP-Growth) algorithm. Moreover, it includes a customized preprocessing strategy for SNP datasets, based on a Fisher’s Test Filter, that discards trivial transactions from the input dataset, reducing the search space from which many independent FP-Trees are built.
The experimental results show that PARES achieves good speedup and high memory efficiency compared with several association rule mining algorithms implemented in the main off-the-shelf data mining platforms.
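
The two ideas named in the abstract, a Fisher's test filter followed by multi-threaded mining over independent partitions, can be illustrated with a minimal Python sketch. This is not the PARES implementation: the abstract does not specify how the filter is applied, so the sketch assumes each sample carries a binary class label and that an item (a SNP genotype such as the hypothetical "rs1=A/A") is kept only if its presence is associated with the class at level alpha. Plain pair counting stands in for FP-Growth, and all function names are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def fisher_filter(samples, labels, alpha=0.05):
    """Return the items whose presence is associated with the class label (assumed filter)."""
    kept = set()
    for item in set().union(*samples):
        a = sum(1 for s, y in zip(samples, labels) if item in s and y == 1)
        b = sum(1 for s, y in zip(samples, labels) if item in s and y == 0)
        c = sum(1 for s, y in zip(samples, labels) if item not in s and y == 1)
        d = sum(1 for s, y in zip(samples, labels) if item not in s and y == 0)
        if fisher_exact_two_sided(a, b, c, d) <= alpha:
            kept.add(item)
    return kept


def count_pairs(chunk):
    """Count co-occurring item pairs in one partition (stand-in for one FP-Tree)."""
    counts = Counter()
    for transaction in chunk:
        counts.update(combinations(sorted(transaction), 2))
    return counts


def parallel_pair_support(transactions, workers=2):
    """Mine each partition in its own thread, then merge the partial counts."""
    chunks = [transactions[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_pairs, chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    # Toy data: four cases (label 1) and four controls (label 0).
    samples = [
        {"rs1=A/A", "rs2=G/G", "rs3=C/T"}, {"rs1=A/A", "rs3=C/T"},
        {"rs1=A/A", "rs2=G/G", "rs3=C/T"}, {"rs1=A/A", "rs3=C/T"},
        {"rs2=G/G"}, {"rs4=T/T"}, {"rs2=G/G", "rs4=T/T"}, {"rs4=T/T"},
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    kept = fisher_filter(samples, labels)
    # Drop filtered-out items and any transaction left empty.
    reduced = [s & kept for s in samples if s & kept]
    print(sorted(kept))
    print(parallel_pair_support(reduced))
```

In CPython, threads mainly illustrate the partition-and-merge structure; the global interpreter lock limits the speedup for CPU-bound counting, so a process pool or a language with true shared-memory threads would be needed to observe real gains.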

Suggested Citation

  • Agapito, Giuseppe & Guzzi, Pietro Hiram & Cannataro, Mario, 2019. "Parallel extraction of association rules from genomics data," Applied Mathematics and Computation, Elsevier, vol. 350(C), pages 434-446.
  • Handle: RePEc:eee:apmaco:v:350:y:2019:i:c:p:434-446
    DOI: 10.1016/j.amc.2017.09.026

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0096300317306471
    Download Restriction: Full text for ScienceDirect subscribers only


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shen-Tsu Wang, 2016. "Integrating grey sequencing with the genetic algorithm--immune algorithm to optimise touch panel cover glass polishing process parameter design," International Journal of Production Research, Taylor & Francis Journals, vol. 54(16), pages 4882-4893, August.
    2. Guanling Lee & Sheng-Lung Peng & Yuh-Tzu Lin, 2009. "Proportional fault-tolerant data mining with applications to bioinformatics," Information Systems Frontiers, Springer, vol. 11(4), pages 461-469, September.
    3. Yaqiong Lv & Shangjia Xiang & Tianyi Zhu & Shuzhu Zhang, 2020. "Data-Driven Design and Optimization for Smart Logistics Parks: Towards the Sustainable Development of the Steel Industry," Sustainability, MDPI, vol. 12(17), pages 1-12, August.
    4. Vincent S. Tseng & Hsieh-Hui Yu & Shih-Chiang Yang, 2009. "Efficient mining of multilevel gene association rules from microarray and gene ontology," Information Systems Frontiers, Springer, vol. 11(4), pages 433-447, September.
    5. Chulhwan Chris Bang, 2015. "Information systems frontiers: Keyword analysis and classification," Information Systems Frontiers, Springer, vol. 17(1), pages 217-237, February.
