
AprioriGWAS, a New Pattern Mining Strategy for Detecting Genetic Variants Associated with Disease through Interaction Effects

Author

Listed:
  • Qingrun Zhang
  • Quan Long
  • Jurg Ott

Abstract

Identifying gene-gene interactions is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to efficiently identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test for epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypothesis of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference from Pearson's chi-square test for the contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH); and (2) the GO term “glycosaminoglycan biosynthetic process” was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS in the AMD data are likely true interactions, since the genes interacting with CFH are retinal genes, and the GO term enrichment also verified that the interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in the WTCCC data, we found that variants without marginal effects show significant interactions, for example multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptors in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. Multiple lines of evidence support the relevance of GRIA1 and GABRB2 to mental disorders.

Author Summary: Genes do not operate in a vacuum. They interact with each other in many ways. Therefore, to identify the genetic causes of disease through case-control association studies, it is important to take interactions into account. There are two fundamental challenges in interaction-focused analysis. The first is that the number of possible combinations of genetic variants easily becomes astronomical, beyond current computational capacity; this is referred to as “the curse of dimensionality” in computer science. The second is that, even if all potential combinations could be exhaustively checked, genuine signals are likely to be buried by false positives composed of a single variant with a large main effect and some other irrelevant variant. In this work, we propose AprioriGWAS, which employs Apriori, an algorithm that pioneered the branch of “Frequent Itemset Mining” in computer science, to cope with the daunting number of combinations, and conditional permutation, to enable real signals to stand out. By applying AprioriGWAS to age-related macular degeneration (AMD) data and bipolar disorder (BD) in the WTCCC data, we found interesting interactions between genes that are plausible in terms of the disease. Consequently, AprioriGWAS could be a good tool for finding epistatic interactions in GWA data.
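
The pattern-growth idea behind Apriori, which the abstract credits with reducing the search space, can be illustrated with a minimal Python sketch. This is generic frequent itemset mining, not the AprioriGWAS implementation: the function name `apriori`, the `min_support` threshold, the SNP labels, and the encoding of each individual as a set of "SNP=genotype" strings are all assumptions made for illustration. The key property is that an itemset can only be frequent if every one of its subsets is frequent, so candidates of size k are built only from frequent itemsets of size k-1.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in >= min_support transactions.

    Illustrative sketch of classic Apriori; not the AprioriGWAS code.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count support only for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: each set holds one individual's genotypes.
cases = [
    {"snpA=AA", "snpB=AG", "snpC=GG"},
    {"snpA=AA", "snpB=AG", "snpC=CG"},
    {"snpA=AA", "snpB=AG", "snpC=GG"},
    {"snpA=AC", "snpB=GG", "snpC=GG"},
]
patterns = apriori(cases, min_support=3)
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy example only one two-SNP pattern survives, and no three-SNP candidate is ever counted, because it would need three frequent pairs. How AprioriGWAS adapts this scheme to contrast cases with controls is described in the full paper and is not reproduced here.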

Suggested Citation

  • Qingrun Zhang & Quan Long & Jurg Ott, 2014. "AprioriGWAS, a New Pattern Mining Strategy for Detecting Genetic Variants Associated with Disease through Interaction Effects," PLOS Computational Biology, Public Library of Science, vol. 10(6), pages 1-14, June.
  • Handle: RePEc:plo:pcbi00:1003627
    DOI: 10.1371/journal.pcbi.1003627

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003627
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003627&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Shouheng Tuo & Junying Zhang & Xiguo Yuan & Yuanyuan Zhang & Zhaowen Liu, 2016. "FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-27, March.
    2. Iksoo Huh & Min-Seok Kwon & Taesung Park, 2015. "An Efficient Stepwise Statistical Test to Identify Multiple Linked Human Genetic Variants Associated with Specific Phenotypic Traits," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-13, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shouheng Tuo & Junying Zhang & Xiguo Yuan & Yuanyuan Zhang & Zhaowen Liu, 2016. "FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-27, March.
    2. Masao Ueki & Heather J Cordell, 2012. "Improved Statistics for Genome-Wide Interaction Analysis," PLOS Genetics, Public Library of Science, vol. 8(4), pages 1-19, April.
