Printed from https://ideas.repec.org/a/plo/pgen00/1009853.html

Leveraging auxiliary data from arbitrary distributions to boost GWAS discovery with Flexible cFDR

Authors

Listed:
  • Anna Hutchinson
  • Guillermo Reales
  • Thomas Willis
  • Chris Wallace

Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic variants that are associated with complex traits. However, a stringent significance threshold is required to identify robust genetic associations. Leveraging relevant auxiliary covariates has the potential to boost statistical power to exceed the significance threshold. In particular, abundant pleiotropy and the non-random distribution of SNPs across various functional categories suggest that leveraging GWAS test statistics from related traits and/or functional genomic data may boost GWAS discovery. While type 1 error rate control has become standard in GWAS, control of the false discovery rate can be a more powerful approach. The conditional false discovery rate (cFDR) extends the standard FDR framework by conditioning on auxiliary data to call significant associations, but current implementations are restricted to auxiliary data satisfying specific parametric distributions, typically GWAS p-values for related traits. We relax these distributional assumptions, enabling an extension of the cFDR framework that supports auxiliary covariates from arbitrary continuous distributions (“Flexible cFDR”). Our method can be applied iteratively, thereby supporting multi-dimensional covariate data. Through simulations we show that Flexible cFDR increases sensitivity whilst controlling FDR after one or several iterations. We further demonstrate its practical potential through application to an asthma GWAS, leveraging various functional genomic data to find additional genetic associations for asthma, which we validate in the larger, independent, UK Biobank data resource.

Author summary

Genome-wide association studies (GWAS) detect regions of the human genome that are associated with various traits, including complex diseases, but the power to detect these genomic regions is currently limited by sample size. The conditional false discovery rate (cFDR) provides a tool to leverage one GWAS to improve power in another. The motivation is that if two traits have some genetic correlation, then our interpretation of a low but not significant p-value for the trait of interest will differ depending on whether that SNP shows strong or absent evidence of association with the related trait. Here, we describe an extension to the cFDR framework, called “Flexible cFDR”, that controls the FDR and supports auxiliary data from arbitrary distributions, surpassing current implementations of cFDR, which are restricted to leveraging GWAS p-values from related traits. In practice, our method can be used to iteratively leverage various types of functional genomic data with GWAS data to increase power for GWAS discovery. We describe the use of Flexible cFDR to supplement data from a GWAS of asthma with auxiliary data from functional genomic experiments. We identify associations novel to the original GWAS and validate these discoveries with reference to a larger, better-powered GWAS of asthma.
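To make the conditioning idea concrete, the following is a minimal sketch of the classical empirical cFDR estimate, in which the auxiliary data are p-values from a related trait. It is not the Flexible cFDR method described above, which is designed for auxiliary data from arbitrary continuous distributions; the function name `empirical_cfdr` and the toy values are assumptions made for illustration only. For a SNP with principal p-value p_i and auxiliary p-value q_i, the estimate is p_i multiplied by the number of SNPs with q_j <= q_i, divided by the number of SNPs with both p_j <= p_i and q_j <= q_i.

```python
import numpy as np


def empirical_cfdr(p, q):
    """Classical empirical cFDR estimate for each (p_i, q_i) pair.

    Illustrative only: assumes q holds p-values for a related trait.
    cFDR(p_i | q_i) ~= p_i * #{j: q_j <= q_i} / #{j: p_j <= p_i and q_j <= q_i}
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Pairwise comparisons: le_q[i, j] is True when q_j <= q_i.
    # This uses O(n^2) memory, so it only suits small toy inputs.
    le_q = q[None, :] <= q[:, None]
    le_p = p[None, :] <= p[:, None]
    n_q = le_q.sum(axis=1)
    # The joint count is always at least 1 because j = i is included.
    n_pq = (le_p & le_q).sum(axis=1)
    # Cap at 1 so the result can be read as a rate.
    return np.minimum(1.0, p * n_q / n_pq)


# Toy data (made up): SNPs 0 and 1 share the same principal p-value,
# but only SNP 0 also shows evidence for the related trait.
p_vals = [0.001, 0.001, 0.5]
q_vals = [0.01, 0.9, 0.5]
cfdr = empirical_cfdr(p_vals, q_vals)
```

In this toy input the first two SNPs have identical principal p-values, yet the first receives the smaller cFDR estimate because its auxiliary p-value is also small. This mirrors the motivation in the author summary: the same p-value is interpreted differently depending on the evidence for the related trait.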

Suggested Citation

  • Anna Hutchinson & Guillermo Reales & Thomas Willis & Chris Wallace, 2021. "Leveraging auxiliary data from arbitrary distributions to boost GWAS discovery with Flexible cFDR," PLOS Genetics, Public Library of Science, vol. 17(10), pages 1-37, October.
  • Handle: RePEc:plo:pgen00:1009853
    DOI: 10.1371/journal.pgen.1009853

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009853
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1009853&type=printable
    Download Restriction: no
