ProxECAT: Proxy External Controls Association Test. A new case-control gene region association test using allele frequencies from public controls

Author

Listed:
  • Audrey E Hendricks
  • Stephen C Billups
  • Hamish N C Pike
  • I Sadaf Farooqi
  • Eleftheria Zeggini
  • Stephanie A Santorico
  • Inês Barroso
  • Josée Dupuis

Abstract

A primary goal of the recent investment in sequencing is to detect novel genetic associations in health and disease, improving the development of treatments and playing a critical role in precision medicine. While this investment has resulted in an enormous total number of sequenced genomes, individual studies of complex traits and diseases are often smaller and underpowered to detect rare variant genetic associations. Existing genetic resources such as the Exome Aggregation Consortium (ExAC; >60,000 exomes) and the Genome Aggregation Database (~140,000 sequenced samples) have the potential to be used as controls in these studies. Fully utilizing these and other existing sequencing resources may increase power and could be especially useful in studies where resources to sequence additional samples are limited. However, to date, these large, publicly available genetic resources remain underutilized, or even misused, in large part due to the lack of statistical methods that can appropriately use these summary-level data. Here, we present a new method to incorporate external controls in case-control analysis called ProxECAT (Proxy External Controls Association Test). ProxECAT estimates enrichment of rare variants within a gene region using internally sequenced cases and external controls. We evaluated ProxECAT in simulations and in empirical analyses of obesity cases, using both low depth-of-coverage (7x) whole-genome sequenced controls and ExAC as controls. We find that ProxECAT maintains the expected type I error rate, with power increasing as the number of external controls increases. With an accompanying R package, ProxECAT enables the use of publicly available allele frequencies as external controls in case-control analysis.

Author summary

Recent investments have produced sequence data on millions of people, with the number of sequenced individuals continuing to grow. Although large sequencing studies exist, most sequencing data is gathered and processed in much smaller units of hundreds to thousands of samples. These silos of data result in underpowered studies for rare-variant association of complex diseases. Existing genetic resources such as the Exome Aggregation Consortium (>60,000 exomes) and the Genome Aggregation Database (~140,000 sequenced samples) have the potential to be used as controls in rare variant studies of complex diseases and traits. However, to date, these large, publicly available genetic resources remain underutilized, or even misused, in part due to the high potential for bias caused by differences in sequencing technology and processing. Here we present a new method, Proxy External Controls Association Test (ProxECAT), to integrate sequencing data from different, previously incompatible sources. ProxECAT provides a robust approach to using publicly available sequencing data, enabling case-control analysis when no or limited internal controls exist. Further, ProxECAT's motivating insight, that readily available but often discarded information can be used as a proxy to adjust for differences in data generation, may motivate further method development in other big data technologies and platforms.
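The proxy idea can be illustrated with a small sketch. It is not the ProxECAT R package and is not taken from this listing. It assumes the "often discarded information" is a count of rare alleles at variants presumed neutral (for example, synonymous variants), and that the four counts for a gene region are independent Poisson counts. Under those assumptions, a likelihood-ratio test of equal functional-to-proxy ratios in cases and external controls reduces to the G-test on a 2x2 table. The function name `proxy_ratio_test` and all counts below are hypothetical.

```python
import math


def proxy_ratio_test(case_func, case_proxy, ctrl_func, ctrl_proxy):
    """Likelihood-ratio (G) test that the functional:proxy rare-allele
    ratio is the same in cases and in external controls.

    Assumption (not stated in the abstract): the proxy is a count of
    rare alleles at presumed-neutral variants in the same gene region,
    and all four counts are independent Poisson counts.
    Returns (G statistic, p-value from chi-square with 1 df).
    """
    observed = [[case_func, case_proxy], [ctrl_func, ctrl_proxy]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each row and column needs at least one allele")

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # Survival function of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts for one gene region: cases carry relatively more
# functional rare alleles than the external controls do.
g_stat, p = proxy_ratio_test(case_func=30, case_proxy=20,
                             ctrl_func=100, ctrl_proxy=200)
print(f"G = {g_stat:.2f}, p = {p:.4g}")
```

Because the comparison is between ratios rather than raw allele counts, a difference in sequencing or processing that inflates or deflates both variant classes by the same factor in one data source cancels out. This is one reading of how a proxy could adjust for differences in data generation.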

Suggested Citation

  • Audrey E Hendricks & Stephen C Billups & Hamish N C Pike & I Sadaf Farooqi & Eleftheria Zeggini & Stephanie A Santorico & Inês Barroso & Josée Dupuis, 2018. "ProxECAT: Proxy External Controls Association Test. A new case-control gene region association test using allele frequencies from public controls," PLOS Genetics, Public Library of Science, vol. 14(10), pages 1-14, October.
  • Handle: RePEc:plo:pgen00:1007591
    DOI: 10.1371/journal.pgen.1007591

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007591
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007591&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1007591?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Ricky Lali & Michael Chong & Arghavan Omidi & Pedrum Mohammadi-Shemirani & Ann Le & Edward Cui & Guillaume Paré, 2021. "Calibrated rare variant genetic risk scores for complex disease prediction using large exome sequence repositories," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    2. Wenan Chen & Shuoguo Wang & Saima Sultana Tithi & David W. Ellison & Daniel J. Schaid & Gang Wu, 2022. "A rare variant analysis framework using public genotype summary counts to prioritize disease-predisposition genes," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
