Optimal sequencing strategies for identifying disease-associated singletons

Authors listed:
  • Sara Rashkin
  • Goo Jun
  • Sai Chen
  • Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO)
  • Goncalo R Abecasis

Abstract

As genetic association studies increasingly focus on identifying trait-associated rare variants through sequencing, it is important to determine the most cost-effective sequencing strategies for these studies. Deep sequencing will accurately detect and genotype the largest number of rare variants per individual, but may limit sample size. Low-pass sequencing will miss some variants in each individual, but has been shown to provide a cost-effective alternative for studies of common variants. Here, we investigate the impact of sequencing depth on studies of rare variants, focusing on singletons: the variants that are sampled in a single individual and are hardest to detect at low sequencing depths. We first estimate the sensitivity to detect singleton variants in both simulated data and down-sampled deep genome and exome sequence data. We then explore the power of association studies comparing the burden of singleton variants in cases and controls under a variety of conditions. We show that the power to detect singletons increases with coverage, typically plateauing for coverage > ~25x. Next, we show that, when total sequencing capacity is fixed, the power of association studies focused on singletons is typically maximized for coverage of 15-20x, independent of relative risk, disease prevalence, singleton burden, and case-control ratio. Our results suggest a sequencing depth of 15-20x as an appropriate compromise between singleton detection power and sample size for studies of rare variants in complex disease.

Author summary: Genetic studies of rare variants can help us understand the biology of human disease. With modern techniques and sufficient effort, it is possible to resolve any human genome very accurately, identifying most of its unique features. When funding is limited, applying these techniques to study human disease often involves a trade-off between examining more samples at reduced accuracy per sample, or fewer samples, each at greater accuracy. We evaluate these trade-offs for studies of very rare variants, using both simulation and real data. We propose cost-effective strategies for increasing our understanding of human disease.
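The fixed-capacity trade-off described in the abstract can be illustrated with a deliberately simple model. This is a minimal sketch under stated assumptions, not the authors' method. It assumes Poisson per-site read depth, no sequencing errors, and a hypothetical calling rule that requires at least three alternate-allele reads. It also expresses the budget as samples times mean depth, split evenly between cases and controls, and substitutes a normal-approximation carrier-frequency test for the paper's burden analysis. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); returns 0 when k > n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def singleton_sensitivity(depth: float, min_alt_reads: int = 3) -> float:
    """Probability that a heterozygous singleton is called at mean depth `depth`.

    Hypothetical model: per-site depth ~ Poisson(depth), each read carries the
    alternate allele with probability 0.5, no sequencing errors, and a site is
    called when at least `min_alt_reads` reads carry the alternate allele.
    """
    max_d = int(depth + 10 * math.sqrt(depth) + 20)  # drop the negligible Poisson tail
    return sum(
        poisson_pmf(d, depth) * binom_sf(min_alt_reads, d, 0.5)
        for d in range(min_alt_reads, max_d + 1)
    )


def burden_power(depth: float, total_capacity: float, carrier_freq: float,
                 enrichment: float, alpha: float = 0.05,
                 sensitivity: Callable[[float], float] = singleton_sensitivity) -> float:
    """Approximate power of a two-sided carrier-frequency test under a fixed budget.

    `total_capacity` is samples x mean depth, so N = total_capacity / depth,
    split evenly into cases and controls. `carrier_freq` is the true fraction of
    controls carrying a singleton in the tested unit; cases carry one at
    `enrichment` times that rate. Missed singletons dilute both groups equally.
    """
    n_per_group = int(total_capacity // depth) // 2
    s = sensitivity(depth)
    p0 = carrier_freq * s                          # observed carrier frequency, controls
    p1 = min(1.0, carrier_freq * enrichment) * s   # observed carrier frequency, cases
    se = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Normal approximation; ignores the (negligible) opposite-tail rejection region.
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


if __name__ == "__main__":
    budget = 100_000  # hypothetical budget, e.g. 5,000 samples at 20x
    for depth in (5, 10, 15, 20, 25, 30, 40):
        power = burden_power(depth, budget, carrier_freq=0.01, enrichment=2.0, alpha=2.5e-6)
        print(f"{depth:>3}x  N={int(budget // depth):>6}  "
              f"sensitivity={singleton_sensitivity(depth):.3f}  power={power:.3f}")
```

In this model, missed singletons dilute cases and controls alike and carriers are rare. The test statistic therefore scales roughly as sqrt(s(d) * B / d), where s(d) is the sensitivity at depth d and B is the budget, so the best depth is where s(d)/d peaks. The idealized Poisson sensitivity saturates at lower depth than the empirical, down-sampled curves summarized above, which plateau only above ~25x. The toy is therefore not expected to reproduce the 15-20x figure. Passing an empirical sensitivity curve through the `sensitivity` argument is the more faithful use.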

Suggested Citation

  • Sara Rashkin & Goo Jun & Sai Chen & Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) & Goncalo R Abecasis, 2017. "Optimal sequencing strategies for identifying disease-associated singletons," PLOS Genetics, Public Library of Science, vol. 13(6), pages 1-16, June.
  • Handle: RePEc:plo:pgen00:1006811
    DOI: 10.1371/journal.pgen.1006811

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006811
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1006811&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1006811?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.