
Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation

Author

Listed:
  • Jason Flannick
  • Joshua M Korn
  • Pierre Fontanillas
  • George B Grant
  • Eric Banks
  • Mark A DePristo
  • David Altshuler

Abstract

High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms, when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.

Author Summary: In this work we address a series of questions prompted by the rise of next-generation sequencing as a data collection strategy for genetic studies. How does low coverage sequencing compare to traditional microarray based genotyping? Do studies increase sensitivity by collecting both sequencing and array data? What can we learn about technology error modes based on analysis of SNPs for which sequence and array data disagree? To answer these questions, we developed a statistical framework to estimate genotypes from sequence reads, array intensities, and imputation. Through experiments with intensity and read data from the HapMap and 1000 Genomes (1000 G) Projects, we show that 1 M SNP arrays used for genome wide association studies perform similarly to 1× sequencing. We find that adding low coverage sequence reads to dense array data significantly increases rare variant sensitivity, but adding dense array data to low coverage sequencing has only a small impact. Finally, we describe an improved SNP calling algorithm used in the 1000 G project, inspired by a novel next-generation sequencing error mode identified through analysis of disputed SNPs. These results inform the use of next-generation sequencing in genetic studies and model an approach to further improve genotype calling methods.
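The idea of estimating a genotype jointly from several evidence sources can be illustrated with a small Bayesian sketch. This is not the authors' framework: it assumes a simple binomial read model with a fixed per-read error rate, treats reads and array data as independent given the genotype, and uses made-up numbers for the imputation prior and the array likelihoods. The names `read_likelihoods`, `joint_posterior`, and `call` are hypothetical.

```python
# Illustrative sketch only: a toy joint genotype call, not the paper's method.
GENOTYPES = (0, 1, 2)  # number of alternate alleles carried by the sample


def read_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """P(reads | genotype) under a binomial model (constant factor dropped)."""
    liks = []
    for g in GENOTYPES:
        # Chance that one read shows the alternate allele, given genotype g
        # and a symmetric sequencing error rate (an assumed value).
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        liks.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return liks


def joint_posterior(prior, *likelihoods):
    """Multiply a genotype prior by independent likelihoods and normalize."""
    unnorm = list(prior)
    for lik in likelihoods:
        unnorm = [u * l for u, l in zip(unnorm, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def call(posterior):
    """Return the genotype with the highest posterior probability."""
    return max(GENOTYPES, key=lambda g: posterior[g])


# Hypothetical inputs: an imputation-derived prior that favors homozygous
# reference, and array likelihoods that weakly favor a heterozygote.
imputation_prior = [0.90, 0.09, 0.01]
array_liks = [0.2, 0.7, 0.1]

# Array plus imputation only (no reads at this site).
post_array = joint_posterior(imputation_prior, array_liks)

# Same site with two low-coverage reads: one reference, one alternate.
post_joint = joint_posterior(
    imputation_prior, array_liks, read_likelihoods(ref_reads=1, alt_reads=1)
)

print(call(post_array), call(post_joint))
```

In this toy setting, a single alternate-allele read is enough to move the call from homozygous reference to heterozygous, which mirrors the abstract's point that a few reads can add information at sites where array and imputation evidence alone is weak.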

Suggested Citation

  • Jason Flannick & Joshua M Korn & Pierre Fontanillas & George B Grant & Eric Banks & Mark A DePristo & David Altshuler, 2012. "Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation," PLOS Computational Biology, Public Library of Science, vol. 8(7), pages 1-13, July.
  • Handle: RePEc:plo:pcbi00:1002604
    DOI: 10.1371/journal.pcbi.1002604

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002604
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002604&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1002604?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    2. Gideon S Bradburd & Peter L Ralph & Graham M Coop, 2016. "A Spatial Framework for Understanding Population Structure and Admixture," PLOS Genetics, Public Library of Science, vol. 12(1), pages 1-38, January.
    3. Juraj Bergman & Rasmus Ø. Pedersen & Erick J. Lundgren & Rhys T. Lemoine & Sophie Monsarrat & Elena A. Pearce & Mikkel H. Schierup & Jens-Christian Svenning, 2023. "Worldwide Late Pleistocene and Early Holocene population declines in extant megafauna are associated with Homo sapiens expansion rather than climate change," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    4. Michael Bridges & Elizabeth A Heron & Colm O'Dushlaine & Ricardo Segurado & The International Schizophrenia Consortium (ISC) & Derek Morris & Aiden Corvin & Michael Gill & Carlos Pinto, 2011. "Genetic Classification of Populations Using Supervised Learning," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-12, May.
    5. Kay Young McChesney, 2015. "Teaching Diversity," SAGE Open, vol. 5(4), pages 21582440156, October.
    6. Ya-Mei Ding & Xiao-Xu Pang & Yu Cao & Wei-Ping Zhang & Susanne S. Renner & Da-Yong Zhang & Wei-Ning Bai, 2023. "Genome structure-based Juglandaceae phylogenies contradict alignment-based phylogenies and substitution rates vary with DNA repair genes," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    7. Romain Fournier & Zoi Tsangalidou & David Reich & Pier Francesco Palamara, 2023. "Haplotype-based inference of recent effective population size in modern and ancient DNA samples," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    8. Leonardo Vallini & Carlo Zampieri & Mohamed Javad Shoaee & Eugenio Bortolini & Giulia Marciani & Serena Aneli & Telmo Pievani & Stefano Benazzi & Alberto Barausse & Massimo Mezzavilla & Michael D. Pet, 2024. "The Persian plateau served as hub for Homo sapiens after the main out of Africa dispersal," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    9. Steinrücken, Matthias & Paul, Joshua S. & Song, Yun S., 2013. "A sequentially Markov conditional sampling distribution for structured populations with migration and recombination," Theoretical Population Biology, Elsevier, vol. 87(C), pages 51-61.
    10. Rozaimi Mohamad Razali & Juan Rodriguez-Flores & Mohammadmersad Ghorbani & Haroon Naeem & Waleed Aamer & Elbay Aliyev & Ali Jubran & Andrew G. Clark & Khalid A. Fakhro & Younes Mokrab, 2021. "Thousands of Qatari genomes inform human migration history and improve imputation of Arab haplotypes," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    11. Mark S Hibbins & Matthew W Hahn, 2021. "The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes," PLOS Genetics, Public Library of Science, vol. 17(11), pages 1-20, November.
    12. David B. Stern & Nathan W. Anderson & Juanita A. Diaz & Carol Eunmi Lee, 2022. "Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    13. S Justin Carlus & Saumya Sarkar & Sandeep Kumar Bansal & Vertika Singh & Kiran Singh & Rajesh Kumar Jha & Nirmala Sadasivam & Sri Revathy Sadasivam & P S Gireesha & Kumarasamy Thangaraj & Singh Rajend, 2016. "Is MTHFR 677 C>T Polymorphism Clinically Important in Polycystic Ovarian Syndrome (PCOS)? A Case-Control Study, Meta-Analysis and Trial Sequential Analysis," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-15, March.
    14. Barton, N.H. & Etheridge, A.M. & Kelleher, J. & Véber, A., 2013. "Inference in two dimensions: Allele frequencies versus lengths of shared sequence blocks," Theoretical Population Biology, Elsevier, vol. 87(C), pages 105-119.
    15. Guangping Huang & Lingyun Song & Xin Du & Xin Huang & Fuwen Wei, 2023. "Evolutionary genomics of camouflage innovation in the orchid mantis," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    16. Legried, Brandon & Terhorst, Jonathan, 2022. "Rates of convergence in the two-island and isolation-with-migration models," Theoretical Population Biology, Elsevier, vol. 147(C), pages 16-27.
    17. Joshua C Randall & Thomas W Winkler & Zoltán Kutalik & Sonja I Berndt & Anne U Jackson & Keri L Monda & Tuomas O Kilpeläinen & Tõnu Esko & Reedik Mägi & Shengxu Li & Tsegaselassie Workalemahu & Mary F, 2013. "Sex-stratified Genome-wide Association Studies Including 270,000 Individuals Show Sexual Dimorphism in Genetic Loci for Anthropometric Traits," PLOS Genetics, Public Library of Science, vol. 9(6), pages 1-19, June.
    18. Jörn Bethune & April Kleppe & Søren Besenbacher, 2022. "A method to build extended sequence context models of point mutations and indels," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    19. Wilton, Peter R. & Baduel, Pierre & Landon, Matthieu M. & Wakeley, John, 2017. "Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference," Theoretical Population Biology, Elsevier, vol. 115(C), pages 1-12.
    20. Hobolth, Asger & Jensen, Jens Ledet, 2014. "Markovian approximation to the finite loci coalescent with recombination along multiple sequences," Theoretical Population Biology, Elsevier, vol. 98(C), pages 48-58.
