IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1004139.html

PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

Author

Listed:
  • Oren E Livne
  • Lide Han
  • Gorka Alkorta-Aranburu
  • William Wentworth-Sheilds
  • Mark Abney
  • Carole Ober
  • Dan L Nicolae

Abstract

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques, we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

Author Summary: The recent availability of whole genome and whole exome sequencing allows genetic studies of human diseases and traits at an unprecedented resolution, although their cost limits the size of the studied sample. To overcome this limitation and design cost-efficient studies, we developed a two-step method: sequencing of relatively few members of a well-characterized founder population followed by pedigree-based whole genome imputation of many other individuals with genome-wide genotype data. We show that by sequencing only 98 Hutterites, we can impute 7 million variants in an additional 1,317 Hutterites with >99% accuracy and an average call rate of 87%. Furthermore, parental origin was assigned to 83% of the alleles. Such studies in the Hutterites and other founder populations should yield new insights into the genetic architecture of common diseases, gene expression traits, and clinically relevant biomarkers of disease, and ultimately provide outstanding opportunities for personalized medicine in these well-characterized populations.
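
The abstract mentions indexing IBD segments with clique graphs and then imputing from a small number of sequenced individuals. The following is only a toy sketch of that general idea at a single variant site, not PRIMAL's actual implementation. It assumes IBD sharing at a locus is transitive, so connected components of the pairwise IBD graph stand in for cliques; the function names (`ibd_groups`, `impute_alleles`) and the haplotype labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def ibd_groups(haplotypes, ibd_pairs):
    """Group haplotypes that share an IBD segment at one locus.

    Assumption: IBD at a locus is transitive, so each connected
    component of the pairwise IBD graph is treated as one clique.
    """
    parent = {h: h for h in haplotypes}

    def find(h):
        # Union-find lookup with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in ibd_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for h in haplotypes:
        groups[find(h)].add(h)
    return list(groups.values())


def impute_alleles(groups, sequenced_alleles):
    """Copy a sequenced allele to every haplotype in the same IBD group.

    Groups with no sequenced member, or with conflicting sequenced
    alleles, are left uncalled.
    """
    imputed = {}
    for group in groups:
        observed = {sequenced_alleles[h] for h in group if h in sequenced_alleles}
        if len(observed) == 1:
            allele = observed.pop()
            for h in group:
                imputed[h] = allele
    return imputed


# Hypothetical example: three individuals, two haplotypes each.
haplotypes = ["A.pat", "A.mat", "B.pat", "B.mat", "C.pat", "C.mat"]
ibd_pairs = [("A.pat", "B.mat"), ("B.mat", "C.pat")]
sequenced = {"A.pat": 1, "A.mat": 0}  # only individual A is sequenced

groups = ibd_groups(haplotypes, ibd_pairs)
imputed = impute_alleles(groups, sequenced)
call_rate = len(imputed) / len(haplotypes)
print(imputed, call_rate)
```

In this toy setting the call rate is simply the fraction of haplotypes that fall in a group containing a sequenced haplotype, which gives some intuition for why sequencing a well-chosen subset of a founder population can cover many other individuals.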

Suggested Citation

  • Oren E Livne & Lide Han & Gorka Alkorta-Aranburu & William Wentworth-Sheilds & Mark Abney & Carole Ober & Dan L Nicolae, 2015. "PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population," PLOS Computational Biology, Public Library of Science, vol. 11(3), pages 1-14, March.
  • Handle: RePEc:plo:pcbi00:1004139
    DOI: 10.1371/journal.pcbi.1004139

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004139
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004139&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Bryan N Howie & Peter Donnelly & Jonathan Marchini, 2009. "A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies," PLOS Genetics, Public Library of Science, vol. 5(6), pages 1-15, June.

    Citations


    Cited by:

    1. Amy Ko & Rasmus Nielsen, 2017. "Composite likelihood method for inferring local pedigrees," PLOS Genetics, Public Library of Science, vol. 13(8), pages 1-21, August.
    2. Mark Reppell & John Novembre, 2018. "Using pseudoalignment and base quality to accurately quantify microbial community composition," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-23, April.
    3. Esther Ulitzsch & Qiwei He & Vincent Ulitzsch & Hendrik Molter & André Nichterlein & Rolf Niedermeier & Steffi Pohl, 2021. "Combining Clickstream Analyses and Graph-Modeled Data Clustering for Identifying Common Response Processes," Psychometrika, Springer;The Psychometric Society, vol. 86(1), pages 190-214, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniel Svensson & Matilda Rentoft & Anna M Dahlin & Emma Lundholm & Pall I Olason & Andreas Sjödin & Carin Nylander & Beatrice S Melin & Johan Trygg & Erik Johansson, 2020. "A whole-genome sequenced control population in northern Sweden reveals subregional genetic differences," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-18, September.
    2. Chuan Gao & Nan Wang & Xiuqing Guo & Julie T Ziegler & Kent D Taylor & Anny H Xiang & Yang Hai & Steven J Kridel & Jerry L Nadler & Fouad Kandeel & Leslie J Raffel & Yii-Der I Chen & Jill M Norris & J, 2015. "A Comprehensive Analysis of Common and Rare Variants to Identify Adiposity Loci in Hispanic Americans: The IRAS Family Study (IRASFS)," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-17, November.
    3. Paul S de Vries & Maria Sabater-Lleal & Daniel I Chasman & Stella Trompet & Tarunveer S Ahluwalia & Alexander Teumer & Marcus E Kleber & Ming-Huei Chen & Jie Jin Wang & John R Attia & Riccardo E Mario, 2017. "Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-22, January.
    4. Bo Jiang & Jun S. Liu, 2015. "Bayesian Partition Models for Identifying Expression Quantitative Trait Loci," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1350-1361, December.
    5. Rakesh Chettier & Lesa Nelson & James W Ogilvie & Hans M Albertsen & Kenneth Ward, 2015. "Haplotypes at LBX1 Have Distinct Inheritance Patterns with Opposite Effects in Adolescent Idiopathic Scoliosis," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-11, February.
    6. Michel S. Naslavsky & Marilia O. Scliar & Guilherme L. Yamamoto & Jaqueline Yu Ting Wang & Stepanka Zverinova & Tatiana Karp & Kelly Nunes & José Ricardo Magliocco Ceroni & Diego Lima Carvalho & Carlo, 2022. "Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Steinrücken, Matthias & Paul, Joshua S. & Song, Yun S., 2013. "A sequentially Markov conditional sampling distribution for structured populations with migration and recombination," Theoretical Population Biology, Elsevier, vol. 87(C), pages 51-61.
    8. Anshuman Sewda & A J Agopian & Elizabeth Goldmuntz & Hakon Hakonarson & Bernice E Morrow & Fadi Musfee & Deanne Taylor & Laura E Mitchell & on behalf of the Pediatric Cardiac Genomics Consortium, 2020. "Gene-based analyses of the maternal genome implicate maternal effect genes as risk factors for conotruncal heart defects," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-15, June.
    9. Lin Yuan & Chang-An Yuan & De-Shuang Huang, 2017. "FAACOSE: A Fast Adaptive Ant Colony Optimization Algorithm for Detecting SNP Epistasis," Complexity, Hindawi, vol. 2017, pages 1-10, September.
    10. Carl Nettelblad, 2013. "Breakdown of Methods for Phasing and Imputation in the Presence of Double Genotype Sharing," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-5, March.
    11. Viinikainen, Jutta & Bryson, Alex & Böckerman, Petri & Kari, Jaana T. & Lehtimäki, Terho & Raitakari, Olli & Viikari, Jorma & Pehkonen, Jaakko, 2022. "Does better education mitigate risky health behavior? A mendelian randomization study," Economics & Human Biology, Elsevier, vol. 46(C).
    12. Cavin K Ward-Caviness & Paul S de Vries & Kerri L Wiggins & Jennifer E Huffman & Lisa R Yanek & Lawrence F Bielak & Franco Giulianini & Xiuqing Guo & Marcus E Kleber & Tim Kacprowski & Stefan Groß & A, 2019. "Mendelian randomization evaluation of causal effects of fibrinogen on incident coronary heart disease," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-18, May.
    13. Ani Manichaikul & Xin-Qun Wang & Solomon K Musani & David M Herrington & Wendy S Post & James G Wilson & Stephen S Rich & Annabelle Rodriguez, 2015. "Association of the Lipoprotein Receptor SCARB1 Common Missense Variant rs4238001 with Incident Coronary Heart Disease," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-16, May.
    14. Morten Dybdahl Krebs & Gonçalo Espregueira Themudo & Michael Eriksen Benros & Ole Mors & Anders D. Børglum & David Hougaard & Preben Bo Mortensen & Merete Nordentoft & Michael J. Gandal & Chun Chieh F, 2021. "Associations between patterns in comorbid diagnostic trajectories of individuals with schizophrenia and etiological factors," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    15. Heejung Shim & Daniel I Chasman & Joshua D Smith & Samia Mora & Paul M Ridker & Deborah A Nickerson & Ronald M Krauss & Matthew Stephens, 2015. "A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-20, April.
    16. Mette K Andersen & Emil Jørsboe & Line Skotte & Kristian Hanghøj & Camilla H Sandholt & Ida Moltke & Niels Grarup & Timo Kern & Yuvaraj Mahendran & Bolette Søborg & Peter Bjerregaard & Christina V L L, 2020. "The derived allele of a novel intergenic variant at chromosome 11 associates with lower body mass index and a favorable metabolic phenotype in Greenlanders," PLOS Genetics, Public Library of Science, vol. 16(1), pages 1-17, January.
    17. Gianmarco Mignogna & Caitlin E. Carey & Robbee Wedow & Nikolas Baya & Mattia Cordioli & Nicola Pirastu & Rino Bellocco & Kathryn Fiuza Malerbi & Michel G. Nivard & Benjamin M. Neale & Raymond K. Walte, 2023. "Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci," Nature Human Behaviour, Nature, vol. 7(8), pages 1371-1387, August.
    18. Xiaodong Cai & Juan Andrés Bazerque & Georgios B Giannakis, 2013. "Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-13, May.
    19. Hans M Albertsen & Rakesh Chettier & Pamela Farrington & Kenneth Ward, 2013. "Genome-Wide Association Study Link Novel Loci to Endometriosis," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-8, March.
    20. Gemma Cadby & Corey Giles & Phillip E. Melton & Kevin Huynh & Natalie A. Mellett & Thy Duong & Anh Nguyen & Michelle Cinel & Alex Smith & Gavriel Olshansky & Tingting Wang & Marta Brozynska & Mike Ino, 2022. "Comprehensive genetic analysis of the human lipidome identifies loci associated with lipid homeostasis with links to coronary artery disease," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
