IDEAS record: https://ideas.repec.org/a/plo/pgen00/1002453.html

Inference of Population Structure using Dense Haplotype Data

Authors
  • Daniel John Lawson
  • Garrett Hellenthal
  • Simon Myers
  • Daniel Falush

Abstract

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this “chromosome painting” can be summarized as a “coancestry matrix,” which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
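To make the painting and coancestry-matrix ideas in the abstract concrete, here is a minimal Python sketch under simplifying assumptions. It is not the ChromoPainter algorithm. It uses a Li and Stephens style copying HMM (the class of model ChromoPainter builds on) on phased 0/1 haplotypes, but keeps only the single most likely (Viterbi) painting, whereas the published method works with expected chunk counts. The function names `paint_viterbi`, `count_chunks` and `coancestry_matrix` and the parameters `switch_prob` and `mismatch_prob` are hypothetical illustrations.

```python
import numpy as np


def paint_viterbi(recipient, donors, switch_prob=0.01, mismatch_prob=0.01):
    """Most likely donor at each marker under a simple copying HMM.

    recipient: (n_markers,) array of 0/1 alleles on one phased haplotype.
    donors:    (n_donors, n_markers) array of donor haplotypes.
    """
    n_donors, n_markers = donors.shape
    # Transition k -> j: (1 - rho) * [k == j] + rho / n_donors
    log_stay = np.log(1.0 - switch_prob + switch_prob / n_donors)
    log_jump = np.log(switch_prob / n_donors)
    # Emission: the recipient copies the donor's allele, with a small mismatch chance
    log_emit = np.where(donors == recipient,
                        np.log(1.0 - mismatch_prob), np.log(mismatch_prob))
    score = np.log(1.0 / n_donors) + log_emit[:, 0]
    back = np.zeros((n_donors, n_markers), dtype=int)
    for m in range(1, n_markers):
        best_prev = int(np.argmax(score))
        stay = score + log_stay             # keep copying the same donor
        jump = score[best_prev] + log_jump  # best switch: from the top-scoring donor
        back[:, m] = np.where(stay >= jump, np.arange(n_donors), best_prev)
        score = np.maximum(stay, jump) + log_emit[:, m]
    # Trace the best path backwards from the final marker
    path = np.empty(n_markers, dtype=int)
    path[-1] = int(np.argmax(score))
    for m in range(n_markers - 1, 0, -1):
        path[m - 1] = back[path[m], m]
    return path


def count_chunks(path, donor_ids):
    """Count chunks: maximal runs of consecutive markers copied from one donor."""
    counts = {}
    start = 0
    for m in range(1, len(path) + 1):
        if m == len(path) or path[m] != path[start]:
            donor = donor_ids[path[start]]
            counts[donor] = counts.get(donor, 0) + 1
            start = m
    return counts


def coancestry_matrix(haplotypes, **hmm_args):
    """x[i, j] = number of chunks recipient i copies from donor j (all others donate)."""
    n = haplotypes.shape[0]
    x = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        path = paint_viterbi(haplotypes[i], haplotypes[others], **hmm_args)
        for j, c in count_chunks(path, others).items():
            x[i, j] = c
    return x


if __name__ == "__main__":
    a = np.array([0, 1, 0, 1, 1, 0, 0, 1])
    haps = np.array([a, a, 1 - a, 1 - a])  # two pairs of identical haplotypes
    # Each haplotype is painted as one unbroken chunk from its identical partner
    print(coancestry_matrix(haps))
```

Because the recipient is never allowed to copy from itself, the diagonal stays at zero; rows of the resulting matrix can then be compared across individuals, which is the kind of summary the abstract describes feeding into PCA and clustering.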
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.

Author Summary

The first step in almost every genetic analysis is to establish how sample members are related to each other. High relatedness between individuals can arise if they share a small number of recent ancestors, e.g. if they are distant cousins, or a larger number of more distant ones, e.g. if their ancestors come from the same region. The most popular methods for investigating these relationships analyse successive markers independently, simply adding the information they provide. This works well for studies involving hundreds of markers scattered around the genome but is less appropriate now that entire genomes can be sequenced. We describe a “chromosome painting” approach to characterising shared ancestry that takes into account the fact that DNA is transmitted from generation to generation as a linear molecule in chromosomes. We show that the approach increases resolution relative to previous techniques, allowing differences in ancestry profiles among individuals to be resolved at the finest scales yet. We provide mathematical, statistical, and graphical machinery to exploit this new information and to characterize relationships at continental, regional, local, and family scales.
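The summary's point that popular methods "analyse successive markers independently, simply adding the information they provide" can be illustrated with standard genotype PCA. The sketch below assumes a 0/1/2 allele-count matrix and a common per-marker standardization by sample allele frequency; the function names are hypothetical, and this is not code from ChromoPainter or fineSTRUCTURE.

```python
import numpy as np


def standardized_genotypes(g):
    """g: (n_individuals, n_markers) matrix of 0/1/2 allele counts."""
    p = g.mean(axis=0) / 2.0        # sample allele frequency per marker
    keep = (p > 0) & (p < 1)        # drop monomorphic markers
    g, p = g[:, keep], p[keep]
    return (g - 2 * p) / np.sqrt(2 * p * (1 - p))


def similarity_by_marker_sum(z):
    """Individual-by-individual similarity built one marker at a time.

    Each marker adds its own outer-product term; nothing here depends on
    which markers are neighbours on a chromosome.
    """
    n, m = z.shape
    s = np.zeros((n, n))
    for k in range(m):
        s += np.outer(z[:, k], z[:, k])
    return s / m


def top_pcs(s, k=2):
    """Leading eigenvalues/eigenvectors of a symmetric similarity matrix."""
    vals, vecs = np.linalg.eigh(s)  # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.integers(0, 3, size=(6, 50))
    vals, vecs = top_pcs(similarity_by_marker_sum(standardized_genotypes(g)))
    print(vals)
```

The explicit loop is equivalent to `z @ z.T / m`; it is written out only to show that each marker contributes a separate additive term, in contrast to a painting approach in which runs of adjacent markers are grouped into chunks.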

Suggested Citation

  • Daniel John Lawson & Garrett Hellenthal & Simon Myers & Daniel Falush, 2012. "Inference of Population Structure using Dense Haplotype Data," PLOS Genetics, Public Library of Science, vol. 8(1), pages 1-16, January.
  • Handle: RePEc:plo:pgen00:1002453
    DOI: 10.1371/journal.pgen.1002453

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002453
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1002453&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1002453?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7219), pages 274-274, November.
    2. David Reich & Kumarasamy Thangaraj & Nick Patterson & Alkes L. Price & Lalji Singh, 2009. "Reconstructing Indian population history," Nature, Nature, vol. 461(7263), pages 489-494, September.
    3. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7218), pages 98-101, November.
    4. Gil McVean, 2009. "A Genealogical Interpretation of Principal Components Analysis," PLOS Genetics, Public Library of Science, vol. 5(10), pages 1-10, October.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    2. Gideon S Bradburd & Peter L Ralph & Graham M Coop, 2016. "A Spatial Framework for Understanding Population Structure and Admixture," PLOS Genetics, Public Library of Science, vol. 12(1), pages 1-38, January.
    3. Mateus H. Gouveia & Amy R. Bentley & Thiago P. Leal & Eduardo Tarazona-Santos & Carlos D. Bustamante & Adebowale A. Adeyemo & Charles N. Rotimi & Daniel Shriner, 2023. "Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    4. Mohammad Hossein Olyaee & Alireza Khanteymoori & Khosrow Khalifeh, 2020. "A chaotic viewpoint-based approach to solve haplotype assembly using hypergraph model," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-19, October.
    5. Isabel Alves & Joanna Giemza & Michael G. B. Blum & Carolina Bernhardsson & Stéphanie Chatel & Matilde Karakachoff & Aude Pierre & Anthony F. Herzig & Robert Olaso & Martial Monteil & Véronique Gallie, 2024. "Human genetic structure in Northwest France provides new insights into West European historical demography," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    6. Lokman Galal & Frédéric Ariey & Meriadeg Ar Gouilh & Marie-Laure Dardé & Azra Hamidović & Franck Letourneur & Franck Prugnolle & Aurélien Mercier, 2022. "A unique Toxoplasma gondii haplotype accompanied the global expansion of cats," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    7. Yedael Y Waldman & Arjun Biddanda & Natalie R Davidson & Paul Billing-Ross & Maya Dubrovsky & Christopher L Campbell & Carole Oddoux & Eitan Friedman & Gil Atzmon & Eran Halperin & Harry Ostrer & Alon, 2016. "The Genetics of Bene Israel from India Reveals Both Substantial Jewish and Indian Ancestry," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-28, March.
    8. Jerome Kelleher & Alison M Etheridge & Gilean McVean, 2016. "Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-22, May.
    9. Buzbas, Erkan Ozge & Verdu, Paul, 2018. "Inference on admixture fractions in a mechanistic model of recurrent admixture," Theoretical Population Biology, Elsevier, vol. 122(C), pages 149-157.
    10. Matthieu Bouaziz & Caroline Paccard & Mickael Guedj & Christophe Ambroise, 2012. "SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-17, October.
    11. Steinrücken, Matthias & Paul, Joshua S. & Song, Yun S., 2013. "A sequentially Markov conditional sampling distribution for structured populations with migration and recombination," Theoretical Population Biology, Elsevier, vol. 87(C), pages 51-61.
    12. Andrea Fulgione & Célia Neto & Ahmed F. Elfarargi & Emmanuel Tergemina & Shifa Ansari & Mehmet Göktay & Herculano Dinis & Nina Döring & Pádraic J. Flood & Sofia Rodriguez-Pacheco & Nora Walden & Marcu, 2022. "Parallel reduction in flowering time from de novo mutations enable evolutionary rescue in colonizing lineages," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    13. Elisa Bellucci & Andrea Benazzo & Chunming Xu & Elena Bitocchi & Monica Rodriguez & Saleh Alseekh & Valerio Di Vittori & Tania Gioia & Kerstin Neumann & Gaia Cortinovis & Giulia Frascarelli & Ester Mu, 2023. "Selection and adaptive introgression guided the complex evolutionary history of the European common bean," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    14. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
    15. David Peris & Emily J. Ubbelohde & Meihua Christina Kuang & Jacek Kominek & Quinn K. Langdon & Marie Adams & Justin A. Koshalek & Amanda Beth Hulfachor & Dana A. Opulente & David J. Hall & Katie Hyma, 2023. "Macroevolutionary diversity of traits and genomes in the model yeast genus Saccharomyces," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    16. Elena Arciero & Sufyan A. Dogra & Daniel S. Malawsky & Massimo Mezzavilla & Theofanis Tsismentzoglou & Qin Qin Huang & Karen A. Hunt & Dan Mason & Saghira Malik Sharif & David A. van Heel & Eamonn Sherida, 2021. "Fine-scale population structure and demographic history of British Pakistanis," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    17. James A Watson & Aimee R Taylor & Elizabeth A Ashley & Arjen Dondorp & Caroline O Buckee & Nicholas J White & Chris C Holmes, 2020. "A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices," PLOS Genetics, Public Library of Science, vol. 16(10), pages 1-23, October.
    18. Markus Neuditschko & Mehar S Khatkar & Herman W Raadsma, 2012. "NetView: A High-Definition Network-Visualization Approach to Detect Fine-Scale Population Structures from Genome-Wide Patterns of Variation," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-13, October.
    19. Peña-Malavera Andrea & Bruno Cecilia & Fernandez Elmer & Balzarini Monica, 2014. "Comparison of algorithms to infer genetic population structure from unlinked molecular markers," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(4), pages 391-402, August.
    20. Sini Kerminen & Nicola Cerioli & Darius Pacauskas & Aki S Havulinna & Markus Perola & Pekka Jousilahti & Veikko Salomaa & Mark J Daly & Rupesh Vyas & Samuli Ripatti & Matti Pirinen, 2021. "Changes in the fine-scale genetic structure of Finland through the 20th century," PLOS Genetics, Public Library of Science, vol. 17(3), pages 1-26, March.
    21. Kaisa Thorell & Zilia Y. Muñoz-Ramírez & Difei Wang & Santiago Sandoval-Motta & Rajiv Boscolo Agostini & Silvia Ghirotto & Roberto C. Torres & Daniel Falush & M. Constanza Camargo & Charles S. Rabkin, 2023. "The Helicobacter pylori Genome Project: insights into H. pylori population structure from analysis of a worldwide collection of complete genomes," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    22. Alex Diaz-Papkovich & Luke Anderson-Trocmé & Chief Ben-Eghan & Simon Gravel, 2019. "UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts," PLOS Genetics, Public Library of Science, vol. 15(11), pages 1-24, November.
    23. Buschbom, Jutta, 2018. "Exploring and validating statistical reliability in forensic conservation genetics," Thünen Reports 63, Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries.
    24. Melisa Olave & Alexander Nater & Andreas F. Kautt & Axel Meyer, 2022. "Early stages of sympatric homoploid hybrid speciation in crater lake cichlid fishes," Nature Communications, Nature, vol. 13(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    2. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    3. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
    4. Wang Chaolong & Szpiech Zachary A & Degnan James H & Jakobsson Mattias & Pemberton Trevor J & Hardy John A & Singleton Andrew B & Rosenberg Noah A, 2010. "Comparing Spatial Maps of Human Population-Genetic Variation Using Procrustes Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-22, January.
    5. Duforet-Frebourg, Nicolas & Slatkin, Montgomery, 2016. "Isolation-by-distance-and-time in a stepping-stone model," Theoretical Population Biology, Elsevier, vol. 108(C), pages 24-35.
    6. Alexander Dilthey & Stephen Leslie & Loukas Moutsianas & Judong Shen & Charles Cox & Matthew R Nelson & Gil McVean, 2013. "Multi-Population Classical HLA Type Imputation," PLOS Computational Biology, Public Library of Science, vol. 9(2), pages 1-13, February.
    7. Hugh G Gauch Jr. & Sheng Qian & Hans-Peter Piepho & Linda Zhou & Rui Chen, 2019. "Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-26, June.
    8. Zheng, Xiuwen & Weir, Bruce S., 2016. "Eigenanalysis of SNP data with an identity by descent interpretation," Theoretical Population Biology, Elsevier, vol. 107(C), pages 65-76.
    9. Jason Sawler & Bruce Reisch & Mallikarjuna K Aradhya & Bernard Prins & Gan-Yuan Zhong & Heidi Schwaninger & Charles Simon & Edward Buckler & Sean Myles, 2013. "Genomics Assisted Ancestry Deconvolution in Grape," PLOS ONE, Public Library of Science, vol. 8(11), pages 1-1, November.
    10. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia de Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo de los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    11. Beatrix Eugster & Rafael Lalive & Andreas Steinhauer & Josef Zweimüller, 2011. "The Demand for Social Insurance: Does Culture Matter?," Economic Journal, Royal Economic Society, vol. 121(556), pages 413-448, November.
    12. Filippini, Massimo & Wekhof, Tobias, 2021. "The effect of culture on energy efficient vehicle ownership," Journal of Environmental Economics and Management, Elsevier, vol. 105(C).
    13. Andrey V Khrunin & Denis V Khokhrin & Irina N Filippova & Tõnu Esko & Mari Nelis & Natalia A Bebyakova & Natalia L Bolotova & Janis Klovins & Liene Nikitina-Zake & Karola Rehnström & Samuli Ripatti & , 2013. "A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-9, March.
    14. Diana Dunca & Sandesh Chopade & María Gordillo-Marañón & Aroon D. Hingorani & Karoline Kuchenbaecker & Chris Finan & Amand F. Schmidt, 2024. "Comparing the effects of CETP in East Asian and European ancestries: a Mendelian randomization study," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    15. Wenhan Chen & Yang Wu & Zhili Zheng & Ting Qi & Peter M. Visscher & Zhihong Zhu & Jian Yang, 2021. "Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    16. Pierre Luisi & Angelina García & Juan Manuel Berros & Josefina M B Motti & Darío A Demarchi & Emma Alfaro & Eliana Aquilano & Carina Argüelles & Sergio Avena & Graciela Bailliet & Julieta Beltramo & C, 2020. "Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-30, July.
    17. Brielin C Brown & Nicolas L Bray & Lior Pachter, 2018. "Expression reflects population structure," PLOS Genetics, Public Library of Science, vol. 14(12), pages 1-15, December.
    18. Gad Abraham & Michael Inouye, 2014. "Fast Principal Component Analysis of Large-Scale Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-5, April.
    19. Beatrix Brügger & Rafael Lalive & Josef Zweimüller, 2009. "Does Culture Affect Unemployment? Evidence from the Röstigraben," NRN working papers 2009-10, The Austrian Center for Labor Economics and the Analysis of the Welfare State, Johannes Kepler University Linz, Austria.
    20. Diana Chang & Alon Keinan, 2014. "Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-14, September.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.