
A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations

Authors
  • Chaolong Wang
  • Sebastian Zöllner
  • Noah A Rosenberg

Abstract

Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.

Author Summary

The spatial pattern of human genetic variation provides a basis for investigating the history of human migrations. Statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been used to summarize spatial patterns of genetic variation, typically by placing individuals on a two-dimensional map in such a way that pairwise Euclidean distances between individuals on the map approximately reflect corresponding genetic relationships. Although similarity between these statistical maps of genetic variation and the geographic maps of sampling locations is often observed, it has not been assessed systematically across different parts of the world. In this study, we combine genome-wide SNP data from more than 100 populations worldwide to perform a formal comparison between genes and geography in different regions. By examining a worldwide sample and samples from Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, we find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels. Surprisingly, the highest similarity is found in Asia, even though the geographic barrier of the Himalaya Mountains has created a discontinuity on the PCA map of genetic variation.
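The Procrustes comparison summarized above can be illustrated with a short sketch. This is not the authors' implementation. It assumes the PCA coordinates (e.g. PC1, PC2 per individual) have already been computed, treats each individual's geographic location (e.g. longitude, latitude) as a planar point, allows translation, uniform scaling, rotation and reflection, and reports a similarity statistic of the form t0 = sqrt(1 - D), where D is the minimized Procrustes distance after both maps are centered and scaled to unit sum of squares. The function names `procrustes_t0` and `permutation_p_value`, the closed-form 2x2 shortcut, the (k + 1)/(N + 1) permutation p-value, and the toy data are all hypothetical choices made for this illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _center(points: Sequence[Point]) -> List[Point]:
    """Translate a 2D configuration so that its centroid is at the origin."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(x - mean_x, y - mean_y) for x, y in points]


def procrustes_t0(genetic: Sequence[Point], geographic: Sequence[Point]) -> float:
    """Similarity t0 = sqrt(1 - D) between two matched 2D maps (hypothetical helper).

    Row i of `genetic` (e.g. PC1, PC2) must describe the same individual as
    row i of `geographic` (e.g. longitude, latitude). Translation, uniform
    scaling, rotation and reflection are allowed, so t0 equals 1 (up to
    rounding) when one map is a similarity transform of the other.
    """
    if len(genetic) != len(geographic) or len(genetic) < 3:
        raise ValueError("need two matched maps with at least 3 points")
    X = _center(genetic)
    Y = _center(geographic)
    sxx = sum(x * x + y * y for x, y in X)
    syy = sum(x * x + y * y for x, y in Y)
    if sxx == 0 or syy == 0:
        raise ValueError("a map has zero spread")
    # Entries of the 2x2 cross-product matrix M = sum_i y_i x_i^T.
    a = sum(gy[0] * gx[0] for gx, gy in zip(X, Y))
    b = sum(gy[0] * gx[1] for gx, gy in zip(X, Y))
    c = sum(gy[1] * gx[0] for gx, gy in zip(X, Y))
    d = sum(gy[1] * gx[1] for gx, gy in zip(X, Y))
    # The best orthogonal fit attains the sum of M's singular values; for a
    # 2x2 matrix that sum equals sqrt(||M||_F^2 + 2 |det M|).
    nuclear = math.sqrt(a * a + b * b + c * c + d * d + 2 * abs(a * d - b * c))
    # D = 1 - nuclear^2 / (sxx * syy), hence t0 = sqrt(1 - D); clamp rounding.
    return min(1.0, nuclear / math.sqrt(sxx * syy))


def permutation_p_value(genetic: Sequence[Point], geographic: Sequence[Point],
                        n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed t0 and a permutation p-value, (k + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = procrustes_t0(genetic, geographic)
    shuffled = list(geographic)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genes-to-geography pairing
        if procrustes_t0(genetic, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: five sampling locations and a "PCA map" that is
    # the same layout rotated by 30 degrees, mirrored, shrunk and shifted.
    geo = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
    t = math.radians(30)
    pca = [(0.1 * (x * math.cos(t) - y * math.sin(t)) + 1.0,
            -0.1 * (x * math.sin(t) + y * math.cos(t)) - 2.0) for x, y in geo]
    t0, p = permutation_p_value(pca, geo, n_perm=999, seed=1)
    print(f"t0 = {t0:.4f}, permutation p = {p:.3f}")
```

Allowing reflection matters in this setting because the signs of PCA axes are arbitrary; a rotation-only variant of this sketch would use det M instead of |det M| in the same formula. The permutation step shuffles which geographic location is paired with which genetic point, giving a null distribution for t0 when genes and geography are unrelated.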

Suggested Citation

  • Chaolong Wang & Sebastian Zöllner & Noah A Rosenberg, 2012. "A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations," PLOS Genetics, Public Library of Science, vol. 8(8), pages 1-16, August.
  • Handle: RePEc:plo:pgen00:1002886
    DOI: 10.1371/journal.pgen.1002886

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002886
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1002886&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7219), pages 274-274, November.
    2. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7218), pages 98-101, November.
    3. Mattias Jakobsson & Sonja W. Scholz & Paul Scheet & J. Raphael Gibbs & Jenna M. VanLiere & Hon-Chung Fung & Zachary A. Szpiech & James H. Degnan & Kai Wang & Rita Guerreiro & Jose M. Bras & Jennifer C, 2008. "Genotype, haplotype and copy-number variation in worldwide human populations," Nature, Nature, vol. 451(7181), pages 998-1003, February.

    Citations



    Cited by:

    1. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
    2. Johannes Wirtz & Stéphane Guindon, 2024. "On the connections between the spatial Lambda–Fleming–Viot model and other processes for analysing geo-referenced genetic data," Theoretical Population Biology, Elsevier, vol. 158(C), pages 139-149.
    3. Nur Hani Syazwani Bakri & Nur Aisyah Nabilah Mat Razi & Mohd Firdaus Ahmad & Nur Syazwani Zulaikha Safwan & Nur Dalilah Dahlan & Ummi Kalthum Mokhtar, 2024. "Academic Performance (CGPA) Influences Mental Health: A Study of Students at Seremban Medical Assistant College (SMCA)," Information Management and Business Review, AMH International, vol. 16(2), pages 46-52.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ricardo Kanitz & Elsa G Guillot & Sylvain Antoniazza & Samuel Neuenschwander & Jérôme Goudet, 2018. "Complex genetic patterns in human arise from a simple range-expansion model over continental landmasses," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-16, February.
    2. Chaolong Wang & Zachary A Szpiech & James H Degnan & Mattias Jakobsson & Trevor J Pemberton & John A Hardy & Andrew B Singleton & Noah A Rosenberg, 2010. "Comparing Spatial Maps of Human Population-Genetic Variation Using Procrustes Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-22, January.
    3. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia de Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo de los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    4. Beatrix Eugster & Rafael Lalive & Andreas Steinhauer & Josef Zweimüller, 2011. "The Demand for Social Insurance: Does Culture Matter?," Economic Journal, Royal Economic Society, vol. 121(556), pages 413-448, November.
    5. Gad Abraham & Michael Inouye, 2014. "Fast Principal Component Analysis of Large-Scale Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-5, April.
    6. Beatrix Brügger & Rafael Lalive & Josef Zweimüller, 2009. "Does Culture Affect Unemployment? Evidence from the Röstigraben," NRN working papers 2009-10, The Austrian Center for Labor Economics and the Analysis of the Welfare State, Johannes Kepler University Linz, Austria.
    7. Diana Chang & Alon Keinan, 2014. "Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-14, September.
    8. Alejandro Ochoa & John D Storey, 2021. "Estimating FST and kinship for arbitrary population structures," PLOS Genetics, Public Library of Science, vol. 17(1), pages 1-36, January.
    9. Michael J. Feldman, 2023. "Spiked singular values and vectors under extreme aspect ratios," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    10. Mateus H. Gouveia & Amy R. Bentley & Thiago P. Leal & Eduardo Tarazona-Santos & Carlos D. Bustamante & Adebowale A. Adeyemo & Charles N. Rotimi & Daniel Shriner, 2023. "Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    11. Nicola Barban & Elisabetta De Cao & Sonia Oreffice & Climent Quintana-Domeque, 2016. "Assortative Mating on Education: A Genetic Assessment," Working Papers 2016-034, Human Capital and Economic Opportunity Working Group.
    12. Katarzyna Bryc & Wlodek Bryc & Jack W. Silverstein, 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    13. Guang Guo & Yilan Fu & Hedwig Lee & Tianji Cai & Kathleen Mullan Harris & Yi Li, 2014. "Genetic Bio-Ancestry and Social Construction of Racial Classification in Social Surveys in the Contemporary United States," Demography, Springer;Population Association of America (PAA), vol. 51(1), pages 141-172, February.
    14. Raphaël Forien & Harald Ringbauer & Graham Coop, 2024. "Demographic inference for spatially heterogeneous populations using long shared haplotypes," Theoretical Population Biology, Elsevier, vol. 159(C), pages 108-124.
    15. Radoslaw Panczak & André Moser & Leonhard Held & Philip A. Jones & Frank J. Rühli & Kaspar Staub, 2017. "A tall order: Small area mapping and modelling of adult height among Swiss male conscripts," Economics & Human Biology, Elsevier, vol. 26(C), pages 61-69.
    16. The International Multiple Sclerosis Genetics Consortium, 2011. "The Genetic Association of Variants in CD6, TNFRSF1A and IRF8 to Multiple Sclerosis: A Multicenter Case-Control Study," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-6, April.
    17. Xiaodong Liu & Ke Zhang & Neslihan A. Kaya & Zhe Jia & Dafei Wu & Tingting Chen & Zhiyuan Liu & Sinan Zhu & Axel M. Hillmer & Torsten Wuestefeld & Jin Liu & Yun Shen Chan & Zheng Hu & Liang Ma & Li Ji, 2024. "Tumor phylogeography reveals block-shaped spatial heterogeneity and the mode of evolution in Hepatocellular Carcinoma," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Marie-Claude Babron & Marie de Tayrac & Douglas N Rutledge & Eleftheria Zeggini & Emmanuelle Génin, 2012. "Rare and Low Frequency Variant Stratification in the UK Population: Description and Impact on Association Tests," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    19. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    20. Keith Humphreys & Alexander Grankvist & Monica Leu & Per Hall & Jianjun Liu & Samuli Ripatti & Karola Rehnström & Leif Groop & Lars Klareskog & Bo Ding & Henrik Grönberg & Jianfeng Xu & Nancy L Peders, 2011. "The Genetic Structure of the Swedish Population," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-11, August.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.