
Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure

Author

Listed:
  • Noah A Rosenberg
  • Saurabh Mahajan
  • Sohini Ramachandran
  • Chengfeng Zhao
  • Jonathan K Pritchard
  • Marcus W Feldman

Abstract

Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the degree of clustering is diminished by use of samples with greater uniformity in geographic distribution, and that the clusters we identified were a consequence of uneven sampling along genetic clines. Expanding our earlier dataset from 377 to 993 markers, we systematically examine the influence of several study design variables—sample size, number of loci, number of clusters, assumptions about correlations in allele frequencies across populations, and the geographic dispersion of the sample—on the “clusteredness” of individuals. With all other variables held constant, geographic dispersion is seen to have comparatively little effect on the degree of clustering. Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.

Synopsis: By helping to frame the ways in which human genetic variation is conceptualized, an understanding of the genetic structure of human populations can assist in inferring human evolutionary history, as well as in designing studies that search for disease-susceptibility loci.
Previously, it has been observed that when individual genomes are clustered solely by genetic similarity, individuals sort into broad clusters that correspond to large geographic regions. It has also been seen that allele frequencies tend to vary continuously across geographic space. These two perspectives seem to be contradictory, but in this article the authors show that they are indeed compatible.

Suggested Citation

  • Noah A Rosenberg & Saurabh Mahajan & Sohini Ramachandran & Chengfeng Zhao & Jonathan K Pritchard & Marcus W Feldman, 2005. "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure," PLOS Genetics, Public Library of Science, vol. 1(6), pages 1-12, December.
  • Handle: RePEc:plo:pgen00:0010070
    DOI: 10.1371/journal.pgen.0010070

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0010070
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.0010070&type=printable
    Download Restriction: no


    Citations



    Cited by:

    1. Peristera Paschou & Petros Drineas & Jamey Lewis & Caroline M Nievergelt & Deborah A Nickerson & Joshua D Smith & Paul M Ridker & Daniel I Chasman & Ronald M Krauss & Elad Ziv, 2008. "Tracing Sub-Structure in the European American Population with PCA-Informative Markers," PLOS Genetics, Public Library of Science, vol. 4(7), pages 1-13, July.
    2. Vaughan, Laura K. & Divers, Jasmin & Padilla, Miguel A. & Redden, David T. & Tiwari, Hemant K. & Pomp, Daniel & Allison, David B., 2009. "The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1755-1766, March.
    3. Charlotte Faurie & Clement Mettling & Mohamed Ali Bchir & Danang Hadmoko & Carine Heitz & Evi Lestari & Michel Raymond & Marc Willinger, 2016. "Evidence of genotypic adaptation to the exposure to volcanic risk at the dopamine receptor DRD4 locus," Post-Print hal-02062364, HAL.
    4. Liu Xiran & Ahsan Zarif & Martheswaran Tarun K. & Rosenberg Noah A., 2023. "When is the allele-sharing dissimilarity between two populations exceeded by the allele-sharing dissimilarity of a population with itself?," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 22(1), pages 1-24, January.
    5. Spilimbergo, Antonio & Giuliano, Paola & Tonon, Giovanni, 2006. "Genetic, Cultural and Geographical Distances," CEPR Discussion Papers 5807, C.E.P.R. Discussion Papers.
    6. Nick Patterson & Alkes L Price & David Reich, 2006. "Population Structure and Eigenanalysis," PLOS Genetics, Public Library of Science, vol. 2(12), pages 1-20, December.
    7. Szpiech, Zachary A. & Rosenberg, Noah A., 2011. "On the size distribution of private microsatellite alleles," Theoretical Population Biology, Elsevier, vol. 80(2), pages 100-113.
    8. Ricardo Kanitz & Elsa G Guillot & Sylvain Antoniazza & Samuel Neuenschwander & Jérôme Goudet, 2018. "Complex genetic patterns in human arise from a simple range-expansion model over continental landmasses," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-16, February.
    9. Frank, Reanne, 2007. "What to make of it? The (Re)emergence of a biological conceptualization of race in health disparities research," Social Science & Medicine, Elsevier, vol. 64(10), pages 1977-1983, May.
    10. Eric R Londin & Margaret A Keller & Cathleen Maista & Gretchen Smith & Laura A Mamounas & Ran Zhang & Steven J Madore & Katrina Gwinn & Roderick A Corriveau, 2010. "CoAIMs: A Cost-Effective Panel of Ancestry Informative Markers for Determining Continental Origins," PLOS ONE, Public Library of Science, vol. 5(10), pages 1-12, October.
    11. Arbisser, Ilana M. & Rosenberg, Noah A., 2020. "FST and the triangle inequality for biallelic markers," Theoretical Population Biology, Elsevier, vol. 133(C), pages 117-129.
    12. Catherine Bliss, 2015. "Science and Struggle," The ANNALS of the American Academy of Political and Social Science, vol. 661(1), pages 86-108, September.
    13. Ramachandran, Sohini & Rosenberg, Noah A. & Feldman, Marcus W. & Wakeley, John, 2008. "Population differentiation and migration: Coalescence times in a two-sex island model for autosomal and X-linked loci," Theoretical Population Biology, Elsevier, vol. 74(4), pages 291-301.
    14. Ting Fung Ma & Fangfang Wang & Jun Zhu, 2023. "On generalized latent factor modeling and inference for high‐dimensional binomial data," Biometrics, The International Biometric Society, vol. 79(3), pages 2311-2320, September.
