
SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies

Author

Listed:
  • Matthieu Bouaziz
  • Caroline Paccard
  • Mickael Guedj
  • Christophe Ambroise

Abstract

Inferring the structure of populations has many applications in genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. Parametric algorithms such as Structure are very popular, but their underlying complexity and high computational cost led to the development of faster parametric alternatives such as Admixture. Non-parametric approaches offer an alternative to these methods. Within this category, AWclust has proven efficient but fails to properly identify population structure in complex datasets. In this article we present a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy that allows a progressive investigation of population structure. The method takes genetic data as input, clusters individuals into homogeneous sub-populations, and uses the gap statistic to estimate the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, namely data from the HapMap and the Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also evaluated in a comparison study. SHIPS and the parametric approach Structure were the most accurate on the simulated datasets, both in terms of individual assignments and in estimating the correct number of clusters. On the real datasets, the clusterings produced by SHIPS were the most consistent with the population labels or with those produced by the Admixture program. The performance of SHIPS on SNP data, along with its relatively low computational cost and its ease of use, makes this method a promising solution for inferring fine-scale genetic patterns.
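
The divisive strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the allele-sharing similarity, the power-iteration eigenvector solver, the rule of always splitting the most dispersed cluster, and all function names are assumptions made for illustration. The number of clusters k is passed in by hand here, whereas SHIPS estimates it with the gap statistic, which is not shown.

```python
import math

def similarity(genotypes):
    """Pairwise allele-sharing similarity in [0, 1] (an assumed, simple choice)."""
    m = len(genotypes[0])
    return [[1.0 - sum(abs(a - b) for a, b in zip(gi, gj)) / (2.0 * m)
             for gj in genotypes] for gi in genotypes]

def dispersion(sim, members):
    """Mean pairwise dissimilarity inside one cluster (0.0 if all members are identical)."""
    n = len(members)
    return sum(1.0 - sim[i][j] for i in members for j in members) / (n * n)

def bipartition(sim, members, iters=500):
    """Split a cluster by the sign of the second eigenvector of D^-1/2 S D^-1/2."""
    n = len(members)
    s = [[sim[i][j] for j in members] for i in members]
    deg = [sum(row) for row in s]
    # Normalized affinity matrix, shifted by the identity so every eigenvalue is >= 0.
    shifted = [[s[i][j] / math.sqrt(deg[i] * deg[j]) + (1.0 if i == j else 0.0)
                for j in range(n)] for i in range(n)]
    total = math.sqrt(sum(deg))
    top = [math.sqrt(d) / total for d in deg]  # leading eigenvector, known in closed form
    v = [(-1.0) ** i + 0.01 * i for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]  # remove the leading eigenvector
        v = [sum(row[j] * v[j] for j in range(n)) for row in shifted]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    left = [members[i] for i in range(n) if v[i] < 0]
    right = [members[i] for i in range(n) if v[i] >= 0]
    return left, right

def divisive_clustering(genotypes, k):
    """Start from one cluster and keep splitting the most dispersed one until k clusters."""
    sim = similarity(genotypes)
    clusters = [list(range(len(genotypes)))]
    while len(clusters) < k:
        worst = max(clusters, key=lambda c: dispersion(sim, c))
        if dispersion(sim, worst) == 0.0:
            break  # remaining clusters are homogeneous, nothing left to separate
        left, right = bipartition(sim, worst)
        if not left or not right:
            break  # degenerate split, stop rather than loop forever
        clusters.remove(worst)
        clusters += [left, right]
    return clusters

# Toy data: 6 individuals x 6 SNPs, coded as minor-allele counts (0, 1 or 2).
genotypes = [
    [0, 0, 0, 0, 2, 2],
    [0, 0, 1, 0, 2, 2],
    [0, 1, 0, 0, 2, 1],
    [2, 2, 2, 2, 0, 0],
    [2, 2, 1, 2, 0, 0],
    [2, 1, 2, 2, 0, 1],
]
clusters = divisive_clustering(genotypes, k=2)
print(sorted(sorted(c) for c in clusters))  # -> [[0, 1, 2], [3, 4, 5]]
```

Each split only looks at the individuals of the cluster being divided, which is what makes a progressive, top-down investigation of the structure possible.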

Suggested Citation

  • Matthieu Bouaziz & Caroline Paccard & Mickael Guedj & Christophe Ambroise, 2012. "SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-17, October.
  • Handle: RePEc:plo:pone00:0045685
    DOI: 10.1371/journal.pone.0045685

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0045685
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0045685&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Nick Patterson & Alkes L Price & David Reich, 2006. "Population Structure and Eigenanalysis," PLOS Genetics, Public Library of Science, vol. 2(12), pages 1-20, December.
    2. Daniel John Lawson & Garrett Hellenthal & Simon Myers & Daniel Falush, 2012. "Inference of Population Structure using Dense Haplotype Data," PLOS Genetics, Public Library of Science, vol. 8(1), pages 1-16, January.

    Citations

    Cited by:

    1. Waters, Edward K. & Sidhu, Harvinder S. & Sidhu, Leesa A. & Mercer, Geoffry N., 2015. "Extended Lotka–Volterra equations incorporating population heterogeneity: Derivation and analysis of the predator–prey case," Ecological Modelling, Elsevier, vol. 297(C), pages 187-195.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    2. Peña-Malavera Andrea & Bruno Cecilia & Fernandez Elmer & Balzarini Monica, 2014. "Comparison of algorithms to infer genetic population structure from unlinked molecular markers," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(4), pages 391-402, August.
    3. Markus Neuditschko & Mehar S Khatkar & Herman W Raadsma, 2012. "NetView: A High-Definition Network-Visualization Approach to Detect Fine-Scale Population Structures from Genome-Wide Patterns of Variation," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-13, October.
    4. Yedael Y Waldman & Arjun Biddanda & Natalie R Davidson & Paul Billing-Ross & Maya Dubrovsky & Christopher L Campbell & Carole Oddoux & Eitan Friedman & Gil Atzmon & Eran Halperin & Harry Ostrer & Alon, 2016. "The Genetics of Bene Israel from India Reveals Both Substantial Jewish and Indian Ancestry," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-28, March.
    5. Buschbom, Jutta, 2018. "Exploring and validating statistical reliability in forensic conservation genetics," Thünen Reports 63, Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries.
    6. Isabel Alves & Joanna Giemza & Michael G. B. Blum & Carolina Bernhardsson & Stéphanie Chatel & Matilde Karakachoff & Aude Pierre & Anthony F. Herzig & Robert Olaso & Martial Monteil & Véronique Gallie, 2024. "Human genetic structure in Northwest France provides new insights into West European historical demography," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    7. Elena Arciero & Sufyan A. Dogra & Daniel S. Malawsky & Massimo Mezzavilla & Theofanis Tsismentzoglou & Qin Qin Huang & Karen A. Hunt & Dan Mason & Saghira Malik Sharif & David A. Heel & Eamonn Sherida, 2021. "Fine-scale population structure and demographic history of British Pakistanis," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    8. Gideon S Bradburd & Peter L Ralph & Graham M Coop, 2016. "A Spatial Framework for Understanding Population Structure and Admixture," PLOS Genetics, Public Library of Science, vol. 12(1), pages 1-38, January.
    9. Daniel Svensson & Matilda Rentoft & Anna M Dahlin & Emma Lundholm & Pall I Olason & Andreas Sjödin & Carin Nylander & Beatrice S Melin & Johan Trygg & Erik Johansson, 2020. "A whole-genome sequenced control population in northern Sweden reveals subregional genetic differences," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-18, September.
    10. Estavoyer, Maxime & François, Olivier, 2022. "Theoretical analysis of principal components in an umbrella model of intraspecific evolution," Theoretical Population Biology, Elsevier, vol. 148(C), pages 11-21.
    11. Felsenstein, Joseph, 2015. "Covariation of gene frequencies in a stepping-stone lattice of populations," Theoretical Population Biology, Elsevier, vol. 100(C), pages 88-97.
    12. Yaron Granot & Omri Tal & Saharon Rosset & Karl Skorecki, 2016. "On the Apportionment of Population Structure," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-24, August.
    13. Özkan İş & Xue Wang & Joseph S. Reddy & Yuhao Min & Elanur Yilmaz & Prabesh Bhattarai & Tulsi Patel & Jeremiah Bergman & Zachary Quicksall & Michael G. Heckman & Frederick Q. Tutor-New & Birsen Can De, 2024. "Gliovascular transcriptional perturbations in Alzheimer’s disease reveal molecular mechanisms of blood brain barrier dysfunction," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    14. Hyosik Jang & Ian M Ehrenreich, 2012. "Genome-Wide Characterization of Genetic Variation in the Unicellular, Green Alga Chlamydomonas reinhardtii," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-9, July.
    15. Mathieu Gautier & Denis Laloë & Katayoun Moazami-Goudarzi, 2010. "Insights into the Genetic History of French Cattle from Dense SNP Data on 47 Worldwide Breeds," PLOS ONE, Public Library of Science, vol. 5(9), pages 1-11, September.
    16. Xiaofeng Cai & Xuepeng Sun & Chenxi Xu & Honghe Sun & Xiaoli Wang & Chenhui Ge & Zhonghua Zhang & Quanxi Wang & Zhangjun Fei & Chen Jiao & Quanhua Wang, 2021. "Genomic analyses provide insights into spinach domestication and the genetic basis of agronomic traits," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    17. Lee, Anthony J. & Hibbs, Courtney & Wright, Margaret J. & Martin, Nicholas G. & Keller, Matthew C. & Zietsch, Brendan P., 2017. "Assessing the accuracy of perceptions of intelligence based on heritable facial features," Intelligence, Elsevier, vol. 64(C), pages 1-8.
    18. Thompson Katherine L. & Linnen Catherine R. & Kubatko Laura, 2016. "Tree-based quantitative trait mapping in the presence of external covariates," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(6), pages 473-490, December.
    19. Steinrücken, Matthias & Paul, Joshua S. & Song, Yun S., 2013. "A sequentially Markov conditional sampling distribution for structured populations with migration and recombination," Theoretical Population Biology, Elsevier, vol. 87(C), pages 51-61.
    20. Jacobo Pardo-Seco & Alberto Gómez-Carballa & Jorge Amigo & Federico Martinón-Torres & Antonio Salas, 2014. "A Genome-Wide Study of Modern-Day Tuscans: Revisiting Herodotus's Theory on the Origin of the Etruscans," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-11, September.
