IDEAS home Printed from https://ideas.repec.org/a/bpj/sagmbi/v13y2014i4p12n1.html

Comparison of algorithms to infer genetic population structure from unlinked molecular markers

Author

Listed:
  • Peña-Malavera Andrea

    (Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina)

  • Bruno Cecilia

    (Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina)

  • Fernandez Elmer

    (Facultad de Ingeniería, Universidad Católica de Córdoba and CONICET, Camino Alta Gracia Km 10, Cordoba, Argentina)

  • Balzarini Monica

    (Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina)

Abstract

Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS from genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus biallelic genotypes. Datasets were delineated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). The relative performance of hierarchical and non-hierarchical clustering, as well as model-based clustering (STRUCTURE) and clustering from neural networks (SOM-RP-Q), was evaluated. We use the clustering error rate of genotypes into discrete sub-populations as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), the algorithms SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than the other evaluated methods for sparse unlinked genetic data.
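
The abstract names the clustering error rate as the comparison criterion but does not spell out how it is computed. The sketch below shows one common way to do it, under the assumption that the error rate is the fraction of genotypes assigned to the wrong sub-population after the best one-to-one relabeling of the inferred clusters. The function name `clustering_error_rate` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def clustering_error_rate(true_labels, predicted_labels):
    """Fraction of genotypes misassigned under the best one-to-one
    matching of predicted clusters to true sub-populations.

    Assumed definition for illustration; the paper's exact formula
    is not given in the abstract.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must not be empty")

    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))

    # Contingency counts: how many genotypes fall in each (true, predicted) pair.
    counts = Counter(zip(true_labels, predicted_labels))

    # Pad the true side with placeholders so that every predicted cluster
    # can be mapped to a distinct target even if there are extra clusters.
    placeholder = object()
    size = max(len(true_ids), len(pred_ids))
    targets = true_ids + [placeholder] * (size - len(true_ids))

    # Exhaustive search over mappings; feasible for small K such as 3 or 5.
    best_correct = 0
    for perm in permutations(targets, len(pred_ids)):
        correct = sum(counts[(t, p)] for t, p in zip(perm, pred_ids))
        best_correct = max(best_correct, correct)

    return 1.0 - best_correct / len(true_labels)


# Hypothetical example with K=3: cluster names are permuted and one
# genotype from sub-population 1 is placed in the wrong cluster.
true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]
print(clustering_error_rate(true, pred))
```

The exhaustive search over permutations is adequate for the K=3 and K=5 scenarios described in the abstract. For larger K, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual replacement.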

Suggested Citation

  • Peña-Malavera Andrea & Bruno Cecilia & Fernandez Elmer & Balzarini Monica, 2014. "Comparison of algorithms to infer genetic population structure from unlinked molecular markers," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(4), pages 391-402, August.
  • Handle: RePEc:bpj:sagmbi:v:13:y:2014:i:4:p:12:n:1
    DOI: 10.1515/sagmb-2013-0006

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2013-0006
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yedael Y Waldman & Arjun Biddanda & Natalie R Davidson & Paul Billing-Ross & Maya Dubrovsky & Christopher L Campbell & Carole Oddoux & Eitan Friedman & Gil Atzmon & Eran Halperin & Harry Ostrer & Alon, 2016. "The Genetics of Bene Israel from India Reveals Both Substantial Jewish and Indian Ancestry," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-28, March.
    2. Buschbom, Jutta, 2018. "Exploring and validating statistical reliability in forensic conservation genetics," Thünen Reports 63, Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries.
    3. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    4. Estavoyer, Maxime & François, Olivier, 2022. "Theoretical analysis of principal components in an umbrella model of intraspecific evolution," Theoretical Population Biology, Elsevier, vol. 148(C), pages 11-21.
    5. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    6. Cyril Atkinson-Clement & Eléonore Pigalle, 2021. "What can we learn from Covid-19 pandemic’s impact on human behaviour? The case of France’s lockdown," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-12, December.
    7. Matthieu Bouaziz & Caroline Paccard & Mickael Guedj & Christophe Ambroise, 2012. "SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-17, October.
    8. Öttl, Gerald & Böck, Philipp & Werpup, Nadja & Schwarze, Malte, 2013. "Derivation of representative air traffic peaks as standard input for airport related simulation," Journal of Air Transport Management, Elsevier, vol. 28(C), pages 31-39.
    9. J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
    10. Henner Gimpel & Daniel Rau & Maximilian Röglinger, 2018. "Understanding FinTech start-ups – a taxonomy of consumer-oriented service offerings," Electronic Markets, Springer;IIM University of St. Gallen, vol. 28(3), pages 245-264, August.
    11. Kojadinovic, Ivan, 2010. "Hierarchical clustering of continuous variables based on the empirical copula process and permutation linkages," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 90-108, January.
    12. Zhiguang Huo & Li Zhu & Tianzhou Ma & Hongcheng Liu & Song Han & Daiqing Liao & Jinying Zhao & George Tseng, 2020. "Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(1), pages 1-22, April.
    13. Yi Peng & Yong Zhang & Gang Kou & Yong Shi, 2012. "A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-9, July.
    14. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    15. Rosephine G. Rakotonirainy & Jan H. Vuuren, 2021. "The effect of benchmark data characteristics during empirical strip packing heuristic performance evaluation," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 43(2), pages 467-495, June.
    16. Satre-Meloy, Aven & Diakonova, Marina & Grünewald, Philipp, 2020. "Cluster analysis and prediction of residential peak demand profiles using occupant activity data," Applied Energy, Elsevier, vol. 260(C).
    17. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
    18. Z. Volkovich & Z. Barzily & G.-W. Weber & D. Toledano-Kitai & R. Avros, 2012. "An application of the minimal spanning tree approach to the cluster stability problem," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 20(1), pages 119-139, March.
    19. Julian Rossbroich & Jeffrey Durieux & Tom F. Wilderjans, 2022. "Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 264-301, July.
    20. Tan, Kean Ming & Witten, Daniela & Shojaie, Ali, 2015. "The cluster graphical lasso for improved estimation of Gaussian graphical models," Computational Statistics & Data Analysis, Elsevier, vol. 85(C), pages 23-36.
