Printed from https://ideas.repec.org/a/plo/pgen00/1009241.html

Estimating FST and kinship for arbitrary population structures

Author

Listed:
  • Alejandro Ochoa
  • John D Storey

Abstract

FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently-used estimators of FST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily-complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of FST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing FST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally-assumed models of structure. We then present our new approach, which consistently estimates kinship and FST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. 
This work may significantly improve future analyses that rely on accurate kinship and FST estimates.

Author summary

Kinship coefficients and FST, which measure relatedness and population structure, respectively, are important quantities needed to accurately perform various analyses on genetic data, including genome-wide association studies and heritability estimation. However, existing estimators require restrictive assumptions of independence that are not met by real human and other datasets. In this work we find that existing estimators can be severely biased under reasonable scenarios, first by theoretically determining their properties, and then using an admixture simulation to illustrate our findings. In particular, we find that existing FST estimators are downwardly biased, and that existing kinship matrix estimators have related biases that are on average downward and of similar magnitude but vary for every pair of individuals. These insights led us to a new estimation framework for kinship and FST that is practically unbiased for any population structure, as demonstrated by theory and simulations. Our new approaches, available as open-source R packages, are easy to use and are more widely applicable than existing approaches, and they are likely to improve downstream analyses that require accurate kinship and FST estimates.
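The abstract's central condition, that kinship and FST can be estimated consistently once the minimum kinship in the dataset is, can be illustrated with a minimal sketch. The Python code below is not the authors' implementation and not the interface of their R packages. The function names `kinship_min_anchored` and `fst_from_kinship` are hypothetical. It assumes biallelic loci with genotypes coded as 0/1/2 allele counts, that the least related pair in the data has true kinship zero, and that individuals are locally outbred so that FST can be summarized as a weighted mean of inbreeding coefficients. Using the raw minimum of the moment matrix as the estimate of the minimum is a simplification and is likely to be noisy on real data.

```python
import numpy as np


def kinship_min_anchored(genotypes):
    """Method-of-moments kinship sketch anchored at the minimum kinship.

    genotypes: array of shape (m_loci, n_individuals), entries 0, 1 or 2.
    Assumption (not from the source page): the least related pair has
    true kinship zero, and the raw minimum of `a` estimates that anchor.
    """
    x = np.asarray(genotypes, dtype=float)
    m_loci = x.shape[0]
    # a[j, k] = mean over loci of (x_ij - 1) * (x_ik - 1), minus 1.
    # Under the standard kinship model its expectation is proportional
    # to -(1 - kinship_jk), with an unknown positive scale factor.
    a = (x - 1.0).T @ (x - 1.0) / m_loci - 1.0
    a_min = a.min()
    if a_min == 0.0:
        raise ValueError("all individuals are identical homozygotes")
    # Dividing by the minimum cancels the unknown scale factor.
    return 1.0 - a / a_min


def fst_from_kinship(kinship, weights=None):
    """Weighted mean inbreeding coefficient, with f_j = 2 * kinship_jj - 1."""
    kinship = np.asarray(kinship, dtype=float)
    n = kinship.shape[0]
    if weights is None:
        w = np.full(n, 1.0 / n)  # uniform weights by default
    else:
        w = np.asarray(weights, dtype=float)
    inbreeding = 2.0 * np.diag(kinship) - 1.0
    return float(w @ inbreeding)


# Toy data: 4 loci (rows) by 2 individuals (columns), for illustration only.
toy = np.array([[0, 2], [2, 0], [1, 1], [0, 2]])
phi = kinship_min_anchored(toy)
print(phi)
print(fst_from_kinship(phi))
```

Dividing by the minimum is what removes the unknown scale factor, which is why the quality of the whole estimate in this sketch hinges on how well that minimum is estimated. With only four loci the toy numbers carry no biological meaning.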

Suggested Citation

  • Alejandro Ochoa & John D Storey, 2021. "Estimating FST and kinship for arbitrary population structures," PLOS Genetics, Public Library of Science, vol. 17(1), pages 1-36, January.
  • Handle: RePEc:plo:pgen00:1009241
    DOI: 10.1371/journal.pgen.1009241

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009241
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1009241&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Guillermo Barturen & Elena Carnero-Montoro & Manuel Martínez-Bueno & Silvia Rojo-Rello & Beatriz Sobrino & Óscar Porras-Perales & Clara Alcántara-Domínguez & David Bernardo & Marta E. Alarcón-Riquelme, 2022. "Whole blood DNA methylation analysis reveals respiratory environmental traits involved in COVID-19 severity following SARS-CoV-2 infection," Nature Communications, Nature, vol. 13(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christian M Hagen & Vanessa F Gonçalves & Paula L Hedley & Jonas Bybjerg-Grauholm & Marie Bækvad-Hansen & Christine S Hansen & Jørgen K Kanters & Jimmi Nielsen & Ole Mors & Alfonso B Demur & Thomas D , 2018. "Schizophrenia-associated mt-DNA SNPs exhibit highly variable haplogroup affiliation and nuclear ancestry: Bi-genomic dependence raises major concerns for link to disease," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-14, December.
    2. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    3. Beatrix Eugster & Rafael Lalive & Andreas Steinhauer & Josef Zweimüller, 2011. "The Demand for Social Insurance: Does Culture Matter?," Economic Journal, Royal Economic Society, vol. 121(556), pages 413-448, November.
    4. Filippini, Massimo & Wekhof, Tobias, 2021. "The effect of culture on energy efficient vehicle ownership," Journal of Environmental Economics and Management, Elsevier, vol. 105(C).
    5. Andrey V Khrunin & Denis V Khokhrin & Irina N Filippova & Tõnu Esko & Mari Nelis & Natalia A Bebyakova & Natalia L Bolotova & Janis Klovins & Liene Nikitina-Zake & Karola Rehnström & Samuli Ripatti & , 2013. "A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-9, March.
    6. Diana Dunca & Sandesh Chopade & María Gordillo-Marañón & Aroon D. Hingorani & Karoline Kuchenbaecker & Chris Finan & Amand F. Schmidt, 2024. "Comparing the effects of CETP in East Asian and European ancestries: a Mendelian randomization study," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    7. Wenhan Chen & Yang Wu & Zhili Zheng & Ting Qi & Peter M. Visscher & Zhihong Zhu & Jian Yang, 2021. "Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    8. Pierre Luisi & Angelina García & Juan Manuel Berros & Josefina M B Motti & Darío A Demarchi & Emma Alfaro & Eliana Aquilano & Carina Argüelles & Sergio Avena & Graciela Bailliet & Julieta Beltramo & C, 2020. "Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-30, July.
    9. Brielin C Brown & Nicolas L Bray & Lior Pachter, 2018. "Expression reflects population structure," PLOS Genetics, Public Library of Science, vol. 14(12), pages 1-15, December.
    10. Gad Abraham & Michael Inouye, 2014. "Fast Principal Component Analysis of Large-Scale Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-5, April.
    11. Beatrix Brügger & Rafael Lalive & Josef Zweimüller, 2009. "Does Culture Affect Unemployment? Evidence from the Röstigraben," NRN working papers 2009-10, The Austrian Center for Labor Economics and the Analysis of the Welfare State, Johannes Kepler University Linz, Austria.
    12. Diana Chang & Alon Keinan, 2014. "Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-14, September.
    13. Victor Ronda & Esben Agerbo & Dorthe Bleses & Preben Bo Mortensen & Anders Børglum & Ole Mors & Michael Rosholm & David M. Hougaard & Merete Nordentoft & Thomas Werge, 2022. "Family disadvantage, gender, and the returns to genetic human capital," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(2), pages 550-578, April.
    14. Feldman, Michael J., 2023. "Spiked singular values and vectors under extreme aspect ratios," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    15. Mateus H. Gouveia & Amy R. Bentley & Thiago P. Leal & Eduardo Tarazona-Santos & Carlos D. Bustamante & Adebowale A. Adeyemo & Charles N. Rotimi & Daniel Shriner, 2023. "Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    16. Hobolth, Asger & Siren, Jukka, 2016. "The multivariate Wright–Fisher process with mutation: Moment-based analysis and inference using a hierarchical Beta model," Theoretical Population Biology, Elsevier, vol. 108(C), pages 36-50.
    17. Nicola Barban & Elisabetta De Cao & Sonia Oreffice & Climent Quintana-Domeque, 2016. "Assortative Mating on Education: A Genetic Assessment," Working Papers 2016-034, Human Capital and Economic Opportunity Working Group.
    18. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    19. Athias, Laure & Wicht, Pascal, 2014. "Cultural Biases in Public Service Delivery: Evidence from a Regression Discontinuity Approach," MPRA Paper 60639, University Library of Munich, Germany.
    20. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
