IDEAS home Printed from https://ideas.repec.org/a/plo/pgen00/1009241.html
   My bibliography  Save this article

Estimating FST and kinship for arbitrary population structures

Author

Listed:
  • Alejandro Ochoa
  • John D Storey

Abstract

FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently-used estimators of FST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily-complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of FST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing FST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally-assumed models of structure. We then present our new approach, which consistently estimates kinship and FST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. This work may significantly improve future analyses that rely on accurate kinship and FST estimates.Author summary: Kinship coefficients and FST, which measure relatedness and population structure, respectively, are important quantities needed to accurately perform various analyses on genetic data, including genome-wide association studies and heritability estimation. However, existing estimators require restrictive assumptions of independence that are not met by real human and other datasets. In this work we find that existing estimators can be severely biased under reasonable scenarios, first by theoretically determining their properties, and then using an admixture simulation to illustrate our findings. In particular, we find that existing FST estimators are downwardly biased, and that existing kinship matrix estimators have related biases that are on average downward and of similar magnitude but vary for every pair of individuals. These insights led us to a new estimation framework for kinship and FST that is practically unbiased for any population structure, as demonstrated by theory and simulations. Our new approaches—available as open-source R packages—are easy to use and are more widely applicable than existing approaches, and they are likely to improve downstream analyses that require accurate kinship and FST estimates.

Suggested Citation

  • Alejandro Ochoa & John D Storey, 2021. "Estimating FST and kinship for arbitrary population structures," PLOS Genetics, Public Library of Science, vol. 17(1), pages 1-36, January.
  • Handle: RePEc:plo:pgen00:1009241
    DOI: 10.1371/journal.pgen.1009241
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009241
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1009241&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1009241?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Zheng, Xiuwen & Weir, Bruce S., 2016. "Eigenanalysis of SNP data with an identity by descent interpretation," Theoretical Population Biology, Elsevier, vol. 107(C), pages 65-76.
    2. Joseph K Pickrell & Jonathan K Pritchard, 2012. "Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data," PLOS Genetics, Public Library of Science, vol. 8(11), pages 1-17, November.
    3. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7219), pages 274-274, November.
    4. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7218), pages 98-101, November.
    5. Edge, Michael D. & Rosenberg, Noah A., 2014. "Upper bounds on FST in terms of the frequency of the most frequent allele and total homozygosity: The case of a specified number of alleles," Theoretical Population Biology, Elsevier, vol. 97(C), pages 20-34.
    6. Stephen Leslie & Bruce Winney & Garrett Hellenthal & Dan Davison & Abdelhamid Boumertit & Tammy Day & Katarzyna Hutnik & Ellen C. Royrvik & Barry Cunliffe & Daniel J. Lawson & Daniel Falush & Colin Fr, 2015. "The fine-scale genetic structure of the British population," Nature, Nature, vol. 519(7543), pages 309-314, March.
    7. George Nicholson & Albert V. Smith & Frosti Jónsson & Ómar Gústafsson & Kári Stefánsson & Peter Donnelly, 2002. "Assessing population differentiation and isolation from single‐nucleotide polymorphism data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 695-715, October.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Guillermo Barturen & Elena Carnero-Montoro & Manuel Martínez-Bueno & Silvia Rojo-Rello & Beatriz Sobrino & Óscar Porras-Perales & Clara Alcántara-Domínguez & David Bernardo & Marta E. Alarcón-Riquelme, 2022. "Whole blood DNA methylation analysis reveals respiratory environmental traits involved in COVID-19 severity following SARS-CoV-2 infection," Nature Communications, Nature, vol. 13(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christian M Hagen & Vanessa F Gonçalves & Paula L Hedley & Jonas Bybjerg-Grauholm & Marie Bækvad-Hansen & Christine S Hansen & Jørgen K Kanters & Jimmi Nielsen & Ole Mors & Alfonso B Demur & Thomas D , 2018. "Schizophrenia-associated mt-DNA SNPs exhibit highly variable haplogroup affiliation and nuclear ancestry: Bi-genomic dependence raises major concerns for link to disease," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-14, December.
    2. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    3. Beatrix Eugster & Rafael Lalive & Andreas Steinhauer & Josef Zweimüller, 2011. "The Demand for Social Insurance: Does Culture Matter?," Economic Journal, Royal Economic Society, vol. 121(556), pages 413-448, November.
    4. Gad Abraham & Michael Inouye, 2014. "Fast Principal Component Analysis of Large-Scale Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-5, April.
    5. Beatrix Brügger & Rafael Lalive & Josef Zweimüller, 2009. "Does Culture Affect Unemployment? Evidence from the Röstigraben," NRN working papers 2009-10, The Austrian Center for Labor Economics and the Analysis of the Welfare State, Johannes Kepler University Linz, Austria.
    6. Diana Chang & Alon Keinan, 2014. "Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-14, September.
    7. Feldman, Michael J., 2023. "Spiked singular values and vectors under extreme aspect ratios," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    8. Mateus H. Gouveia & Amy R. Bentley & Thiago P. Leal & Eduardo Tarazona-Santos & Carlos D. Bustamante & Adebowale A. Adeyemo & Charles N. Rotimi & Daniel Shriner, 2023. "Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    9. Nicola Barban & Elisabetta De Cao & Sonia Oreffice & Climent Quintana-Domeque, 2016. "Assortative Mating on Education: A Genetic Assessment," Working Papers 2016-034, Human Capital and Economic Opportunity Working Group.
    10. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    11. Guang Guo & Yilan Fu & Hedwig Lee & Tianji Cai & Kathleen Mullan Harris & Yi Li, 2014. "Genetic Bio-Ancestry and Social Construction of Racial Classification in Social Surveys in the Contemporary United States," Demography, Springer;Population Association of America (PAA), vol. 51(1), pages 141-172, February.
    12. Forien, Raphaël & Ringbauer, Harald & Coop, Graham, 2024. "Demographic inference for spatially heterogeneous populations using long shared haplotypes," Theoretical Population Biology, Elsevier, vol. 159(C), pages 108-124.
    13. Panczak, Radoslaw & Moser, André & Held, Leonhard & Jones, Philip A. & Rühli, Frank J. & Staub, Kaspar, 2017. "A tall order: Small area mapping and modelling of adult height among Swiss male conscripts," Economics & Human Biology, Elsevier, vol. 26(C), pages 61-69.
    14. The International Multiple Sclerosis Genetics Consortium, 2011. "The Genetic Association of Variants in CD6, TNFRSF1A and IRF8 to Multiple Sclerosis: A Multicenter Case-Control Study," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-6, April.
    15. Xiaodong Liu & Ke Zhang & Neslihan A. Kaya & Zhe Jia & Dafei Wu & Tingting Chen & Zhiyuan Liu & Sinan Zhu & Axel M. Hillmer & Torsten Wuestefeld & Jin Liu & Yun Shen Chan & Zheng Hu & Liang Ma & Li Ji, 2024. "Tumor phylogeography reveals block-shaped spatial heterogeneity and the mode of evolution in Hepatocellular Carcinoma," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    16. Marie-Claude Babron & Marie de Tayrac & Douglas N Rutledge & Eleftheria Zeggini & Emmanuelle Génin, 2012. "Rare and Low Frequency Variant Stratification in the UK Population: Description and Impact on Association Tests," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    17. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    18. Keith Humphreys & Alexander Grankvist & Monica Leu & Per Hall & Jianjun Liu & Samuli Ripatti & Karola Rehnström & Leif Groop & Lars Klareskog & Bo Ding & Henrik Grönberg & Jianfeng Xu & Nancy L Peders, 2011. "The Genetic Structure of the Swedish Population," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-11, August.
    19. Diana Chang & Feng Gao & Andrea Slavney & Li Ma & Yedael Y Waldman & Aaron J Sams & Paul Billing-Ross & Aviv Madar & Richard Spritz & Alon Keinan, 2014. "Accounting for eXentricities: Analysis of the X Chromosome in GWAS Reveals X-Linked Genes Implicated in Autoimmune Diseases," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-31, December.
    20. Soraggi, Samuele & Wiuf, Carsten, 2019. "General theory for stochastic admixture graphs and F-statistics," Theoretical Population Biology, Elsevier, vol. 125(C), pages 56-66.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pgen00:1009241. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosgenetics (email available below). General contact details of provider: https://journals.plos.org/plosgenetics/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.