Accounting for long-range correlations in genome-wide simulations of large cohorts

Author

Listed:
  • Dominic Nelson
  • Jerome Kelleher
  • Aaron P Ragsdale
  • Claudia Moreau
  • Gil McVean
  • Simon Gravel

Abstract

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.

Author summary: Coalescent theory has provided deep theoretical insight into patterns of human diversity. Implementations of coalescent models in simulation software such as ms have further provided tools to interpret thousands of genomic studies. Recent technical progress has allowed for a dramatic increase in the scale at which genomes can be both measured and simulated, opening up opportunities for a finer understanding of evolutionary biology. However, we show that coalescent simulations of long regions of the genome exhibit large biases in sample relatedness, distorting haplotype sharing and ancestry patterns in simulated cohorts. We trace these biases to basic assumptions of the coalescent model, and show how the assumptions can be relaxed to provide a better description of the observed patterns of genetic polymorphism at a fraction of the computational cost.
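
The small-sample assumption mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper or from msprime; it assumes a single non-recombining locus in a haploid population of constant size N, and the function names are hypothetical. It compares the expected number of distinct parents of n lineages one generation back under a Wright-Fisher model with the first-order approximation that allows at most one pairwise merger per generation, which is the regime the standard coalescent relies on.

```python
from math import comb


def expected_ancestors_wright_fisher(n: int, N: int) -> float:
    """Expected number of distinct parents when each of n lineages
    independently picks one parent uniformly among N haploid individuals."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)


def expected_ancestors_pairwise(n: int, N: int) -> float:
    """First-order approximation: at most one pairwise merger per
    generation, occurring with probability C(n, 2) / N.
    Only reasonable when n is much smaller than N (an assumption)."""
    return n - comb(n, 2) / N


if __name__ == "__main__":
    N = 10_000  # hypothetical population size, chosen for illustration
    for n in (10, 100, 1_000, 10_000):
        wf = expected_ancestors_wright_fisher(n, N)
        pw = expected_ancestors_pairwise(n, N)
        print(f"n={n:>6}  Wright-Fisher={wf:10.2f}  pairwise approx={pw:10.2f}")
```

For n much smaller than N the two quantities agree closely. Once n approaches N they diverge, because many lineages merge simultaneously in a single Wright-Fisher generation. This is only a single-locus illustration; it does not reproduce the IBD, LD, or ancestry analyses described in the abstract.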

Suggested Citation

  • Dominic Nelson & Jerome Kelleher & Aaron P Ragsdale & Claudia Moreau & Gil McVean & Simon Gravel, 2020. "Accounting for long-range correlations in genome-wide simulations of large cohorts," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-12, May.
  • Handle: RePEc:plo:pgen00:1008619
    DOI: 10.1371/journal.pgen.1008619

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008619
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1008619&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Wilton, Peter R. & Baduel, Pierre & Landon, Matthieu M. & Wakeley, John, 2017. "Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference," Theoretical Population Biology, Elsevier, vol. 115(C), pages 1-12.
    2. Ryan N Gutenkunst & Ryan D Hernandez & Scott H Williamson & Carlos D Bustamante, 2009. "Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data," PLOS Genetics, Public Library of Science, vol. 5(10), pages 1-11, October.
    3. Jerome Kelleher & Alison M Etheridge & Gilean McVean, 2016. "Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-22, May.
    4. Jerome Kelleher & Kevin R Thornton & Jaime Ashander & Peter L Ralph, 2018. "Efficient pedigree recording for fast population genetics simulation," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-21, November.
    5. Benjamin F Voight & Sridhar Kudaravalli & Xiaoquan Wen & Jonathan K Pritchard, 2006. "A Map of Recent Positive Selection in the Human Genome," PLOS Biology, Public Library of Science, vol. 4(3), pages 1-1, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Felix M Key & Benjamin Peter & Megan Y Dennis & Emilia Huerta-Sánchez & Wei Tang & Ludmila Prokunina-Olsson & Rasmus Nielsen & Aida M Andrés, 2014. "Selection on a Variant Associated with Improved Viral Clearance Drives Local, Adaptive Pseudogenization of Interferon Lambda 4 (IFNL4)," PLOS Genetics, Public Library of Science, vol. 10(10), pages 1-12, October.
    2. Melissa J Hubisz & Amy L Williams & Adam Siepel, 2020. "Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph," PLOS Genetics, Public Library of Science, vol. 16(8), pages 1-24, August.
    3. Ralph, Peter L., 2019. "An empirical approach to demographic inference with genomic data," Theoretical Population Biology, Elsevier, vol. 127(C), pages 91-101.
    4. Ali Mahmoudi & Jere Koskela & Jerome Kelleher & Yao-ban Chan & David Balding, 2022. "Bayesian inference of ancestral recombination graphs," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-15, March.
    5. Raul Torres & Zachary A Szpiech & Ryan D Hernandez, 2018. "Human demographic history has amplified the effects of background selection across the genome," PLOS Genetics, Public Library of Science, vol. 14(6), pages 1-27, June.
    6. Roy Ronen & Glenn Tesler & Ali Akbari & Shay Zakov & Noah A Rosenberg & Vineet Bafna, 2015. "Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele," PLOS Genetics, Public Library of Science, vol. 11(9), pages 1-27, September.
    7. Nicholas A Johnson & Marc A Coram & Mark D Shriver & Isabelle Romieu & Gregory S Barsh & Stephanie J London & Hua Tang, 2011. "Ancestral Components of Admixed Genomes in a Mexican Cohort," PLOS Genetics, Public Library of Science, vol. 7(12), pages 1-12, December.
    8. Kirk E Lohmueller & Anders Albrechtsen & Yingrui Li & Su Yeon Kim & Thorfinn Korneliussen & Nicolas Vinckenbosch & Geng Tian & Emilia Huerta-Sanchez & Alison F Feder & Niels Grarup & Torben Jørgensen , 2011. "Natural Selection Affects Multiple Aspects of Genetic Variation at Putatively Neutral Sites across the Human Genome," PLOS Genetics, Public Library of Science, vol. 7(10), pages 1-15, October.
    9. Sol Katzman & Andrew D Kern & Katherine S Pollard & Sofie R Salama & David Haussler, 2010. "GC-Biased Evolution Near Human Accelerated Regions," PLOS Genetics, Public Library of Science, vol. 6(5), pages 1-13, May.
    10. Aurélien Tellier & Peter Pfaffelhuber & Bernhard Haubold & Lisha Naduvilezhath & Laura E Rose & Thomas Städler & Wolfgang Stephan & Dirk Metzler, 2011. "Estimating Parameters of Speciation Models Based on Refined Summaries of the Joint Site-Frequency Spectrum," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-13, May.
    11. Sergio F. Nigenda-Morales & Meixi Lin & Paulina G. Nuñez-Valencia & Christopher C. Kyriazis & Annabel C. Beichman & Jacqueline A. Robinson & Aaron P. Ragsdale & Jorge Urbán R. & Frederick I. Archer & , 2023. "The genomic footprint of whaling and isolation in fin whale populations," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    12. Zihao Wang & Wenxi Wang & Xiaoming Xie & Yongfa Wang & Zhengzhao Yang & Huiru Peng & Mingming Xin & Yingyin Yao & Zhaorong Hu & Jie Liu & Zhenqi Su & Chaojie Xie & Baoyun Li & Zhongfu Ni & Qixin Sun &, 2022. "Dispersed emergence and protracted domestication of polyploid wheat uncovered by mosaic ancestral haploblock inference," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    13. Vasili Pankratov & Milyausha Yunusbaeva & Sergei Ryakhovsky & Maksym Zarodniuk & Bayazit Yunusbayev, 2022. "Prioritizing autoimmunity risk variants for functional analyses by fine-mapping mutations under natural selection," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    14. Sijie Wu & Manfei Zhang & Xinzhou Yang & Fuduan Peng & Juan Zhang & Jingze Tan & Yajun Yang & Lina Wang & Yanan Hu & Qianqian Peng & Jinxi Li & Yu Liu & Yaqun Guan & Chen Chen & Merel A Hamer & Tamar , 2018. "Genome-wide association studies and CRISPR/Cas9-mediated gene editing identify regulatory variants influencing eyebrow thickness in humans," PLOS Genetics, Public Library of Science, vol. 14(9), pages 1-22, September.
    15. Michael DeGiorgio & Zachary A Szpiech, 2022. "A spatially aware likelihood test to detect sweeps from haplotype distributions," PLOS Genetics, Public Library of Science, vol. 18(4), pages 1-37, April.
    16. Clara C Elbers & Carolien G F de Kovel & Yvonne T van der Schouw & Juliaan R Meijboom & Florianne Bauer & Diederick E Grobbee & Gosia Trynka & Jana V van Vliet-Ostaptchouk & Cisca Wijmenga & N Charlot, 2009. "Variants in Neuropeptide Y Receptor 1 and 5 Are Associated with Nutrient-Specific Food Intake and Are Under Recent Selection in Europeans," PLOS ONE, Public Library of Science, vol. 4(9), pages 1-13, September.
    17. Kerdoncuff, Elise & Lambert, Amaury & Achaz, Guillaume, 2020. "Testing for population decline using maximal linkage disequilibrium blocks," Theoretical Population Biology, Elsevier, vol. 134(C), pages 171-181.
    18. Bing Guo & Victor Borda & Roland Laboulaye & Michele D. Spring & Mariusz Wojnarski & Brian A. Vesely & Joana C. Silva & Norman C. Waters & Timothy D. O’Connor & Shannon Takala-Harrison, 2024. "Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    19. Parul Johri & Wolfgang Stephan & Jeffrey D Jensen, 2022. "Soft selective sweeps: Addressing new definitions, evaluating competing models, and interpreting empirical outliers," PLOS Genetics, Public Library of Science, vol. 18(2), pages 1-12, February.
    20. Simone Rubinacci & Olivier Delaneau & Jonathan Marchini, 2020. "Genotype imputation using the Positional Burrows Wheeler Transform," PLOS Genetics, Public Library of Science, vol. 16(11), pages 1-19, November.
