Efficient pedigree recording for fast population genetics simulation

Author

Listed:
  • Jerome Kelleher
  • Kevin R Thornton
  • Jaime Ashander
  • Peter L Ralph

Abstract

In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly ‘simplify’ a tree sequence by removing this irrelevant history for a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and overall large efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains in large, whole-genome simulations of one to two orders of magnitude. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation’s entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. 
The software for recording and processing tree sequences is named tskit.

Author summary: Sexually reproducing organisms are related to the others in their species by the complex web of parent-offspring relationships that constitute the pedigree. In this paper, we describe a way to record all of these relationships, as well as how genetic material is passed down through the pedigree, during a forwards-time population genetic simulation. To make effective use of this information, we describe both efficient storage methods for this embellished pedigree as well as a way to remove all information that is irrelevant to the genetic history of a given set of individuals, which dramatically reduces the required amount of storage space. Storing this information allows us to produce whole-genome sequence from simulations of large populations in which we have not explicitly recorded new genomic mutations; we find that this results in computational run times of up to 50 times faster than simulations forced to explicitly carry along that information.
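The two ideas in the abstract, recording each breeding event and then discarding history that is irrelevant to a chosen set of genomes, can be illustrated with a small pure-Python sketch. It is not the paper's algorithm and does not use the tskit API. The names `simulate`, `prune` and `merge`, the haploid single-crossover model, and the `(left, right, parent, child)` edge tuples are all assumptions made for illustration. The pruning step here only drops segments that the sample genomes did not inherit; a full simplification would presumably go further, for example by also removing intermediate ancestors that add no genealogical information.

```python
import random


def simulate(num_genomes, generations, seq_length=1.0, seed=1):
    """Toy haploid Wright-Fisher model that records every birth as edges.

    Returns (edges, extant). Each edge (left, right, parent, child) means
    that `child` inherited the interval [left, right) from `parent`.
    No genotypes are tracked, only who inherited what from whom.
    """
    rng = random.Random(seed)
    next_id = num_genomes
    current = list(range(num_genomes))        # founder genome ids
    edges = []
    for _ in range(generations):
        offspring = []
        for _ in range(num_genomes):
            child, next_id = next_id, next_id + 1
            mother, father = rng.choice(current), rng.choice(current)
            x = rng.uniform(0.0, seq_length)  # hypothetical single crossover
            edges.append((0.0, x, mother, child))
            edges.append((x, seq_length, father, child))
            offspring.append(child)
        current = offspring
    return edges, current


def merge(intervals):
    """Union of intervals, returned sorted and non-overlapping."""
    out = []
    for left, right in sorted(intervals):
        if out and left <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], right))
        else:
            out.append((left, right))
    return out


def prune(edges, samples, seq_length):
    """Keep only the edge segments through which `samples` inherit material."""
    ancestral = {s: [(0.0, seq_length)] for s in samples}
    kept = []
    # Children are always born after their parents, so a reverse pass sees
    # all of a genome's descendants before it sees that genome's own edges.
    for left, right, parent, child in reversed(edges):
        for a, b in merge(ancestral.get(child, [])):
            lo, hi = max(a, left), min(b, right)
            if lo < hi:
                kept.append((lo, hi, parent, child))
                ancestral.setdefault(parent, []).append((lo, hi))
    return kept


# Hand-built pedigree: genomes 0 and 1 are founders.
pedigree = [
    (0.0, 0.5, 0, 2), (0.5, 1.0, 1, 2),  # genome 2 recombines 0 and 1
    (0.0, 1.0, 1, 3),                    # genome 3 copies 1
    (0.0, 1.0, 2, 4),                    # genome 4 copies 2
    (0.0, 0.5, 2, 5), (0.5, 1.0, 3, 5),  # genome 5 recombines 2 and 3
]
print(prune(pedigree, samples=[4], seq_length=1.0))
# → [(0.0, 1.0, 2, 4), (0.5, 1.0, 1, 2), (0.0, 0.5, 0, 2)]
```

In the hand-built pedigree, pruning with respect to genome 4 discards everything involving genomes 3 and 5, since none of their material reaches the sample. Calling `prune(*simulate(10, 30), 1.0)` applies the same step to a recorded toy simulation; in a long-running simulation one would repeat this periodically against the extant population, as the abstract describes.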

Suggested Citation

  • Jerome Kelleher & Kevin R Thornton & Jaime Ashander & Peter L Ralph, 2018. "Efficient pedigree recording for fast population genetics simulation," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-21, November.
  • Handle: RePEc:plo:pcbi00:1006581
    DOI: 10.1371/journal.pcbi.1006581

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006581
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006581&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Stepfanie M Aguillon & John W Fitzpatrick & Reed Bowman & Stephan J Schoech & Andrew G Clark & Graham Coop & Nancy Chen, 2017. "Deconstructing isolation-by-distance: The genomic consequences of limited dispersal," PLOS Genetics, Public Library of Science, vol. 13(8), pages 1-27, August.
    2. Unknown, 2005. "Forward," 2005 Conference: Slovenia in the EU - Challenges for Agriculture, Food Science and Rural Affairs, November 10-11, 2005, Moravske Toplice, Slovenia 183804, Slovenian Association of Agricultural Economists (DAES).
    3. Kelleher, J. & Etheridge, A.M. & Barton, N.H., 2014. "Coalescent simulation in continuous space: Algorithms for large neighbourhood size," Theoretical Population Biology, Elsevier, vol. 95(C), pages 13-23.
    4. Jerome Kelleher & Alison M Etheridge & Gilean McVean, 2016. "Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-22, May.

    Cited by:

    1. Bing Guo & Victor Borda & Roland Laboulaye & Michele D. Spring & Mariusz Wojnarski & Brian A. Vesely & Joana C. Silva & Norman C. Waters & Timothy D. O’Connor & Shannon Takala-Harrison, 2024. "Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    2. Ralph, Peter L., 2019. "An empirical approach to demographic inference with genomic data," Theoretical Population Biology, Elsevier, vol. 127(C), pages 91-101.
    3. Ali Mahmoudi & Jere Koskela & Jerome Kelleher & Yao-ban Chan & David Balding, 2022. "Bayesian inference of ancestral recombination graphs," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-15, March.
