Printed from https://ideas.repec.org/a/plo/pcbi00/1006581.html

Efficient pedigree recording for fast population genetics simulation

Author

Listed:
  • Jerome Kelleher
  • Kevin R Thornton
  • Jaime Ashander
  • Peter L Ralph

Abstract

In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence, as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly ‘simplify’ a tree sequence by removing the history that is irrelevant to a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and that large overall efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains of one to two orders of magnitude in large, whole-genome simulations. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation’s entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. The software for recording and processing tree sequences is named tskit.

Author summary: Sexually reproducing organisms are related to the others in their species by the complex web of parent-offspring relationships that constitute the pedigree. In this paper, we describe a way to record all of these relationships, as well as how genetic material is passed down through the pedigree, during a forwards-time population genetic simulation. To make effective use of this information, we describe both efficient storage methods for this embellished pedigree and a way to remove all information that is irrelevant to the genetic history of a given set of individuals, which dramatically reduces the required amount of storage space. Storing this information allows us to produce whole-genome sequence from simulations of large populations in which we have not explicitly recorded new genomic mutations; we find that this results in run times up to 50 times faster than those of simulations forced to explicitly carry along that information.
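
The recording-then-pruning idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the simplification algorithm from the paper and does not use the tskit API: the function names (`simulate`, `merge`, `prune`), the haploid Wright-Fisher model with a single crossover per birth, and the edge tuple layout `(left, right, parent, child)` are all assumptions made for illustration. The sketch only discards edge material that no extant genome inherits; it does not merge or renumber nodes as a full simplification would.

```python
import random
from collections import defaultdict

def simulate(num_individuals, generations, length=1.0, seed=1):
    """Hypothetical haploid Wright-Fisher sketch: record two edges per birth."""
    rng = random.Random(seed)
    edges = []                               # (left, right, parent, child)
    current = list(range(num_individuals))   # node IDs of the founders
    next_id = num_individuals
    for _ in range(generations):
        offspring = []
        for _ in range(num_individuals):
            mum, dad = rng.choice(current), rng.choice(current)
            crossover = rng.uniform(0, length)
            child, next_id = next_id, next_id + 1
            edges.append((0.0, crossover, mum, child))     # left part from mum
            edges.append((crossover, length, dad, child))  # right part from dad
            offspring.append(child)
        current = offspring
    return edges, current

def merge(intervals):
    """Union of possibly overlapping (left, right) intervals."""
    merged = []
    for left, right in sorted(intervals):
        if merged and left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    return merged

def prune(edges, samples, length=1.0):
    """Keep only the parts of edges that carry material ancestral to samples."""
    by_child = defaultdict(list)
    for left, right, parent, child in edges:
        by_child[child].append((left, right, parent))
    ancestral = defaultdict(list)            # node -> ancestral intervals
    for s in samples:
        ancestral[s].append((0.0, length))
    kept = []
    # Children always have larger IDs than their parents, so visiting nodes
    # from high to low ID handles every child before its parents.
    for node in sorted(by_child, reverse=True):
        for seg_left, seg_right in merge(ancestral.get(node, [])):
            for left, right, parent in by_child[node]:
                lo, hi = max(left, seg_left), min(right, seg_right)
                if lo < hi:
                    kept.append((lo, hi, parent, node))
                    ancestral[parent].append((lo, hi))
    return kept

edges, samples = simulate(num_individuals=50, generations=200)
kept = prune(edges, samples)
print(len(edges), "edges recorded;", len(kept), "edge pieces kept after pruning")
```

Calling `prune` periodically during a simulation, rather than once at the end, mirrors the abstract's point about simplifying with respect to the extant population so that stored history stays modest.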

Suggested Citation

  • Jerome Kelleher & Kevin R Thornton & Jaime Ashander & Peter L Ralph, 2018. "Efficient pedigree recording for fast population genetics simulation," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-21, November.
  • Handle: RePEc:plo:pcbi00:1006581
    DOI: 10.1371/journal.pcbi.1006581

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006581
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006581&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1006581?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Stepfanie M Aguillon & John W Fitzpatrick & Reed Bowman & Stephan J Schoech & Andrew G Clark & Graham Coop & Nancy Chen, 2017. "Deconstructing isolation-by-distance: The genomic consequences of limited dispersal," PLOS Genetics, Public Library of Science, vol. 13(8), pages 1-27, August.
    2. Jerome Kelleher & Alison M Etheridge & Gilean McVean, 2016. "Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-22, May.
    3. Unknown, 2005. "Forward," 2005 Conference: Slovenia in the EU - Challenges for Agriculture, Food Science and Rural Affairs, November 10-11, 2005, Moravske Toplice, Slovenia 183804, Slovenian Association of Agricultural Economists (DAES).
    4. Kelleher, J. & Etheridge, A.M. & Barton, N.H., 2014. "Coalescent simulation in continuous space: Algorithms for large neighbourhood size," Theoretical Population Biology, Elsevier, vol. 95(C), pages 13-23.

    Citations

    Cited by:

    1. Bing Guo & Victor Borda & Roland Laboulaye & Michele D. Spring & Mariusz Wojnarski & Brian A. Vesely & Joana C. Silva & Norman C. Waters & Timothy D. O’Connor & Shannon Takala-Harrison, 2024. "Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    2. Ralph, Peter L., 2019. "An empirical approach to demographic inference with genomic data," Theoretical Population Biology, Elsevier, vol. 127(C), pages 91-101.
    3. Ali Mahmoudi & Jere Koskela & Jerome Kelleher & Yao-ban Chan & David Balding, 2022. "Bayesian inference of ancestral recombination graphs," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-15, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pilar Lopez-Llompart & G. Mathias Kondolf, 2016. "Encroachments in floodways of the Mississippi River and Tributaries Project," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 81(1), pages 513-542, March.
    2. Michelle Sheran Sylvester, 2007. "The Career and Family Choices of Women: A Dynamic Analysis of Labor Force Participation, Schooling, Marriage and Fertility Decisions," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 10(3), pages 367-399, July.
    3. DAVID M. BLAU & WILBERT van der KLAAUW, 2013. "What Determines Family Structure?," Economic Inquiry, Western Economic Association International, vol. 51(1), pages 579-604, January.
    4. Afanasyev, Dmitriy O. & Fedorova, Elena A. & Popov, Viktor U., 2015. "Fine structure of the price–demand relationship in the electricity market: Multi-scale correlation analysis," Energy Economics, Elsevier, vol. 51(C), pages 215-226.
    5. Peter Viggo Jakobsen, 2009. "Small States, Big Influence: The Overlooked Nordic Influence on the Civilian ESDP," Journal of Common Market Studies, Wiley Blackwell, vol. 47(1), pages 81-102, January.
    6. Billio, Monica & Casarin, Roberto & Osuntuyi, Anthony, 2016. "Efficient Gibbs sampling for Markov switching GARCH models," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 37-57.
    7. Jan Babecký & Fabrizio Coricelli & Roman Horváth, 2009. "Assessing Inflation Persistence: Micro Evidence on an Inflation Targeting Economy," Czech Journal of Economics and Finance (Finance a uver), Charles University Prague, Faculty of Social Sciences, vol. 59(2), pages 102-127, June.
    8. Lloyd, S. P., 2017. "Unconventional Monetary Policy and the Interest Rate Channel: Signalling and Portfolio Rebalancing," Cambridge Working Papers in Economics 1735, Faculty of Economics, University of Cambridge.
    9. Ichiro Fukunaga, 2007. "Imperfect Common Knowledge, Staggered Price Setting, and the Effects of Monetary Policy," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 39(7), pages 1711-1739, October.
    10. Albertazzi, Ugo & Gambacorta, Leonardo, 2009. "Bank profitability and the business cycle," Journal of Financial Stability, Elsevier, vol. 5(4), pages 393-409, December.
    11. Beck, Thorsten & Demirgüç-Kunt, Asli & Merrouche, Ouarda, 2013. "Islamic vs. conventional banking: Business model, efficiency and stability," Journal of Banking & Finance, Elsevier, vol. 37(2), pages 433-447.
    12. Jinho Bae & Chang-Jin Kim & Dong Kim, 2012. "The evolution of the monetary policy regimes in the U.S," Empirical Economics, Springer, vol. 43(2), pages 617-649, October.
    13. McMahon, Rob, 2020. "Co-developing digital inclusion policy and programming with indigenous partners: Interventions from Canada," Internet Policy Review: Journal on Internet Regulation, Alexander von Humboldt Institute for Internet and Society (HIIG), Berlin, vol. 9(2), pages 1-26.
    14. George W. Evans & Seppo Honkapohja, 2009. "Robust Learning Stability with Operational Monetary Policy Rules," Central Banking, Analysis, and Economic Policies Book Series, in: Klaus Schmidt-Hebbel & Carl E. Walsh & Norman Loayza (Series Editor) & Klaus Schmidt-Hebbel (Series (ed.),Monetary Policy under Uncertainty and Learning, edition 1, volume 13, chapter 5, pages 145-170, Central Bank of Chile.
    15. Lehtonen, Heikki & Kujala, Sanna, 2007. "Climate change impacts on crop risks and agricultural production in Finland," 101st Seminar, July 5-6, 2007, Berlin Germany 9259, European Association of Agricultural Economists.
    16. Michael Pomerleano, 2011. "Developing Regional Financial Markets – the Case of East Asia," Chapters, in: Ulrich Volz (ed.), Regional Integration, Economic Development and Global Governance, chapter 9, Edward Elgar Publishing.
    17. Gary Charness & Francesco Feri & Miguel A. Meléndez-Jiménez & Matthias Sutter, 2023. "An Experimental Study on the Effects of Communication, Credibility, and Clustering in Network Games," The Review of Economics and Statistics, MIT Press, vol. 105(6), pages 1530-1543, November.
    18. Kitsul, Yuriy & Wright, Jonathan H., 2013. "The economics of options-implied inflation probability density functions," Journal of Financial Economics, Elsevier, vol. 110(3), pages 696-711.
    19. Dieter Balkenborg & Rosemarie Nagel, 2016. "An Experiment on Forward vs. Backward Induction: How Fairness and Level k Reasoning Matter," German Economic Review, Verein für Socialpolitik, vol. 17(3), pages 378-408, August.
    20. J. Park & T. P. Seager & P. S. C. Rao & M. Convertino & I. Linkov, 2013. "Integrating Risk and Resilience Approaches to Catastrophe Management in Engineering Systems," Risk Analysis, John Wiley & Sons, vol. 33(3), pages 356-367, March.
