
Phase-type distributions in mathematical population genetics: An emerging framework

Authors
  • Hobolth, Asger
  • Rivas-González, Iker
  • Bladt, Mogens
  • Futschik, Andreas

Abstract

A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the ‘phases’ in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
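
The link between first-step analysis and phase-type calculations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' R code. It assumes the standard Kingman coalescent in coalescent time units (each pair of lineages merges at rate 1), with transient states n, n-1, ..., 2 lineages. The function names `kingman_subintensity`, `solve`, and `expected_reward` are hypothetical and chosen for illustration. The expected accumulated reward until absorption is computed as alpha (-S)^{-1} r, where S is the sub-intensity matrix, alpha puts all mass on the state with n lineages, and r is the reward vector: r = 1 gives the expected tree height (time to the most recent common ancestor), and r = number of lineages gives the expected total branch length.

```python
from fractions import Fraction
from math import comb


def kingman_subintensity(n):
    """Sub-intensity matrix S over the transient states n, n-1, ..., 2 lineages.

    Assumption: standard Kingman coalescent, k lineages coalesce at rate C(k, 2).
    """
    m = n - 1
    S = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        k = n - i  # number of lineages in transient state i
        rate = Fraction(comb(k, 2))
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate  # move to k-1 lineages; from k=2 the chain is absorbed
    return S


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]


def expected_reward(n, reward):
    """Expected total reward until absorption, alpha (-S)^{-1} r, starting with n lineages."""
    S = kingman_subintensity(n)
    neg_S = [[-x for x in row] for row in S]
    r = [Fraction(reward(n - i)) for i in range(n - 1)]
    x = solve(neg_S, r)  # x[i] = expected reward accumulated when starting in state i
    return x[0]          # alpha = (1, 0, ..., 0): start with all n lineages


n = 5
tmrca_mean = expected_reward(n, lambda k: 1)         # reward 1 per unit time: tree height
total_length_mean = expected_reward(n, lambda k: k)  # reward k per unit time: tree length
print(tmrca_mean, total_length_mean)
```

Solving the linear system (-S) x = r row by row is exactly the first-step recursion x_k = r_k / C(k, 2) + x_{k-1}, so for n = 5 the two values agree with the classical formulas 2(1 - 1/n) and 2 * sum_{i=1}^{n-1} 1/i. Exact fractions are used here only to make that agreement easy to check; a numerical linear algebra library would be the usual choice for larger state spaces.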

Suggested Citation

  • Hobolth, Asger & Rivas-González, Iker & Bladt, Mogens & Futschik, Andreas, 2024. "Phase-type distributions in mathematical population genetics: An emerging framework," Theoretical Population Biology, Elsevier, vol. 157(C), pages 14-32.
  • Handle: RePEc:eee:thpobi:v:157:y:2024:i:c:p:14-32
    DOI: 10.1016/j.tpb.2024.03.001


