Printed from https://ideas.repec.org/a/eee/csdana/v156y2021ics0167947320301468.html

The impact of genetic diversity statistics on model selection between coalescents

Authors

  • Freund, Fabian
  • Siri-Jégousse, Arno

Abstract

Modeling genetic diversity needs an underlying genealogy model. To choose a fitting model based on genetic data, one can perform model selection between classes of genealogical trees, e.g. Kingman’s coalescent with exponential growth or multiple merger coalescents. Such selection can be based on many different statistics measuring genetic diversity. A random forest based Approximate Bayesian Computation is used to disentangle the effects of different statistics on distinguishing between various classes of genealogy models. For the specific question of inferring whether genealogies feature multiple mergers, a new statistic, the minimal observable clade size, is introduced. When combined with classical site frequency based statistics, it reduces classification errors considerably.
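
To make the "classical site frequency based statistics" mentioned above concrete, here is a minimal sketch of computing an unfolded site frequency spectrum from a sample. It is an illustration only, not the authors' code: the function name `site_frequency_spectrum`, the 0/1 input encoding (1 = derived allele), and the toy sample are all assumptions made for this example.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS [xi_1, ..., xi_{n-1}] of a 0/1 matrix.

    haplotypes: list of n equal-length rows, one per sampled sequence.
    An entry of 1 marks the derived allele at a segregating site
    (this encoding is an assumption of the sketch).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    # Column sums: how many sequences carry the derived allele at each site.
    counts = Counter(sum(column) for column in zip(*haplotypes))
    # xi_i = number of sites whose derived allele appears in exactly i sequences.
    return [counts.get(i, 0) for i in range(1, n)]


# Hypothetical toy sample: 4 sequences, 4 segregating sites.
sample = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
sfs = site_frequency_spectrum(sample)
print(sfs)  # [3, 1, 0]: three singletons, one doubleton, no tripletons
```

In a model selection setting such as the one described in the abstract, vectors like `sfs` (or summaries derived from them) would be computed for each simulated and observed data set and passed to the classifier as features.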

Suggested Citation

  • Freund, Fabian & Siri-Jégousse, Arno, 2021. "The impact of genetic diversity statistics on model selection between coalescents," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
  • Handle: RePEc:eee:csdana:v:156:y:2021:i:c:s0167947320301468
    DOI: 10.1016/j.csda.2020.107055

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947320301468
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    2. Hobolth, Asger & Siri-Jégousse, Arno & Bladt, Mogens, 2019. "Phase-type distributions in population genetics," Theoretical Population Biology, Elsevier, vol. 127(C), pages 16-32.
    3. Blath, Jochen & Cronjäger, Mathias Christensen & Eldon, Bjarki & Hammer, Matthias, 2016. "The site-frequency spectrum associated with Ξ-coalescents," Theoretical Population Biology, Elsevier, vol. 110(C), pages 36-50.
    4. Durrett, Rick & Schweinsberg, Jason, 2005. "A coalescent model for the effect of advantageous mutations on the genealogy of a population," Stochastic Processes and their Applications, Elsevier, vol. 115(10), pages 1628-1657, October.
    5. Schweinsberg, Jason, 2003. "Coalescent processes obtained from supercritical Galton-Watson processes," Stochastic Processes and their Applications, Elsevier, vol. 106(1), pages 107-139, July.
    6. Steinrücken, Matthias & Birkner, Matthias & Blath, Jochen, 2013. "Analysis of DNA sequence variation within marine species using Beta-coalescents," Theoretical Population Biology, Elsevier, vol. 87(C), pages 15-24.
    7. Koskela, Jere, 2018. "Multi-locus data distinguishes between population growth and multiple merger coalescents," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 17(3), pages 1-21, June.

    Citations



    Cited by:

    1. Hobolth, Asger & Rivas-González, Iker & Bladt, Mogens & Futschik, Andreas, 2024. "Phase-type distributions in mathematical population genetics: An emerging framework," Theoretical Population Biology, Elsevier, vol. 157(C), pages 14-32.
    2. Miró Pina, Verónica & Joly, Émilien & Siri-Jégousse, Arno, 2023. "Estimating the Lambda measure in multiple-merger coalescents," Theoretical Population Biology, Elsevier, vol. 154(C), pages 94-101.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hobolth, Asger & Rivas-González, Iker & Bladt, Mogens & Futschik, Andreas, 2024. "Phase-type distributions in mathematical population genetics: An emerging framework," Theoretical Population Biology, Elsevier, vol. 157(C), pages 14-32.
    2. Blath, Jochen & Cronjäger, Mathias Christensen & Eldon, Bjarki & Hammer, Matthias, 2016. "The site-frequency spectrum associated with Ξ-coalescents," Theoretical Population Biology, Elsevier, vol. 110(C), pages 36-50.
    3. Miró Pina, Verónica & Joly, Émilien & Siri-Jégousse, Arno, 2023. "Estimating the Lambda measure in multiple-merger coalescents," Theoretical Population Biology, Elsevier, vol. 154(C), pages 94-101.
    4. Eldon, Bjarki & Stephan, Wolfgang, 2018. "Evolution of highly fecund haploid populations," Theoretical Population Biology, Elsevier, vol. 119(C), pages 48-56.
    5. Eldon, Bjarki, 2011. "Estimation of parameters in large offspring number models and ratios of coalescence times," Theoretical Population Biology, Elsevier, vol. 80(1), pages 16-28.
    6. Etheridge, Alison M. & Griffiths, Robert C. & Taylor, Jesse E., 2010. "A coalescent dual process in a Moran model with genic selection, and the lambda coalescent limit," Theoretical Population Biology, Elsevier, vol. 78(2), pages 77-92.
    7. Steinrücken, Matthias & Birkner, Matthias & Blath, Jochen, 2013. "Analysis of DNA sequence variation within marine species using Beta-coalescents," Theoretical Population Biology, Elsevier, vol. 87(C), pages 15-24.
    8. Blath, Jochen & Buzzoni, Eugenio & Koskela, Jere & Wilke Berenguer, Maite, 2020. "Statistical tools for seed bank detection," Theoretical Population Biology, Elsevier, vol. 132(C), pages 1-15.
    9. Bjarki Eldon, 2023. "Viability Selection at Linked Sites," Mathematics, MDPI, vol. 11(3), pages 1-23, January.
    10. Backer, David & Billing, Trey, 2024. "Forecasting the prevalence of child acute malnutrition using environmental and conflict conditions as leading indicators," World Development, Elsevier, vol. 176(C).
    11. David Mouillot & Laure Velez & Camille Albouy & Nicolas Casajus & Joachim Claudet & Vincent Delbar & Rodolphe Devillers & Tom B. Letessier & Nicolas Loiseau & Stéphanie Manel & Laura Mannocci & Jessic, 2024. "The socioeconomic and environmental niche of protected areas reveals global conservation gaps and opportunities," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    12. Mariana Oliveira & Luís Torgo & Vítor Santos Costa, 2021. "Evaluation Procedures for Forecasting with Spatiotemporal Data," Mathematics, MDPI, vol. 9(6), pages 1-27, March.
    13. Dhersin, Jean-Stéphane & Freund, Fabian & Siri-Jégousse, Arno & Yuan, Linglong, 2013. "On the length of an external branch in the Beta-coalescent," Stochastic Processes and their Applications, Elsevier, vol. 123(5), pages 1691-1715.
    14. Yuanyuan Shi & Junyu Zhao & Xianchong Song & Zuoyu Qin & Lichao Wu & Huili Wang & Jian Tang, 2021. "Hyperspectral band selection and modeling of soil organic matter content in a forest using the Ranger algorithm," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-15, June.
    15. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    16. Marcela Mendoza-Suárez & Turgut Yigit Akyol & Marcin Nadzieja & Stig U. Andersen, 2024. "Increased diversity of beneficial rhizobia enhances faba bean growth," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    17. Andreas D. Meid & Lucas Wirbka, 2022. "Can Machine Learning from Real-World Data Support Drug Treatment Decisions? A Prediction Modeling Case for Direct Oral Anticoagulants," Medical Decision Making, vol. 42(5), pages 587-598, July.
    18. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    19. Joel Podgorski & Oliver Kracht & Luis Araguas-Araguas & Stefan Terzer-Wassmuth & Jodie Miller & Ralf Straub & Rolf Kipfer & Michael Berg, 2024. "Groundwater vulnerability to pollution in Africa’s Sahel region," Nature Sustainability, Nature, vol. 7(5), pages 558-567, May.
    20. Mikula, Lynette Caitlin & Vogl, Claus, 2024. "The expected sample allele frequencies from populations of changing size via orthogonal polynomials," Theoretical Population Biology, Elsevier, vol. 157(C), pages 55-85.
