
Bayesian modelling of compositional heterogeneity in molecular phylogenetics

Authors

  • Heaps Sarah E.

    (School of Mathematics and Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK; Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Catherine Cookson Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK)

  • Nye Tom M.W.

    (School of Mathematics and Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK)

  • Boys Richard J.

    (School of Mathematics and Statistics, Herschel Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK)

  • Williams Tom A.

    (Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Catherine Cookson Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK)

  • Embley T. Martin

    (Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Catherine Cookson Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK)

Abstract

In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets, which show substantial heterogeneity in sequence composition across taxa. We propose a model that allows compositional heterogeneity across branches and formulate it in a Bayesian framework. Specifically, the root and each branch of the tree are associated with their own composition vector, whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information is exchanged equally amongst all branches of the tree, and another in which more information is exchanged between neighbouring branches than between distant ones. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete-data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position, so a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some related models in the literature, the proposed model can be fitted with a simple MCMC scheme that does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
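The branch-wise structure described in the abstract (one shared exchangeability matrix, a separate composition vector per branch, and a complete-data likelihood that factorises over branches) can be sketched as below. This is not the authors' implementation. It assumes the common reversible parameterisation q_ij = rho_ij * pi_j, an assumed normalisation to one expected substitution per unit time, and the standard continuous-time Markov chain complete-data likelihood written in terms of dwell times and substitution counts. The function names `branch_rate_matrix` and `branch_complete_log_lik` are hypothetical.

```python
import numpy as np


def branch_rate_matrix(exch, pi):
    """Reversible rate matrix for one branch (hypothetical helper).

    exch: symmetric K x K array of exchangeabilities shared by every branch
          (diagonal ignored; off-diagonal entries assumed positive).
    pi:   length-K composition vector specific to this branch (or the root).
    """
    exch = np.asarray(exch, dtype=float)
    pi = np.asarray(pi, dtype=float)
    q = exch * pi[np.newaxis, :]          # off-diagonal q_ij = rho_ij * pi_j
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))   # each row sums to zero
    rate = -np.dot(pi, np.diag(q))        # expected substitution rate under pi
    return q / rate                       # assumed: one substitution per unit time


def branch_complete_log_lik(q, dwell, counts):
    """Complete-data log-likelihood of one branch's substitution history.

    dwell[i]:     total time spent in state i on this branch, summed over sites.
    counts[i, j]: number of i -> j substitutions on this branch (diagonal ignored).
    The root-state term is omitted here.
    """
    dwell = np.asarray(dwell, dtype=float)
    counts = np.asarray(counts, dtype=float)
    off = ~np.eye(q.shape[0], dtype=bool)
    jumps = np.sum(counts[off] * np.log(q[off]))   # sum_{i != j} N_ij log q_ij
    holding = np.dot(dwell, np.diag(q))            # sum_i T_i q_ii (q_ii < 0)
    return float(jumps + holding)


# Usage: one global exchangeability matrix, two branches with different
# compositions (state order A, C, G, T assumed for illustration).
exch = np.ones((4, 4))
q_gc_poor = branch_rate_matrix(exch, [0.35, 0.15, 0.15, 0.35])
q_gc_rich = branch_rate_matrix(exch, [0.15, 0.35, 0.35, 0.15])
```

Under these assumptions, the complete-data log-likelihood of the whole tree would be the sum of `branch_complete_log_lik` over branches plus a root-state term. Each branch's term depends on its own composition vector and on the shared exchangeabilities only.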

Suggested Citation

  • Heaps Sarah E. & Nye Tom M.W. & Boys Richard J. & Williams Tom A. & Embley T. Martin, 2014. "Bayesian modelling of compositional heterogeneity in molecular phylogenetics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(5), pages 589-609, October.
  • Handle: RePEc:bpj:sagmbi:v:13:y:2014:i:5:p:21:n:5
    DOI: 10.1515/sagmb-2013-0077

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2013-0077
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pavel Dolezal & Michael J Dagley & Maya Kono & Peter Wolynec & Vladimir A Likić & Jung Hock Foo & Miroslava Sedinová & Jan Tachezy & Anna Bachmann & Iris Bruchhaus & Trevor Lithgow, 2010. "The Essentials of Protein Import in the Degenerate Mitochondrion of Entamoeba histolytica," PLOS Pathogens, Public Library of Science, vol. 6(3), pages 1-13, March.
    2. Xing Ju Lee & Christopher C. Drovandi & Anthony N. Pettitt, 2015. "Model choice problems using approximate Bayesian computation with applications to pathogen transmission data sets," Biometrics, The International Biometric Society, vol. 71(1), pages 198-207, March.
    3. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    4. Will Penny & Biswa Sengupta, 2016. "Annealed Importance Sampling for Neural Mass Models," PLOS Computational Biology, Public Library of Science, vol. 12(3), pages 1-25, March.
    5. Spezia, L. & Cooksley, S.L. & Brewer, M.J. & Donnelly, D. & Tree, A., 2014. "Modelling species abundance in a river by Negative Binomial hidden Markov models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 599-614.
    6. Vitoratou, Silia & Ntzoufras, Ioannis & Moustaki, Irini, 2016. "Explaining the behavior of joint and marginal Monte Carlo estimators in latent variable models with independence assumptions," LSE Research Online Documents on Economics 57685, London School of Economics and Political Science, LSE Library.
    7. Spezia, Luigi, 2020. "Bayesian variable selection in non-homogeneous hidden Markov models through an evolutionary Monte Carlo method," Computational Statistics & Data Analysis, Elsevier, vol. 143(C).
    8. AWLP Thilan & P Menéndez & JM McGree, 2023. "Assessing the ability of adaptive designs to capture trends in hard coral cover," Environmetrics, John Wiley & Sons, Ltd., vol. 34(6), September.
    9. Elaine A. Ferguson & Jason Matthiopoulos & Robert H. Insall & Dirk Husmeier, 2017. "Statistical inference of the mechanisms driving collective cell movement," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 66(4), pages 869-890, August.
    10. Joshua C. C. Chan & Liana Jacobi & Dan Zhu, 2022. "An automated prior robustness analysis in Bayesian model comparison," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(3), pages 583-602, April.
    11. Luigi Spezia & Andy Vinten & Roberta Paroli & Marc Stutter, 2021. "An evolutionary Monte Carlo method for the analysis of turbidity high‐frequency time series through Markov switching autoregressive models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(8), December.
    12. Marco Grzegorczyk & Andrej Aderhold & Dirk Husmeier, 2017. "Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration," Computational Statistics, Springer, vol. 32(2), pages 717-761, June.
    13. Christopher C. Drovandi & Anthony N. Pettitt, 2013. "Bayesian Experimental Design for Models with Intractable Likelihoods," Biometrics, The International Biometric Society, vol. 69(4), pages 937-948, December.
    14. Luca Martino & Fernando Llorente & Ernesto Curbelo & Javier López-Santiago & Joaquín Míguez, 2021. "Automatic Tempered Posterior Distributions for Bayesian Inversion Problems," Mathematics, MDPI, vol. 9(7), pages 1-17, April.
    15. repec:dau:papers:123456789/5724 is not listed on IDEAS
    16. Li, Yong & Wang, Nianling & Yu, Jun, 2023. "Improved marginal likelihood estimation via power posteriors and importance sampling," Journal of Econometrics, Elsevier, vol. 234(1), pages 28-52.
    17. Zhang, Yifan & Fong, Duncan K.H. & DeSarbo, Wayne S., 2021. "A generalized ordinal finite mixture regression model for market segmentation," International Journal of Research in Marketing, Elsevier, vol. 38(4), pages 1055-1072.
    18. Fouskakis, Dimitris & Ntzoufras, Ioannis & Perrakis, Konstantinos, 2020. "Variations of power-expected-posterior priors in normal regression models," Computational Statistics & Data Analysis, Elsevier, vol. 143(C).
    19. Alzahrani, Naif & Neal, Peter & Spencer, Simon E.F. & McKinley, Trevelyan J. & Touloupou, Panayiota, 2018. "Model selection for time series of count data," Computational Statistics & Data Analysis, Elsevier, vol. 122(C), pages 33-44.
    20. Filippone, Maurizio & Sanguinetti, Guido, 2011. "Approximate inference of the bandwidth in multivariate kernel density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3104-3122, December.
    21. Abhijith Makki & Petr Rada & Vojtěch Žárský & Sami Kereïche & Lubomír Kováčik & Marian Novotný & Tobias Jores & Doron Rapaport & Jan Tachezy, 2019. "Triplet-pore structure of a highly divergent TOM complex of hydrogenosomes in Trichomonas vaginalis," PLOS Biology, Public Library of Science, vol. 17(1), pages 1-32, January.
