Printed from https://ideas.repec.org/a/eee/csdana/v75y2014icp81-95.html

Classification of molecular sequence data using Bayesian phylogenetic mixture models

Authors

  • Loza-Reyes, E.
  • Hurn, M.A.
  • Robinson, A.

Abstract

Rate variation among the sites of a molecular sequence is commonly found in applications of phylogenetic inference. Several approaches exist to account for this feature, but they do not usually offer a straightforward way to pinpoint which sites evolve under which rate of evolution. The focus here is on Bayesian phylogenetic mixture models, augmented with allocation variables, as tools for site classification and for quantifying classification uncertainty. The method does not rely on prior knowledge of which sites belong to which class, or even of the number of classes. Furthermore, unlike some phylogenetic hidden Markov or change-point models, it does not require correlated sites to be adjacent in the sequence alignment. In the approach presented, model selection on the number and type of mixture components is conducted ahead of both model estimation and site classification; the stepping-stone sampler (SS) is used to select amongst competing mixture models. Example applications to simulated data and to mitochondrial DNA of primates illustrate site classification via ‘augmented’ Bayesian phylogenetic mixtures. In both examples, all mixtures outperform commonly used models of among-site rate variation as well as models that do not account for rate heterogeneity. The examples further demonstrate how site classification is readily available from the analysis output. The method is directly relevant to the choice of partitions in Bayesian phylogenetics, and its application may lead to the discovery of structure not otherwise recognised in a molecular sequence alignment. Computational aspects of Bayesian phylogenetic model estimation are discussed, including the use of simple Markov chain Monte Carlo (MCMC) moves that mix efficiently without tempering the chains.
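As a hedged illustration of how site classification can be read off the analysis output, the sketch below assumes the sampler has stored, at each MCMC iteration, the class label allocated to every site (the allocation variables). It estimates each site's posterior class-membership probabilities as the fraction of stored iterations in which the site was allocated to each class. The function name `classify_sites` and the data layout are hypothetical; this is not the authors' C program, and it assumes class labels are identifiable across iterations (no label switching).

```python
from collections import Counter


def classify_sites(allocation_draws, n_classes):
    """Estimate per-site class-membership probabilities from MCMC output.

    allocation_draws: one list per stored MCMC iteration, giving the class
    label (0 .. n_classes - 1) allocated to each site at that iteration.
    Returns, for each site, (modal class, [P(class 0), ..., P(class n_classes - 1)]).
    """
    n_iter = len(allocation_draws)
    n_sites = len(allocation_draws[0])
    results = []
    for site in range(n_sites):
        counts = Counter(draw[site] for draw in allocation_draws)
        # Monte Carlo estimate: fraction of iterations allocating the site to each class
        probs = [counts.get(j, 0) / n_iter for j in range(n_classes)]
        modal = max(range(n_classes), key=lambda j: probs[j])
        results.append((modal, probs))
    return results


if __name__ == "__main__":
    # Hypothetical draws: 4 stored iterations, 3 sites, 2 rate classes
    draws = [[0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 0, 1]]
    for site, (cls, probs) in enumerate(classify_sites(draws, n_classes=2)):
        # 1 - max(probs) is one simple per-site measure of classification uncertainty
        print(site, cls, probs, 1 - max(probs))
```

Site 0 is allocated to class 0 in every stored iteration, so it is classified with no estimated uncertainty, whereas sites 1 and 2 are allocated to class 1 in three of four iterations.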
The contribution to the field of Bayesian phylogenetics is in (1) the use of mixture models augmented with allocation variables as tools for site classification and quantification of classification uncertainty, (2) the successful application of SS for selection of phylogenetic mixtures, and (3) the development of novel MCMC aspects of relevance to Bayesian phylogenetic models, whether mixtures or not.¹

¹ The MCMC methods discussed in this paper have been coded in a C program; source files are available upon request. Supplementary material is available online (see Appendix A).
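The stepping-stone sampler estimates the marginal likelihood Z = ∫ L(θ) p(θ) dθ that is used to compare competing models. With inverse temperatures 0 = β_0 < β_1 < … < β_K = 1 and draws θ_{k−1,i} from the power posterior p_{β_{k−1}}(θ) ∝ L(θ)^{β_{k−1}} p(θ), the estimate is log Ẑ = Σ_k log[(1/N) Σ_i L(θ_{k−1,i})^{β_k − β_{k−1}}]. A minimal sketch follows, assuming a toy conjugate model rather than a phylogenetic one: y_i ~ N(μ, 1) with prior μ ~ N(0, τ²). Here each power posterior is normal and can be sampled exactly, and the exact log evidence is available for comparison. The function names, the schedule β_k = (k/K)^{1/α} with α = 0.3, and the sample sizes are illustrative assumptions, not settings taken from the paper.

```python
import math
import random


def log_likelihood(mu, y):
    """Log-likelihood of data y under independent N(mu, 1) observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - mu) ** 2 for yi in y)


def sample_power_posterior(beta, y, tau, n_draws, rng):
    """Exact draws from p_beta(mu), proportional to L(mu)^beta * N(mu | 0, tau^2).

    Conjugacy makes this a normal distribution; a phylogenetic model would
    instead need an MCMC run at each inverse temperature beta.
    """
    precision = beta * len(y) + 1.0 / tau ** 2
    mean = beta * sum(y) / precision
    sd = 1.0 / math.sqrt(precision)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def log_mean_exp(values):
    """Numerically stable log((1/n) * sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def stepping_stone_log_evidence(y, tau, n_stones=32, n_draws=2000, alpha=0.3, seed=1):
    """Stepping-stone estimate of log Z for the toy normal model."""
    rng = random.Random(seed)
    # Inverse temperatures from 0 (prior) to 1 (posterior), packed near 0
    betas = [(k / n_stones) ** (1.0 / alpha) for k in range(n_stones + 1)]
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        draws = sample_power_posterior(b_prev, y, tau, n_draws, rng)
        # log r_k = log of the mean of L(mu)^(b_next - b_prev) over draws at b_prev
        log_z += log_mean_exp([(b_next - b_prev) * log_likelihood(mu, y) for mu in draws])
    return log_z


def exact_log_evidence(y, tau):
    """Closed-form log Z, since marginally y ~ N(0, I + tau^2 * 1 1^T)."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    quad = ss - tau ** 2 * s ** 2 / (1 + n * tau ** 2)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau ** 2) - 0.5 * quad


if __name__ == "__main__":
    data = [0.8, 1.3, 0.4, 1.9, 1.1]  # made-up toy data
    print(stepping_stone_log_evidence(data, tau=2.0), exact_log_evidence(data, tau=2.0))
```

Each ratio r_k = Z_{β_k}/Z_{β_{k−1}} only needs draws from the previous power posterior, and the ratios telescope to Z because Z_{β_0} = 1 for a proper prior. Packing the temperatures near β = 0 is a common choice because the power posterior changes fastest there.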

Suggested Citation

  • Loza-Reyes, E. & Hurn, M.A. & Robinson, A., 2014. "Classification of molecular sequence data using Bayesian phylogenetic mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 75(C), pages 81-95.
  • Handle: RePEc:eee:csdana:v:75:y:2014:i:c:p:81-95
    DOI: 10.1016/j.csda.2014.01.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794731400019X
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Ardia, David & Baştürk, Nalan & Hoogerheide, Lennart & van Dijk, Herman K., 2012. "A comparative study of Monte Carlo methods for efficient evaluation of marginal likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 56(11), pages 3398-3414.
    2. Kitchen, Christina M.R. & Kroll, Jing & Kuritzkes, Daniel R. & Bloomquist, Erik & Deeks, Steven G. & Suchard, Marc A., 2009. "Two-way Bayesian hierarchical phylogenetic models: An application to the co-evolution of gp120 and gp41 during and after enfuvirtide treatment," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 766-775, January.
    3. N. Friel & A. N. Pettitt, 2008. "Marginal likelihood estimation via power posteriors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(3), pages 589-607, July.
    4. Merrilee Hurn & Peter J. Green & Fahimah Al‐Awadhi, 2008. "A Bayesian hierarchical model for photometric red shifts," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 57(4), pages 487-504, September.
    5. Calderhead, Ben & Girolami, Mark, 2009. "Estimating Bayes factors via thermodynamic integration and population MCMC," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4028-4045, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Will Penny & Biswa Sengupta, 2016. "Annealed Importance Sampling for Neural Mass Models," PLOS Computational Biology, Public Library of Science, vol. 12(3), pages 1-25, March.
    2. Joshua C. C. Chan & Liana Jacobi & Dan Zhu, 2022. "An automated prior robustness analysis in Bayesian model comparison," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(3), pages 583-602, April.
    3. Filippone, Maurizio & Sanguinetti, Guido, 2011. "Approximate inference of the bandwidth in multivariate kernel density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3104-3122, December.
    4. Joshua C. C. Chan & Eric Eisenstat, 2015. "Marginal Likelihood Estimation with the Cross-Entropy Method," Econometric Reviews, Taylor & Francis Journals, vol. 34(3), pages 256-285, March.
    5. Spezia, Luigi, 2020. "Bayesian variable selection in non-homogeneous hidden Markov models through an evolutionary Monte Carlo method," Computational Statistics & Data Analysis, Elsevier, vol. 143(C).
    6. Luigi Spezia & Andy Vinten & Roberta Paroli & Marc Stutter, 2021. "An evolutionary Monte Carlo method for the analysis of turbidity high‐frequency time series through Markov switching autoregressive models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(8), December.
    7. Marco Grzegorczyk & Andrej Aderhold & Dirk Husmeier, 2017. "Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration," Computational Statistics, Springer, vol. 32(2), pages 717-761, June.
    8. Chris J. Oates & Mark Girolami & Nicolas Chopin, 2017. "Control functionals for Monte Carlo integration," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(3), pages 695-718, June.
    9. Perrakis, Konstantinos & Ntzoufras, Ioannis & Tsionas, Efthymios G., 2014. "On the use of marginal posteriors in marginal likelihood estimation via importance sampling," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 54-69.
    10. Xing Ju Lee & Christopher C. Drovandi & Anthony N. Pettitt, 2015. "Model choice problems using approximate Bayesian computation with applications to pathogen transmission data sets," Biometrics, The International Biometric Society, vol. 71(1), pages 198-207, March.
    11. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    12. Spezia, L. & Cooksley, S.L. & Brewer, M.J. & Donnelly, D. & Tree, A., 2014. "Modelling species abundance in a river by Negative Binomial hidden Markov models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 599-614.
    13. Joshua C. C. Chan, 2018. "Specification tests for time-varying parameter models with stochastic volatility," Econometric Reviews, Taylor & Francis Journals, vol. 37(8), pages 807-823, September.
    14. Vitoratou, Silia & Ntzoufras, Ioannis & Moustaki, Irini, 2016. "Explaining the behavior of joint and marginal Monte Carlo estimators in latent variable models with independence assumptions," LSE Research Online Documents on Economics 57685, London School of Economics and Political Science, LSE Library.
    15. AWLP Thilan & P Menéndez & JM McGree, 2023. "Assessing the ability of adaptive designs to capture trends in hard coral cover," Environmetrics, John Wiley & Sons, Ltd., vol. 34(6), September.
    16. Bauwens, Luc & Dufays, Arnaud & Rombouts, Jeroen V.K., 2014. "Marginal likelihood for Markov-switching and change-point GARCH models," Journal of Econometrics, Elsevier, vol. 178(P3), pages 508-522.
    17. Guidolin, Massimo & Ravazzolo, Francesco & Tortora, Andrea Donato, 2013. "Alternative econometric implementations of multi-factor models of the U.S. financial markets," The Quarterly Review of Economics and Finance, Elsevier, vol. 53(2), pages 87-111.
    18. Lukasz Gatarek & Lennart Hoogerheide & Koen Hooning & Herman K. van Dijk, 2013. "Censored Posterior and Predictive Likelihood in Left-Tail Prediction for Accurate Value at Risk Estimation," Tinbergen Institute Discussion Papers 13-060/III, Tinbergen Institute, revised 06 Mar 2014.
    19. Geweke, John & Durham, Garland, 2019. "Sequentially adaptive Bayesian learning algorithms for inference and optimization," Journal of Econometrics, Elsevier, vol. 210(1), pages 4-25.
    20. Nalan Baştürk & Stefano Grassi & Lennart Hoogerheide & Herman K. Van Dijk, 2016. "Parallelization Experience with Four Canonical Econometric Models Using ParMitISEM," Econometrics, MDPI, vol. 4(1), pages 1-20, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.