
Classification of molecular sequence data using Bayesian phylogenetic mixture models

Author

Listed:
  • Loza-Reyes, E.
  • Hurn, M.A.
  • Robinson, A.

Abstract

Rate variation among the sites of a molecular sequence is commonly found in applications of phylogenetic inference. Several approaches exist to account for this feature but they do not usually enable the investigator to pinpoint the sites that evolve under one or another rate of evolution in a straightforward manner. The focus is on Bayesian phylogenetic mixture models, augmented with allocation variables, as tools for site classification and quantification of classification uncertainty. The method does not rely on prior knowledge of site membership to classes or even the number of classes. Furthermore, it does not require correlated sites to be next to one another in the sequence alignment, unlike some phylogenetic hidden Markov or change-point models. In the approach presented, model selection on the number and type of mixture components is conducted ahead of both model estimation and site classification; the steppingstone sampler (SS) is used to select amongst competing mixture models. Example applications of simulated data and mitochondrial DNA of primates illustrate site classification via ‘augmented’ Bayesian phylogenetic mixtures. In both examples, all mixtures outperform commonly-used models of among-site rate variation and models that do not account for rate heterogeneity. The examples further demonstrate how site classification is readily available from the analysis output. The method is directly relevant to the choice of partitions in Bayesian phylogenetics, and its application may lead to the discovery of structure not otherwise recognised in a molecular sequence alignment. Computational aspects of Bayesian phylogenetic model estimation are discussed, including the use of simple Markov chain Monte Carlo (MCMC) moves that mix efficiently without tempering the chains. 
The contribution to the field of Bayesian phylogenetics is in (1) the use of mixture models augmented with allocation variables as tools for site classification and quantification of classification uncertainty, (2) the successful application of SS for selection of phylogenetic mixtures, and (3) the development of novel MCMC aspects of relevance to Bayesian phylogenetic models, whether mixtures or not.

Note: The MCMC methods discussed in this paper have been coded in a C program; source files are available upon request. Supplementary material is available online (see Appendix A).
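
As a rough illustration of how allocation variables can yield a site classification, the sketch below is a minimal Python example and not the authors' C implementation. It assumes the per-site likelihoods under each mixture component are already available for fixed parameter values (in a real analysis they would come from a phylogenetic likelihood calculation and would change at every MCMC iteration). The function names, the toy numbers, and the "one minus the largest frequency" uncertainty measure are all hypothetical choices made for this example.

```python
from collections import Counter


def allocation_probabilities(site_likelihoods, weights):
    """Conditional probability that each site belongs to each component.

    site_likelihoods[i][k] is the likelihood of site i under component k
    (assumed given here); weights[k] is the mixture weight of component k.
    """
    probs = []
    for lik in site_likelihoods:
        joint = [w * l for w, l in zip(weights, lik)]
        total = sum(joint)
        probs.append([j / total for j in joint])
    return probs


def summarise_allocations(draws, n_components):
    """Turn sampled allocation vectors into per-site class frequencies.

    draws is a list of allocation vectors, one per retained MCMC iteration;
    draws[t][i] is the component label sampled for site i at iteration t.
    """
    n_sites = len(draws[0])
    freqs = []
    for i in range(n_sites):
        counts = Counter(d[i] for d in draws)
        freqs.append([counts.get(k, 0) / len(draws) for k in range(n_components)])
    return freqs


def classify(freqs):
    """Assign each site to its most frequent class; report 1 - max frequency."""
    labels = [max(range(len(f)), key=f.__getitem__) for f in freqs]
    uncertainty = [1.0 - max(f) for f in freqs]
    return labels, uncertainty


# Toy inputs (made up): three sites, two rate classes.
site_likelihoods = [[0.02, 0.001], [0.0005, 0.004], [0.01, 0.01]]
weights = [0.5, 0.5]
probs = allocation_probabilities(site_likelihoods, weights)

# Toy allocation draws, standing in for MCMC output.
draws = [[0, 1, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1]]
freqs = summarise_allocations(draws, n_components=2)
labels, uncertainty = classify(freqs)
```

In this sketch, `probs` gives the conditional distribution from which an allocation variable could be sampled at a single iteration, while `freqs` summarises the sampled allocations across iterations, which is the kind of quantity that can be read off the analysis output to classify sites and gauge how uncertain each assignment is.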

Suggested Citation

  • Loza-Reyes, E. & Hurn, M.A. & Robinson, A., 2014. "Classification of molecular sequence data using Bayesian phylogenetic mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 75(C), pages 81-95.
  • Handle: RePEc:eee:csdana:v:75:y:2014:i:c:p:81-95
    DOI: 10.1016/j.csda.2014.01.008

