
Probabilistic Phylogenetic Inference with Insertions and Deletions

Author

Listed:
  • Elena Rivas
  • Sean R Eddy

Abstract

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

Author Summary

We describe a computationally efficient method to use insertion and deletion events, in addition to substitutions, in phylogenetic inference. To date, many evolutionary models in probabilistic phylogenetic inference methods have only accounted for substitution events, not for insertions and deletions. As a result, not only do tree inference methods use less sequence information than they could, but also it has remained difficult to integrate phylogenetic modeling into sequence alignment methods (such as profiles and profile-hidden Markov models) that inherently require a model of insertion and deletion events. Therefore an important goal in the field has been to develop tractable evolutionary models of insertion/deletion events over time of sufficient accuracy to increase the resolution of phylogenetic inference methods and to increase the power of profile-based sequence homology searches. Our model offers a partial answer to this problem. We show that our model generally improves inference power in both simulated and real data and that it is easily implemented in the framework of standard inference packages with little effect on computational efficiency (we extended dnaml, in Felsenstein's popular phylip package).
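
To make the idea of a "rate matrix augmented by the gap character" combined with the Felsenstein peeling algorithm more concrete, here is a minimal sketch in Python. It is not the authors' model and not the dnamlε implementation: it assumes a simple symmetric (reversible) five-state model in which the gap is treated as just another character with the same rate as any substitution, whereas the paper describes a non-reversible birth–death model. The names `transition_prob`, `conditional_likelihoods`, `column_likelihood` and `alignment_log_likelihood`, the nested-tuple tree encoding, the uniform root distribution, and the toy sequences and branch lengths are all hypothetical choices made for illustration.

```python
import math

# Assumption: the gap "-" is a fifth state next to the four nucleotides.
ALPHABET = "ACGT-"
K = len(ALPHABET)


def transition_prob(i, j, t, rate=1.0):
    """P(state j at the end of a branch | state i at its start).

    Closed form for a symmetric K-state model in which every
    off-diagonal entry of the rate matrix equals `rate`.
    """
    decay = math.exp(-K * rate * t)
    if i == j:
        return 1.0 / K + (K - 1) / K * decay
    return 1.0 / K - decay / K


def conditional_likelihoods(node, column):
    """Peeling step: L[s] = P(leaves below node | node is in state s).

    A leaf is a sequence name (str); an internal node is a tuple of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = ALPHABET.index(column[node])
        return [1.0 if s == observed else 0.0 for s in range(K)]
    result = [1.0] * K
    for child, t in node:
        child_l = conditional_likelihoods(child, column)
        for s in range(K):
            result[s] *= sum(
                transition_prob(s, c, t) * child_l[c] for c in range(K)
            )
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform root distribution."""
    root = conditional_likelihoods(tree, column)
    return sum(root[s] / K for s in range(K))


def alignment_log_likelihood(tree, alignment):
    """Sum of per-column log-likelihoods (columns assumed independent)."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for pos in range(length):
        column = {name: seq[pos] for name, seq in alignment.items()}
        total += math.log(column_likelihood(tree, column))
    return total


# Toy three-taxon tree and alignment (illustrative values only).
tree = (
    ((("human", 0.1), ("chimp", 0.1)), 0.2),
    ("mouse", 0.5),
)
alignment = {"human": "ACG-T", "chimp": "ACG-T", "mouse": "AC-GT"}

print(alignment_log_likelihood(tree, alignment))
```

Because gaps are ordinary states here, gapped columns contribute to the likelihood instead of being ignored, and the cost per column stays linear in the number of tree nodes, which is the efficiency property the abstract attributes to peeling.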

Suggested Citation

  • Elena Rivas & Sean R Eddy, 2008. "Probabilistic Phylogenetic Inference with Insertions and Deletions," PLOS Computational Biology, Public Library of Science, vol. 4(9), pages 1-21, September.
  • Handle: RePEc:plo:pcbi00:1000172
    DOI: 10.1371/journal.pcbi.1000172

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000172
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000172&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Bob Mau & Michael A. Newton & Bret Larget, 1999. "Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods," Biometrics, The International Biometric Society, vol. 55(1), pages 1-12, March.

    Citations


    Cited by:

    1. Anuj Srivastava & Liming Cai & Jan Mrázek & Russell L Malmberg, 2011. "Mutational Patterns in RNA Secondary Structure Evolution Examined in Three RNA Families," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-10, June.
    2. Robert K Bradley & Adam Roberts & Michael Smoot & Sudeep Juvekar & Jaeyoung Do & Colin Dewey & Ian Holmes & Lior Pachter, 2009. "Fast Statistical Alignment," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-15, May.
    3. Andoni, Alexandr & Daskalakis, Constantinos & Hassidim, Avinatan & Roch, Sebastien, 2012. "Global alignment of molecular sequences via ancestral state reconstruction," Stochastic Processes and their Applications, Elsevier, vol. 122(12), pages 3852-3874.
    4. Robert K Bradley & Ian Holmes, 2009. "Evolutionary Triplet Models of Structured RNA," PLOS Computational Biology, Public Library of Science, vol. 5(8), pages 1-20, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Barry G Hall & Stephen J Salipante, 2007. "Measures of Clade Confidence Do Not Correlate with Accuracy of Phylogenetic Trees," PLOS Computational Biology, Public Library of Science, vol. 3(3), pages 1-9, March.
    2. Ian J. Wilson & Michael E. Weale & David J. Balding, 2003. "Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 166(2), pages 155-188, June.
    3. Alexandra Gavryushkina & David Welch & Tanja Stadler & Alexei J Drummond, 2014. "Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration," PLOS Computational Biology, Public Library of Science, vol. 10(12), pages 1-15, December.
    4. Lin, Yu-Min & Fang, Shu-Cherng & Thorne, Jeffrey L., 2007. "A tabu search algorithm for maximum parsimony phylogeny inference," European Journal of Operational Research, Elsevier, vol. 176(3), pages 1908-1917, February.
    5. Rigat, F. & Mira, A., 2012. "Parallel hierarchical sampling: A general-purpose interacting Markov chains Monte Carlo algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1450-1467.
    6. Jordan Douglas & Rong Zhang & Remco Bouckaert, 2021. "Adaptive dating and fast proposals: Revisiting the phylogenetic relaxed clock model," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-30, February.
    7. Spade David A., 2020. "An extended model for phylogenetic maximum likelihood based on discrete morphological characters," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-11, February.
