
Evolutionary Triplet Models of Structured RNA

Authors

  • Robert K Bradley
  • Ian Holmes

Abstract

The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs.

We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, we implemented maximum likelihood (ML) alignment of three known RNA structures, given a known phylogeny, together with inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics (and approximations to them) where possible. For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.

Author Summary: A number of leading methods for bioinformatics analysis of structural RNAs use probabilistic grammars as models for pairs of homologous RNAs. We show that any such pairwise grammar can be extended to an entire phylogeny by treating the pairwise grammar as a machine (a “transducer”) that models a single ancestor-descendant relationship in the tree, transforming one RNA structure into another. In addition to phylogenetic enhancement of current applications, such as RNA genefinding, homology detection, alignment and secondary structure prediction, this should enable probabilistic phylogenetic reconstruction of RNA sequences that are ancestral to present-day genes. We describe statistical inference algorithms, software implementations, and a simulation-based comparison of three-taxon maximum likelihood alignment to several other methods for aligning three sibling RNAs. In the Discussion we consider how the three-taxon RNA alignment-reconstruction-folding algorithm, which is currently very computationally expensive, might be made more efficient so that larger phylogenies could be considered.
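
As a deliberately reduced illustration of the composition idea described above, the following sketch treats each branch of a tree as a substitution-only transducer acting on a single base, so that composing two branches amounts to summing over the hidden intermediate base. It leaves out indels and basepair structure, which are the substance of the grammar-based method in the paper. The branch model `uniform_change_model`, its change probabilities, and all function names are assumptions made for this example; they are not taken from the paper or its software.

```python
ALPHABET = "ACGU"


def uniform_change_model(p_change):
    """Hypothetical branch model: keep the base with probability 1 - p_change,
    otherwise switch to one of the other three bases uniformly."""
    off_diagonal = p_change / (len(ALPHABET) - 1)
    return {
        x: {y: (1.0 - p_change) if x == y else off_diagonal for y in ALPHABET}
        for x in ALPHABET
    }


def compose(upper, lower):
    """Compose two substitution-only branch models.

    upper[x][y] is P(intermediate = y | ancestor = x) and lower[y][z] is
    P(descendant = z | intermediate = y). The composite sums out y.
    """
    return {
        x: {z: sum(upper[x][y] * lower[y][z] for y in ALPHABET) for z in ALPHABET}
        for x in ALPHABET
    }


def ancestor_posterior(prior, branches, leaves):
    """Posterior over one ancestral base given the bases at its child leaves.

    Assumes a single, already aligned column and conditionally independent
    branches, one branch model per leaf.
    """
    joint = dict(prior)
    for branch, leaf in zip(branches, leaves):
        for a in ALPHABET:
            joint[a] *= branch[a][leaf]
    total = sum(joint.values())
    return {a: joint[a] / total for a in ALPHABET}


if __name__ == "__main__":
    short_branch = uniform_change_model(0.1)
    long_branch = uniform_change_model(0.2)

    # Two consecutive branches collapse into one composite branch model.
    composite = compose(short_branch, long_branch)
    print(composite["A"])

    # Three sibling bases under a common ancestor with a uniform prior.
    prior = {a: 0.25 for a in ALPHABET}
    posterior = ancestor_posterior(prior, [short_branch] * 3, ["A", "A", "G"])
    print(max(posterior, key=posterior.get))
```

In this reduced setting composition is just a matrix product over the four bases. Once insertions, deletions and paired positions are allowed, the composed machine has to track joint state across branches, which is where the multiple-sequence grammar and its dynamic programming algorithms come in.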

Suggested Citation

  • Robert K Bradley & Ian Holmes, 2009. "Evolutionary Triplet Models of Structured RNA," PLOS Computational Biology, Public Library of Science, vol. 5(8), pages 1-20, August.
  • Handle: RePEc:plo:pcbi00:1000483
    DOI: 10.1371/journal.pcbi.1000483

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000483
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000483&type=printable
    Download Restriction: no



    Citations

    Cited by:

    1. Anuj Srivastava & Liming Cai & Jan Mrázek & Russell L Malmberg, 2011. "Mutational Patterns in RNA Secondary Structure Evolution Examined in Three RNA Families," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-10, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Robert K Bradley & Adam Roberts & Michael Smoot & Sudeep Juvekar & Jaeyoung Do & Colin Dewey & Ian Holmes & Lior Pachter, 2009. "Fast Statistical Alignment," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-15, May.
    2. Paul D Williams & David D Pollock & Benjamin P Blackburne & Richard A Goldstein, 2006. "Assessing the Accuracy of Ancestral Protein Reconstruction Methods," PLOS Computational Biology, Public Library of Science, vol. 2(6), pages 1-8, June.
    3. Zhongyi Lu & Runyue Xia & Siyu Zhang & Jie Pan & Yang Liu & Yuri I. Wolf & Eugene V. Koonin & Meng Li, 2024. "Evolution of optimal growth temperature in Asgard archaea inferred from the temperature dependence of GDP binding to EF-1A," Nature Communications, Nature, vol. 15(1), pages 1-7, December.
    4. Sean R Eddy, 2008. "A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation," PLOS Computational Biology, Public Library of Science, vol. 4(5), pages 1-14, May.
    5. Bin Ma & Louxin Zhang, 2011. "Efficient estimation of the accuracy of the maximum likelihood method for ancestral state reconstruction," Journal of Combinatorial Optimization, Springer, vol. 21(4), pages 409-422, May.
    6. Anuj Srivastava & Liming Cai & Jan Mrázek & Russell L Malmberg, 2011. "Mutational Patterns in RNA Secondary Structure Evolution Examined in Three RNA Families," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-10, June.
    7. Andoni, Alexandr & Daskalakis, Constantinos & Hassidim, Avinatan & Roch, Sebastien, 2012. "Global alignment of molecular sequences via ancestral state reconstruction," Stochastic Processes and their Applications, Elsevier, vol. 122(12), pages 3852-3874.
