
ExpertRNA: A New Framework for RNA Secondary Structure Prediction

Authors

  • Menghan Liu

    (School of Computing Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona 85281)

  • Erik Poppleton

    (School of Molecular Sciences and Center for Molecular Design and Biomimetics, Arizona State University, Tempe, Arizona 85281)

  • Giulia Pedrielli

    (School of Computing Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona 85281)

  • Petr Šulc

    (School of Molecular Sciences and Center for Molecular Design and Biomimetics, Arizona State University, Tempe, Arizona 85281)

  • Dimitri P. Bertsekas

    (School of Computing Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona 85281; Massachusetts Institute of Technology, Electrical Engineering, Cambridge, Massachusetts 02139)

Abstract

Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms, performing a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structure they adopt. As a result, great effort is being dedicated to the design of efficient algorithms that solve the “folding problem”: given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms largely rely on finding the structure with minimum free energy. However, these predictions rely on simplified effective free energy models, under which the correct structure is not always the one with the lowest free energy. In light of this, new data-driven approaches that not only consider free energy but also use machine learning techniques to learn motifs have been investigated and have recently been shown to outperform free energy–based algorithms on several experimental data sets. In this work, we introduce the new ExpertRNA algorithm, which provides a modular framework that can easily incorporate an arbitrary number of rewards (free energy or nonparametric/data driven) and secondary structure prediction algorithms. We argue that this capability of ExpertRNA has the potential to balance out different strengths and weaknesses of state-of-the-art folding tools. We test ExpertRNA on several RNA sequence-structure data sets, and we compare its performance against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions of nonpseudoknotted secondary structures than the structure prediction algorithm used, thus validating the promise of the approach.

Summary of Contribution: ExpertRNA is a new algorithm inspired by a biological problem. It is applied to the problem of secondary structure prediction for RNA molecules given an input sequence. The computational contribution is the design of a multibranch, multiexpert rollout algorithm that enables the use of several state-of-the-art approaches as base heuristics and allows several experts to evaluate the partial candidate solutions generated, thus avoiding the need to assume which reward an RNA molecule optimizes when folding. Our implementation allows for the effective use of parallel computational resources as well as control over the size of the rollout tree as the algorithm progresses. The problem of RNA secondary structure prediction is of primary importance within the biology field because the structure of the molecule is strongly related to its functionality. Whereas the contribution of the paper is in the algorithm, the importance of the application makes ExpertRNA a showcase of the relevance of computationally efficient algorithms in supporting scientific discovery.
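
To make the idea of a multibranch, multiexpert rollout more concrete, the following is a minimal toy sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the names (`rollout`, `actions`, `extend`, `complete`), the two greedy base heuristics (`farthest`, `nearest`), the two hand-written experts (`expert_energy`, `expert_stacking`), and the branch-selection rule. The abstract does not say which folding tools or rewards ExpertRNA plugs in, so simple stand-ins are used here. The sketch builds a nonpseudoknotted structure one nucleotide at a time, completes every partial candidate with each base heuristic, and lets each expert keep its own favorite branches.

```python
from typing import Callable, Dict, List

Pairs = Dict[int, int]  # paired position -> partner, stored in both directions
# Illustrative pair weights (assumption): a crude proxy, not a real energy model.
WEIGHT = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 1}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin

def actions(seq: str, pairs: Pairs, k: int) -> List[int]:
    """Legal choices for position k: -1 (unpaired) or a partner index j > k."""
    if k in pairs:  # k already closes an earlier pair, so the choice is forced
        return [pairs[k]]
    # The innermost pair (a, b) enclosing k restricts partners to j < b (no crossing).
    limit = min((b for a, b in pairs.items() if a < k < b), default=len(seq))
    return [-1] + [j for j in range(k + MIN_LOOP + 1, limit)
                   if j not in pairs and frozenset((seq[k], seq[j])) in WEIGHT]

def extend(pairs: Pairs, k: int, j: int) -> Pairs:
    """Return a copy of the partial structure with the choice for k applied."""
    if j < 0 or k in pairs:
        return pairs
    new = dict(pairs)
    new[k], new[j] = j, k
    return new

def complete(seq: str, pairs: Pairs, k: int, pick: Callable[[List[int]], int]) -> Pairs:
    """Base heuristic: finish the structure from position k with the greedy rule `pick`."""
    for pos in range(k, len(seq)):
        pairs = extend(pairs, pos, pick(actions(seq, pairs, pos)))
    return pairs

def farthest(opts: List[int]) -> int:   # toy heuristic 1: pair with the farthest partner
    return max(opts)

def nearest(opts: List[int]) -> int:    # toy heuristic 2: pair with the nearest partner
    return opts[1] if len(opts) > 1 else opts[0]

def expert_energy(seq: str, pairs: Pairs) -> float:
    """Toy expert 1: weighted count of base pairs."""
    return sum(WEIGHT[frozenset((seq[i], seq[j]))] for i, j in pairs.items() if i < j)

def expert_stacking(seq: str, pairs: Pairs) -> float:
    """Toy expert 2: number of stacked pairs, i.e. (i, j) together with (i+1, j-1)."""
    return sum(1 for i, j in pairs.items() if i < j and pairs.get(i + 1) == j - 1)

HEURISTICS = [farthest, nearest]
EXPERTS = [expert_energy, expert_stacking]

def to_dot_bracket(n: int, pairs: Pairs) -> str:
    return "".join("." if i not in pairs else "(" if pairs[i] > i else ")"
                   for i in range(n))

def rollout(seq: str, max_branches: int = 4) -> List[str]:
    """Grow structures left to right, keeping each expert's best-scoring branches."""
    branches: List[Pairs] = [{}]
    per_expert = max(1, max_branches // len(EXPERTS))
    for k in range(len(seq)):
        candidates = []
        for pairs in branches:
            for j in actions(seq, pairs, k):
                nxt = extend(pairs, k, j)
                # Score per expert: best completion produced by any base heuristic.
                scores = [max(e(seq, complete(seq, nxt, k + 1, h)) for h in HEURISTICS)
                          for e in EXPERTS]
                candidates.append((scores, nxt))
        keep: List[Pairs] = []
        for i in range(len(EXPERTS)):  # every expert nominates its own top branches
            ranked = sorted(candidates, key=lambda c, i=i: c[0][i], reverse=True)
            for _, p in ranked[:per_expert]:
                if p not in keep:
                    keep.append(p)
        branches = keep
    return [to_dot_bracket(len(seq), p) for p in branches]

if __name__ == "__main__":
    for structure in rollout("GGGAAAACCC"):
        print(structure)
```

In this toy setting the experts never have to agree on a single reward: a branch survives if any one expert ranks it highly, and `max_branches` bounds how wide the rollout tree can grow. Because the evaluations of different candidates are independent of one another, the inner loop is also the natural place to introduce parallelism.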

Suggested Citation

  • Menghan Liu & Erik Poppleton & Giulia Pedrielli & Petr Šulc & Dimitri P. Bertsekas, 2022. "ExpertRNA: A New Framework for RNA Secondary Structure Prediction," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2464-2484, September.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:5:p:2464-2484
    DOI: 10.1287/ijoc.2022.1188

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2022.1188
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peicong Lin & Yumeng Yan & Huanyu Tao & Sheng-You Huang, 2023. "Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    2. Wenkai Wang & Chenjie Feng & Renmin Han & Ziyi Wang & Lisha Ye & Zongyang Du & Hong Wei & Fa Zhang & Zhenling Peng & Jianyi Yang, 2023. "trRosettaRNA: automated prediction of RNA 3D structure with transformer network," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Jiaxing Yang, 2024. "Predicting Distance matrix with large language models," Papers 2409.16333, arXiv.org.
    4. Mark W. Lewis & Amit Verma & Todd T. Eckdahl, 2021. "Qfold: a new modeling paradigm for the RNA folding problem," Journal of Heuristics, Springer, vol. 27(4), pages 695-717, August.
    5. Yang Li & Chengxin Zhang & Chenjie Feng & Robin Pearce & P. Lydia Freddolino & Yang Zhang, 2023. "Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
