IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1008865.html

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

Author

Listed:
  • Yang Li
  • Chengxin Zhang
  • Eric W Bell
  • Wei Zheng
  • Xiaogen Zhou
  • Dong-Jun Yu
  • Yang Zhang

Abstract

The topology of protein folds can be specified by inter-residue contact-maps, and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural networks. Compared to previous approaches, the major advantage of TripletRes is its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from whole-genome and metagenome databases, thereby minimizing information loss during contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from the CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that simply re-training the TripletRes model with more proteins can lead to further improvement, with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel and efficient approach to extending the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map prediction starting from primary sequences, which is critical for constructing 3D structures of proteins that lack homologous templates in the PDB library.

Author summary: Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress in ab initio structure prediction, powered mainly by improvements in contact-map prediction, since predicted contacts can be used as constraints to guide ab initio folding simulations. In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate that the proposed method performs competitively with other top approaches in predicting the medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in its unique protocol for fusing multiple evolutionary feature matrices, which are extracted directly from whole-genome and metagenome databases and therefore minimize information loss during contact model training.
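
The top-L and top-L/5 long-range precision figures quoted above can be illustrated with a short sketch. It is not taken from the TripletRes code, and it assumes conventions commonly used in CASP contact assessment that the abstract itself does not spell out: two residues count as being in contact when their distance is below 8 Å, and a pair counts as long-range when the residues are at least 24 positions apart in the sequence. The function names `contact_map` and `top_k_precision` are hypothetical.

```python
from typing import List, Sequence


def contact_map(dist: Sequence[Sequence[float]], cutoff: float = 8.0) -> List[List[bool]]:
    """Turn a residue-residue distance matrix (angstroms) into a binary contact map.

    Assumption: a pair is a contact when its distance is below `cutoff`.
    """
    return [[d < cutoff for d in row] for row in dist]


def top_k_precision(
    prob: Sequence[Sequence[float]],
    native: Sequence[Sequence[bool]],
    min_sep: int = 24,
    divisor: int = 1,
) -> float:
    """Precision of the top L/divisor predicted contacts with |i - j| >= min_sep.

    prob    -- L x L matrix of predicted contact probabilities
    native  -- L x L matrix of true contacts
    min_sep -- minimum sequence separation (assumed 24 for long-range)
    divisor -- 1 for top-L, 5 for top-L/5, and so on
    """
    length = len(prob)
    k = max(1, length // divisor)
    # Collect every pair in the upper triangle that is far enough apart in sequence.
    pairs = [
        (prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    if not pairs:
        return 0.0
    # Rank by predicted probability, highest first, and keep the top k.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:k]
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)
```

With `divisor=5` the function scores the top-L/5 predictions, the setting reported for the CASP13 FM targets; with `divisor=1` it scores the top-L predictions used for the CASP 11&12 and CAMEO comparison.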

Suggested Citation

  • Yang Li & Chengxin Zhang & Eric W Bell & Wei Zheng & Xiaogen Zhou & Dong-Jun Yu & Yang Zhang, 2021. "Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-19, March.
  • Handle: RePEc:plo:pcbi00:1008865
    DOI: 10.1371/journal.pcbi.1008865

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008865
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008865&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008865?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Sheng Wang & Siqi Sun & Zhen Li & Renyu Zhang & Jinbo Xu, 2017. "Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model," PLOS Computational Biology, Public Library of Science, vol. 13(1), pages 1-34, January.
    2. Joe G. Greener & Shaun M. Kandathil & David T. Jones, 2019. "Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints," Nature Communications, Nature, vol. 10(1), pages 1-13, December.
    3. Sean R Eddy, 2011. "Accelerated Profile HMM Searches," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-16, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rahmatullah Roche & Sutanu Bhattacharya & Debswapna Bhattacharya, 2021. "Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-31, February.
    2. Rui Fa & Domenico Cozzetto & Cen Wan & David T Jones, 2018. "Predicting human protein function with multi-task deep neural networks," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-16, June.
    3. Damiano Piovesan & Andras Hatos & Giovanni Minervini & Federica Quaglia & Alexander Miguel Monzon & Silvio C E Tosatto, 2020. "Assessing predictors for new post translational modification sites: A case study on hydroxylation," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-15, June.
    4. Peicong Lin & Yumeng Yan & Huanyu Tao & Sheng-You Huang, 2023. "Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    5. Balázs Szalkai & Ildikó Scheer & Kinga Nagy & Beáta G Vértessy & Vince Grolmusz, 2014. "The Metagenomic Telescope," PLOS ONE, Public Library of Science, vol. 9(7), pages 1-9, July.
    6. Nicolae Sapoval & Amirali Aghazadeh & Michael G. Nute & Dinler A. Antunes & Advait Balaji & Richard Baraniuk & C. J. Barberan & Ruth Dannenfelser & Chen Dun & Mohammadamin Edrisi & R. A. Leo Elworth &, 2022. "Current progress and open challenges for applying deep learning across the biosciences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    7. Ngaam J Cheung & Wookyung Yu, 2018. "De novo protein structure prediction using ultra-fast molecular dynamics simulation," PLOS ONE, Public Library of Science, vol. 13(11), pages 1-17, November.
    8. Bilig Sod & Lei Xu & Yajiao Liu & Fei He & Yanchao Xu & Mingna Li & Tianhui Yang & Ting Gao & Junmei Kang & Qingchuan Yang & Ruicai Long, 2023. "Genome-Wide Identification and Expression Analysis of the CesA/Csl Gene Superfamily in Alfalfa (Medicago sativa L.)," Agriculture, MDPI, vol. 13(9), pages 1-14, August.
    9. Alejandro Ochoa & John D Storey & Manuel Llinás & Mona Singh, 2015. "Beyond the E-Value: Stratified Statistics for Protein Domain Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-21, November.
    10. Marco Orlando & Patrick C F Buchholz & Marina Lotti & Jürgen Pleiss, 2021. "The GH19 Engineering Database: Sequence diversity, substrate scope, and evolution in glycoside hydrolase family 19," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-30, October.
    11. Ezequiel A Galpern & María I Freiberger & Diego U Ferreiro, 2020. "Large Ankyrin repeat proteins are formed with similar and energetically favorable units," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-16, June.
    12. Shuangxi Ji & Tuğçe Oruç & Liam Mead & Muhammad Fayyaz Rehman & Christopher Morton Thomas & Sam Butterworth & Peter James Winn, 2019. "DeepCDpred: Inter-residue distance and contact prediction for improved prediction of protein structure," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-15, January.
    13. Juan A Morales-Cordovilla & Victoria Sanchez & Martin Ratajczak, 2018. "Protein alignment based on higher order conditional random fields for template-based modeling," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-14, June.
    14. Gerry Q Tonkin-Hill & Leily Trianty & Rintis Noviyanti & Hanh H T Nguyen & Boni F Sebayang & Daniel A Lampah & Jutta Marfurt & Simon A Cobbold & Janavi S Rambhatla & Malcolm J McConville & Stephen J R, 2018. "The Plasmodium falciparum transcriptome in severe malaria reveals altered expression of genes involved in important processes including surface antigen–encoding var genes," PLOS Biology, Public Library of Science, vol. 16(3), pages 1-40, March.
    15. Atul Kumar Upadhyay & Ramanathan Sowdhamini, 2016. "Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-20, July.
    16. Jianzhu Ma & Sheng Wang & Zhiyong Wang & Jinbo Xu, 2014. "MRFalign: Protein Homology Detection through Alignment of Markov Random Fields," PLOS Computational Biology, Public Library of Science, vol. 10(3), pages 1-12, March.
    17. Snehal Dilip Karpe & Vikas Tiwari & Sowdhamini Ramanathan, 2021. "InsectOR—Webserver for sensitive identification of insect olfactory receptor genes from non-model genomes," PLOS ONE, Public Library of Science, vol. 16(1), pages 1-15, January.
    18. Shivangi & Laxman S Meena & Md Amjad Beg, 2018. "Insights of Rv2921c (Ftsy) Gene of Mycobacterium tuberculosis H37Rv To Prove Its Significance by Computational Approach," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 12(2), pages 9147-9157, December.
    19. Amit A Upadhyay & Aaron D Fleetwood & Ogun Adebali & Robert D Finn & Igor B Zhulin, 2016. "Cache Domains That are Homologous to, but Different from PAS Domains Comprise the Largest Superfamily of Extracellular Sensors in Prokaryotes," PLOS Computational Biology, Public Library of Science, vol. 12(4), pages 1-21, April.
    20. Samantha Petti & Sean R Eddy, 2022. "Constructing benchmark test sets for biological sequence analysis using independent set algorithms," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-14, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.