
A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy

Authors
  • Sheng-Hung Juan
  • Teng-Ruei Chen
  • Wei-Cheng Lo

Abstract

Protein secondary structure prediction is a classic topic in computational structural biology with a variety of applications. Over the past decade, state-of-the-art algorithms have achieved prediction accuracies above 80%; meanwhile, the time cost of prediction has increased rapidly because of the exponential growth of the fundamental protein sequence data. Based on literature studies and preliminary observations on how the size and homology of the fundamental protein dataset relate to the speed and accuracy of predictions, we raised two hypotheses that might help identify the main factors influencing the efficiency of secondary structure prediction. Experiments in which the size and homology of the fundamental protein dataset were reduced supported those hypotheses. They revealed that shrinking the dataset could substantially cut the time cost of prediction at the price of a slight decrease in accuracy, whereas homology reduction of the dataset could, on the contrary, increase accuracy. Moreover, Shannon information entropy could be applied to explain how accuracy was influenced by the size and homology of the dataset. Based on these findings, we proposed that a proper combination of size and homology reductions of the protein dataset could speed up secondary structure prediction while preserving the high accuracy of state-of-the-art algorithms. When the proposed strategy was tested with the 2018 fundamental protein dataset provided by the Universal Protein Resource, prediction speed increased more than 20-fold while all accuracy measures remained equivalently high. These findings should help improve the efficiency of research and applications that depend on protein secondary structure prediction. To make future implementations of the proposed strategy easy, we have established a database of size- and homology-reduced protein datasets at http://10.life.nctu.edu.tw/UniRefNR.
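
The abstract does not say how Shannon information entropy was computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes entropy is measured per column of a set of equal-length aligned sequences, using H = -sum(p * log2(p)) over the observed residue frequencies. The function names and the toy sequences are hypothetical illustrations and do not come from the paper.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def column_entropies(aligned_sequences):
    """Return the entropy of each column of equal-length aligned sequences."""
    return [shannon_entropy(column) for column in zip(*aligned_sequences)]


# Toy data (hypothetical, not from the paper): four aligned fragments each.
redundant = ["ACDE", "ACDE", "ACDE", "ACDF"]  # near-identical homologs
diverse = ["ACDE", "GCKF", "AHDL", "MCDE"]    # more varied sequences

print("redundant:", column_entropies(redundant))
print("diverse:  ", column_entropies(diverse))
```

In this toy case the near-identical set yields columns with little or no entropy, while the more varied set yields higher per-column entropy from the same number of sequences. That is one plausible reading of how a smaller, less redundant dataset might retain most of the information a predictor uses.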

Suggested Citation

  • Sheng-Hung Juan & Teng-Ruei Chen & Wei-Cheng Lo, 2020. "A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-26, June.
  • Handle: RePEc:plo:pone00:0235153
    DOI: 10.1371/journal.pone.0235153

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0235153
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0235153&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sonu Kumar & Boris I Ratnikov & Marat D Kazanov & Jeffrey W Smith & Piotr Cieplak, 2015. "CleavPredict: A Platform for Reasoning about Matrix Metalloproteinases Proteolytic Events," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-19, May.
    2. Huilin Wang & Mingjun Wang & Hao Tan & Yuan Li & Ziding Zhang & Jiangning Song, 2014. "PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-17, August.
    3. Julian E Fuchs & Susanne von Grafenstein & Roland G Huber & Christian Kramer & Klaus R Liedl, 2013. "Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-15, November.
    4. Jianzhao Gao & Wei Cui & Yajun Sheng & Jishou Ruan & Lukasz Kurgan, 2016. "PSIONplus: Accurate Sequence-Based Predictor of Ion Channels and Their Types," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-18, April.
    5. Fahad M. Aldakheel, 2021. "Allergic Diseases: A Comprehensive Review on Risk Factors, Immunological Mechanisms, Link with COVID-19, Potential Treatments, and Role of Allergen Bioinformatics," IJERPH, MDPI, vol. 18(22), pages 1-29, November.
