
Accurate de novo peptide sequencing using fully convolutional neural networks

Author

Listed:
  • Kaiyuan Liu

    (Indiana University)

  • Yuzhen Ye

    (Indiana University)

  • Sujun Li

    (Indiana University; Dengding BioAI Co., Ltd.)

  • Haixu Tang

    (Indiana University)

Abstract

De novo peptide sequencing, which does not rely on a comprehensive target sequence database, provides a way to identify novel peptides from tandem mass spectra. However, current de novo sequencing algorithms suffer from low accuracy and coverage, which hinders their application in proteomics. In this paper, we present PepNet, a fully convolutional neural network for high-accuracy de novo peptide sequencing. PepNet takes an MS/MS spectrum (represented as a high-dimensional vector) as input and outputs the optimal peptide sequence along with its confidence score. The PepNet model is trained on a total of 3 million high-energy collisional dissociation MS/MS spectra from multiple human peptide spectral libraries. Evaluation results show that PepNet significantly outperforms the current best-performing de novo sequencing algorithms (e.g., PointNovo and DeepNovo) in both peptide-level and positional-level accuracy. PepNet can sequence a large fraction of spectra that were not identified by database search engines, and could therefore complement database search engines for peptide identification in proteomics. In addition, PepNet runs around 3x and 7x faster than PointNovo and DeepNovo on GPUs, respectively, making it more suitable for the analysis of large-scale proteomics data.
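
To make the abstract's terms concrete, the following is a minimal sketch of two ideas it mentions: turning an MS/MS peak list into a fixed-length vector, and scoring predictions at the peptide level and the positional level. It is not the PepNet implementation. The function names, the bin width, the m/z range, the max-intensity normalization, and the position-by-position matching rule are all assumptions chosen for illustration; the paper may define its input encoding and metrics differently.

```python
from typing import List, Sequence, Tuple


def spectrum_to_vector(
    peaks: Sequence[Tuple[float, float]],
    max_mz: float = 2000.0,   # assumed upper m/z limit
    bin_width: float = 0.1,   # assumed bin size in m/z units
) -> List[float]:
    """Bin (m/z, intensity) peaks into a fixed-length vector.

    Each bin keeps the most intense peak that falls into it, and the
    vector is scaled so the largest value is 1.0 (assumed normalization).
    """
    n_bins = int(round(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:  # peaks outside the range are ignored
            vec[idx] = max(vec[idx], intensity)
    top = max(vec)
    if top > 0:
        vec = [v / top for v in vec]
    return vec


def peptide_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of spectra whose predicted peptide matches exactly."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def positional_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference residues predicted correctly at the same position.

    Simplified assumption: residues are compared index by index.
    """
    correct = 0
    total = 0
    for p, r in zip(predicted, reference):
        correct += sum(a == b for a, b in zip(p, r))
        total += len(r)
    return correct / total


if __name__ == "__main__":
    toy_peaks = [(100.2, 50.0), (100.7, 100.0), (300.4, 25.0)]
    vec = spectrum_to_vector(toy_peaks, max_mz=500.0, bin_width=1.0)
    print(len(vec), vec[100], vec[300])

    pred = ["PEPTIDE", "ACDK"]
    ref = ["PEPTIDE", "ACEK"]
    print(peptide_accuracy(pred, ref), positional_accuracy(pred, ref))
```

In this toy setup a single wrong residue makes the whole peptide count as incorrect at the peptide level while most positions still count as correct, which is why the two metrics are reported separately.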

Suggested Citation

  • Kaiyuan Liu & Yuzhen Ye & Sujun Li & Haixu Tang, 2023. "Accurate de novo peptide sequencing using fully convolutional neural networks," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43010-x
    DOI: 10.1038/s41467-023-43010-x

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-43010-x
    File Function: Abstract
    Download Restriction: no

    Citations

    Cited by:

    1. Melih Yilmaz & William E. Fondrie & Wout Bittremieux & Carlo F. Melendez & Rowan Nelson & Varun Ananth & Sewoong Oh & William Stafford Noble, 2024. "Sequence-to-sequence translation from mass spectra to peptides with a transformer model," Nature Communications, Nature, vol. 15(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pankaj Sinha & Shalini Agnihotri, 2016. "Investigating Impact of Volatility Persistence and Information Inflow on Volatility of Stock Indices Using Bivarite GJR-GARCH," Global Business Review, International Management Institute, vol. 17(5), pages 1145-1161, October.
    2. Tazhikul Mashirova & Karlygash Tastanbekova & Murat Nurgabylov & Gulnar Lukhmanova & Kundyz Myrzabekkyzy, 2023. "Analysis of the Relationship between the Highest Price and the Trading Volume of the Energy Company Shares in Kazakhstan with Frequency Domain Causality Method," International Journal of Energy Economics and Policy, Econjournals, vol. 13(4), pages 22-27, July.
    3. Kausik Chaudhuri & Alok Kumar, 2015. "A Markov-Switching Model for Indian Stock Price and Volume," Journal of Emerging Market Finance, Institute for Financial Management and Research, vol. 14(3), pages 239-257, December.
    4. Hervé, Fabrice & Zouaoui, Mohamed & Belvaux, Bertrand, 2019. "Noise traders and smart money: Evidence from online searches," Economic Modelling, Elsevier, vol. 83(C), pages 141-149.
    5. Yao, Yi & Yang, Rong & Liu, Zhiyuan & Hasan, Iftekhar, 2013. "Government intervention and institutional trading strategy: Evidence from a transition country," Global Finance Journal, Elsevier, vol. 24(1), pages 44-68.
    6. Alizadeh, Amir H. & Tamvakis, Michael, 2016. "Market conditions, trader types and price–volume relation in energy futures markets," Energy Economics, Elsevier, vol. 56(C), pages 134-149.
    7. Wang, Danxia, 2024. "Beyond active share: Boosting fund performance through common holdings with same-benchmark mutual funds," International Review of Financial Analysis, Elsevier, vol. 92(C).
    8. Sun, Changyou, 2013. "Price variation and volume dynamics of securitized timberlands," Forest Policy and Economics, Elsevier, vol. 27(C), pages 44-53.
    9. Farag, Hisham & Cressy, Robert, 2011. "Do regulatory policies affect the flow of information in emerging markets?," Research in International Business and Finance, Elsevier, vol. 25(3), pages 238-254, September.
    10. Nikolaos Antonakakis & Ioannis Chatziantoniou & David Gabauer, 2021. "A regional decomposition of US housing prices and volume: market dynamics and Portfolio diversification," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 66(2), pages 279-307, April.
    11. Kyritsis Konstantinos & Sotiropoulos Ioannis & Gogos Christos & Kypriotelis Efstratios, 2007. "Note On The Effect Of The 11 Years Global Climate Cycle On The Prices Of The Capital Markets," Post-Print hal-01552349, HAL.
    12. Saswat Patra & Malay Bhattacharyya, 2021. "Does volume really matter? A risk management perspective using cross‐country evidence," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 26(1), pages 118-135, January.
    13. Marcus Alexander Ong, 2015. "An information theoretic analysis of stock returns, volatility and trading volumes," Applied Economics, Taylor & Francis Journals, vol. 47(36), pages 3891-3906, August.
    14. Saif Siddiqui & Preeti Roy, 2019. "Asymmetric relationship between implied volatility, index returns and trading volume: an application of quantile regression model," DECISION: Official Journal of the Indian Institute of Management Calcutta, Springer;Indian Institute of Management Calcutta, vol. 46(3), pages 239-252, September.
    15. Qiang Chen & Daolun Chen & YuTing Gong, 2012. "An empirical analysis of dynamic relationship between stock market and bond market based on information shocks," China Finance Review International, Emerald Group Publishing Limited, vol. 2(3), pages 265-285, June.
    16. Phan, Dinh Hoang Bach & Sharma, Susan Sunila & Narayan, Paresh Kumar, 2015. "Stock return forecasting: Some new evidence," International Review of Financial Analysis, Elsevier, vol. 40(C), pages 38-51.
    17. Pengfei Wang & Wei Zhang & Xiao Li & Dehua Shen, 2019. "Trading volume and return volatility of Bitcoin market: evidence for the sequential information arrival hypothesis," Journal of Economic Interaction and Coordination, Springer;Society for Economic Science with Heterogeneous Interacting Agents, vol. 14(2), pages 377-418, June.
    18. Todorova, Neda & Souček, Michael, 2014. "The impact of trading volume, number of trades and overnight returns on forecasting the daily realized range," Economic Modelling, Elsevier, vol. 36(C), pages 332-340.
    19. Zheng Fang & Hongqiang Qin & Jiawei Mao & Zhongyu Wang & Na Zhang & Yan Wang & Luyao Liu & Yongzhan Nie & Mingming Dong & Mingliang Ye, 2022. "Glyco-Decipher enables glycan database-independent peptide matching and in-depth characterization of site-specific N-glycosylation," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    20. Francesco Guidi, 2009. "Volatility and Long-Term Relations in Equity Markets: Empirical Evidence from Germany, Switzerland, and the UK," The IUP Journal of Financial Economics, IUP Publications, vol. 0(2), pages 7-39, June.
