Printed from https://ideas.repec.org/a/nat/nature/v622y2023i7983d10.1038_s41586-023-06622-3.html

Uncovering new families and folds in the natural protein universe

Authors

Listed:
  • Janani Durairaj

    (Biozentrum, University of Basel
    University of Basel)

  • Andrew M. Waterhouse

    (Biozentrum, University of Basel
    University of Basel)

  • Toomas Mets

    (University of Tartu
    Lund University)

  • Tetiana Brodiazhenko

    (University of Tartu)

  • Minhal Abdullah

    (University of Tartu
    Lund University)

  • Gabriel Studer

    (Biozentrum, University of Basel
    University of Basel)

  • Gerardo Tauriello

    (Biozentrum, University of Basel
    University of Basel)

  • Mehmet Akdel

    (VantAI)

  • Antonina Andreeva

    (European Bioinformatics Institute (EMBL-EBI))

  • Alex Bateman

    (European Bioinformatics Institute (EMBL-EBI))

  • Tanel Tenson

    (University of Tartu)

  • Vasili Hauryliuk

    (University of Tartu
    Lund University
    Science for Life Laboratory
    Lund University)

  • Torsten Schwede

    (Biozentrum, University of Basel
    University of Basel)

  • Joana Pereira

    (Biozentrum, University of Basel
    University of Basel)

Abstract

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database¹. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to Pfam database² and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.

Suggested Citation

  • Janani Durairaj & Andrew M. Waterhouse & Toomas Mets & Tetiana Brodiazhenko & Minhal Abdullah & Gabriel Studer & Gerardo Tauriello & Mehmet Akdel & Antonina Andreeva & Alex Bateman & Tanel Tenson & Va, 2023. "Uncovering new families and folds in the natural protein universe," Nature, Nature, vol. 622(7983), pages 646-653, October.
  • Handle: RePEc:nat:nature:v:622:y:2023:i:7983:d:10.1038_s41586-023-06622-3
    DOI: 10.1038/s41586-023-06622-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-023-06622-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Citations


    Cited by:

    1. Mindaugas Margelevičius, 2024. "GTalign: spatial index-driven protein structure alignment, superposition, and search," Nature Communications, Nature, vol. 15(1), pages 1-14, December.

