Printed from https://ideas.repec.org/a/nat/nature/v622y2023i7983d10.1038_s41586-023-06583-7.html

Unraveling the functional dark matter through global metagenomics

Author

Listed:
  • Georgios A. Pavlopoulos

    (Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming
    DOE Joint Genome Institute, Lawrence Berkeley National Laboratory
    National and Kapodistrian University of Athens)

  • Fotis A. Baltoumas

    (Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming)

  • Sirui Liu

    (Harvard University)

  • Oguz Selvitopi

    (Lawrence Berkeley National Laboratory)

  • Antonio Pedro Camargo

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • Stephen Nayfach

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • Ariful Azad

    (Indiana University Bloomington)

  • Simon Roux

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • Lee Call

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • Natalia N. Ivanova

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • I. Min Chen

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • David Paez-Espino

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • Evangelos Karatzas

    (Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming)

  • Ioannis Iliopoulos

    (University of Crete)

  • Konstantinos Konstantinidis

    (Georgia Institute of Technology)

  • James M. Tiedje

    (Michigan State University)

  • Jennifer Pett-Ridge

    (Lawrence Livermore National Laboratory)

  • David Baker

    (University of Washington)

  • Axel Visel

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

  • Christos A. Ouzounis

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory
    Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas
    Aristotle University of Thessalonica)

  • Sergey Ovchinnikov

    (Harvard University)

  • Aydin Buluç

    (Lawrence Berkeley National Laboratory
    University of California)

  • Nikos C. Kyrpides

    (DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)

Abstract

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.

Suggested Citation

  • Georgios A. Pavlopoulos & Fotis A. Baltoumas & Sirui Liu & Oguz Selvitopi & Antonio Pedro Camargo & Stephen Nayfach & Ariful Azad & Simon Roux & Lee Call & Natalia N. Ivanova & I. Min Chen & David Pae, 2023. "Unraveling the functional dark matter through global metagenomics," Nature, Nature, vol. 622(7983), pages 594-602, October.
  • Handle: RePEc:nat:nature:v:622:y:2023:i:7983:d:10.1038_s41586-023-06583-7
    DOI: 10.1038/s41586-023-06583-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-023-06583-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Citations

    Cited by:

    1. Mindaugas Margelevičius, 2024. "GTalign: spatial index-driven protein structure alignment, superposition, and search," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    2. Devlina Chakravarty & Joseph W. Schafer & Ethan A. Chen & Joseph F. Thole & Leslie A. Ronish & Myeongsang Lee & Lauren L. Porter, 2024. "AlphaFold predictions of fold-switched conformations are driven by structure memorization," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
