Author
Listed:
- Georgios A. Pavlopoulos
(Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory; National and Kapodistrian University of Athens)
- Fotis A. Baltoumas
(Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming)
- Sirui Liu
(Harvard University)
- Oguz Selvitopi
(Lawrence Berkeley National Laboratory)
- Antonio Pedro Camargo
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- Stephen Nayfach
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- Ariful Azad
(Indiana University Bloomington)
- Simon Roux
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- Lee Call
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- Natalia N. Ivanova
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- I. Min Chen
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- David Paez-Espino
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- Evangelos Karatzas
(Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming)
- Ioannis Iliopoulos
(University of Crete)
- Konstantinos Konstantinidis
(Georgia Institute of Technology)
- James M. Tiedje
(Michigan State University)
- Jennifer Pett-Ridge
(Lawrence Livermore National Laboratory)
- David Baker
(University of Washington)
- Axel Visel
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
- Christos A. Ouzounis
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory; Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas; Aristotle University of Thessalonica)
- Sergey Ovchinnikov
(Harvard University)
- Aydin Buluç
(Lawrence Berkeley National Laboratory; University of California)
- Nikos C. Kyrpides
(DOE Joint Genome Institute, Lawrence Berkeley National Laboratory)
Abstract
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
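The grouping step described in the abstract can be illustrated with a toy sketch. The abstract does not name the clustering algorithm, so the code below uses plain connected components over a similarity graph as a stand-in, not the authors' method. The function name `cluster_by_similarity`, the edge-list input, and the sequence IDs are all hypothetical; only the idea of keeping clusters above a minimum size (more than 100 members in the paper) comes from the text.

```python
from collections import defaultdict


def cluster_by_similarity(edges, min_members):
    """Group sequence IDs into connected components of a similarity graph.

    edges: iterable of (id_a, id_b) pairs already judged similar
           (hypothetical input; how similarity is computed is out of scope).
    min_members: keep only clusters with MORE than this many members.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the two endpoints of every similarity edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their final root.
    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)

    # Apply the size filter (assumed analogue of "more than 100 members").
    return [sorted(m) for m in clusters.values() if len(m) > min_members]


# Toy data: two components, of size 3 and size 2.
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
families = cluster_by_similarity(toy_edges, min_members=2)
print(families)
```

With `min_members=2`, only the three-member component survives the filter. At the scale reported in the abstract (1.17 billion sequences), a single-machine union-find like this would not be practical, which is presumably why the authors describe a massively parallel approach.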
Suggested Citation
Georgios A. Pavlopoulos & Fotis A. Baltoumas & Sirui Liu & Oguz Selvitopi & Antonio Pedro Camargo & Stephen Nayfach & Ariful Azad & Simon Roux & Lee Call & Natalia N. Ivanova & I. Min Chen & David Pae, 2023.
"Unraveling the functional dark matter through global metagenomics,"
Nature, Nature, vol. 622(7983), pages 594-602, October.
Handle:
RePEc:nat:nature:v:622:y:2023:i:7983:d:10.1038_s41586-023-06583-7
DOI: 10.1038/s41586-023-06583-7
Citations
Cited by:
- Mindaugas Margelevičius, 2024.
"GTalign: spatial index-driven protein structure alignment, superposition, and search,"
Nature Communications, Nature, vol. 15(1), pages 1-14, December.
- Devlina Chakravarty & Joseph W. Schafer & Ethan A. Chen & Joseph F. Thole & Leslie A. Ronish & Myeongsang Lee & Lauren L. Porter, 2024.
"AlphaFold predictions of fold-switched conformations are driven by structure memorization,"
Nature Communications, Nature, vol. 15(1), pages 1-13, December.
- Georges P. Schmartz & Jacqueline Rehner & Madline P. Gund & Verena Keller & Leidy-Alejandra G. Molano & Stefan Rupf & Matthias Hannig & Tim Berger & Elias Flockerzi & Berthold Seitz & Sara Fleser & Sa, 2024.
"Decoding the diagnostic and therapeutic potential of microbiota using pan-body pan-disease microbiomics,"
Nature Communications, Nature, vol. 15(1), pages 1-13, December.