IDEAS home Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-33397-4.html

Deciphering microbial gene function using natural language processing

Author

Listed:
  • Danielle Miller

    (Tel-Aviv University)

  • Adi Stern

    (Tel-Aviv University)

  • David Burstein

    (Tel-Aviv University)

Abstract

Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model “gene semantics” based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find that out of 1369 genes associated with recently discovered defense systems, 98% are inferred correctly. We then systematically evaluate the “discovery potential” of different functional categories, pinpointing those with the most genes yet to be characterized. Finally, we demonstrate our method’s ability to discover systems associated with microbial interaction and defense. Our results highlight that combining microbial genomics and language models is a promising avenue for revealing gene functions in microbes.

Suggested Citation

  • Danielle Miller & Adi Stern & David Burstein, 2022. "Deciphering microbial gene function using natural language processing," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-33397-4
    DOI: 10.1038/s41467-022-33397-4

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-33397-4
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-022-33397-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. David Burstein & Lucas B. Harrington & Steven C. Strutt & Alexander J. Probst & Karthik Anantharaman & Brian C. Thomas & Jennifer A. Doudna & Jillian F. Banfield, 2017. "New CRISPR–Cas systems from uncultivated microbes," Nature, Nature, vol. 542(7640), pages 237-241, February.
    2. Chaya M. Fridman & Kinga Keppel & Motti Gerlic & Eran Bosis & Dor Salomon, 2020. "A comparative genomics methodology reveals a widespread family of membrane-disrupting T6SS effectors," Nature Communications, Nature, vol. 11(1), pages 1-14, December.
    3. Florian Tesson & Alexandre Hervé & Ernest Mordret & Marie Touchon & Camille d’Humières & Jean Cury & Aude Bernheim, 2022. "Systematic and quantitative view of the antiviral arsenal of prokaryotes," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    4. Andrew C. Pawlowski & Wenliang Wang & Kalinka Koteva & Hazel A. Barton & Andrew G. McArthur & Gerard D. Wright, 2016. "A diverse intrinsic antibiotic resistome from a cave bacterium," Nature Communications, Nature, vol. 7(1), pages 1-10, December.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Sheri Harari & Danielle Miller & Shay Fleishon & David Burstein & Adi Stern, 2024. "Using big sequencing data to identify chronic SARS-Coronavirus-2 infections," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    2. Yunha Hwang & Andre L. Cornman & Elizabeth H. Kellogg & Sergey Ovchinnikov & Peter R. Girguis, 2024. "Genomic language model predicts protein co-regulation and function," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    3. Alena Drobiazko & Myfanwy C. Adams & Mikhail Skutel & Kristina Potekhina & Oksana Kotovskaya & Anna Trofimova & Mikhail Matlashov & Daria Yatselenko & Karen L. Maxwell & Tim R. Blower & Konstantin Sev, 2025. "Molecular basis of foreign DNA recognition by BREX anti-phage immunity system," Nature Communications, Nature, vol. 16(1), pages 1-21, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dimitri Boeckaerts & Michiel Stock & Celia Ferriol-González & Jesús Oteo-Iglesias & Rafael Sanjuán & Pilar Domingo-Calap & Bernard Baets & Yves Briers, 2024. "Prediction of Klebsiella phage-host specificity at the strain level," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    2. Boris Kantor & Bernadette O’Donovan & Joseph Rittiner & Dellila Hodgson & Nicholas Lindner & Sophia Guerrero & Wendy Dong & Austin Zhang & Ornit Chiba-Falek, 2024. "The therapeutic implications of all-in-one AAV-delivered epigenome-editing platform in neurodegenerative disorders," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Pedro Leão & Mary E. Little & Kathryn E. Appler & Daphne Sahaya & Emily Aguilar-Pine & Kathryn Currie & Ilya J. Finkelstein & Valerie Anda & Brett J. Baker, 2024. "Asgard archaea defense systems and their roles in the origin of eukaryotic immunity," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    4. Lingchen He & Laura Miguel-Romero & Jonasz B. Patkowski & Nasser Alqurainy & Eduardo P. C. Rocha & Tiago R. D. Costa & Alfred Fillol-Salom & José R. Penadés, 2024. "Tail assembly interference is a common strategy in bacterial antiviral defenses," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    5. Camila G. C. Lemes & Isabella F. Cordeiro & Camila H. de Paula & Ana K. Silva & Flávio F. do Carmo & Luciana H. Y. Kamino & Flávia M. S. Carvalho & Juan C. Caicedo & Jesus A. Ferro & Leandro M. Moreir, 2021. "Potential Bioinoculants for Sustainable Agriculture Prospected from Ferruginous Caves of the Iron Quadrangle/Brazil," Sustainability, MDPI, vol. 13(16), pages 1-23, August.
    6. Tom J. Arrowsmith & Xibing Xu & Shangze Xu & Ben Usher & Peter Stokes & Megan Guest & Agnieszka K. Bronowska & Pierre Genevaux & Tim R. Blower, 2024. "Inducible auto-phosphorylation regulates a widespread family of nucleotidyltransferase toxins," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    7. Carolien Bastiaanssen & Pilar Bobadilla Ugarte & Kijun Kim & Giada Finocchio & Yanlei Feng & Todd A. Anzelon & Stephan Köstlbacher & Daniel Tamarit & Thijs J. G. Ettema & Martin Jinek & Ian J. MacRae, 2024. "RNA-guided RNA silencing by an Asgard archaeal Argonaute," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    8. Angelina Beavogui & Auriane Lacroix & Nicolas Wiart & Julie Poulain & Tom O. Delmont & Lucas Paoli & Patrick Wincker & Pedro H. Oliveira, 2024. "The defensome of complex bacterial communities," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    9. Jan D. Brüwer & Chandni Sidhu & Yanlin Zhao & Andreas Eich & Leonard Rößler & Luis H. Orellana & Bernhard M. Fuchs, 2024. "Globally occurring pelagiphage infections create ribosome-deprived cells," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    10. Ester M Eckert & Andrea Di Cesare & Diego Fontaneto & Thomas U Berendonk & Helmut Bürgmann & Eddie Cytryn & Despo Fatta-Kassinos & Andrea Franzetti & D G Joakim Larsson & Célia M Manaia & Amy Pruden &, 2020. "Every fifth published metagenome is not available to science," PLOS Biology, Public Library of Science, vol. 18(4), pages 1-7, April.
    11. Motaher Hossain & Barbaros Aslan & Asma Hatoum-Aslan, 2024. "Tandem mobilization of anti-phage defenses alongside SCCmec elements in staphylococci," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    12. Aa Haeruman Azam & Kohei Kondo & Kotaro Chihara & Tomohiro Nakamura & Shinjiro Ojima & Wenhan Nie & Azumi Tamura & Wakana Yamashita & Yo Sugawara & Motoyuki Sugai & Longzhu Cui & Yoshimasa Takahashi &, 2024. "Evasion of antiviral bacterial immunity by phage tRNAs," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    13. Bogna J. Smug & Krzysztof Szczepaniak & Eduardo P. C. Rocha & Stanislaw Dunin-Horkawicz & Rafał J. Mostowy, 2023. "Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    14. Natalia Quinones-Olvera & Siân V. Owen & Lucy M. McCully & Maximillian G. Marin & Eleanor A. Rand & Alice C. Fan & Oluremi J. Martins Dosumu & Kay Paul & Cleotilde E. Sanchez Castaño & Rachel Petherbr, 2024. "Diverse and abundant phages exploit conjugative plasmids," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    15. Zongzhi Wu & Tang Liu & Qian Chen & Tianyi Chen & Jinyun Hu & Liyu Sun & Bingxue Wang & Wenpeng Li & Jinren Ni, 2024. "Unveiling the unknown viral world in groundwater," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    16. Feiyu Zhao & Tao Zhang & Xiaodi Sun & Xiyun Zhang & Letong Chen & Hejun Wang & Jinze Li & Peng Fan & Liangxue Lai & Tingting Sui & Zhanjun Li, 2023. "A strategy for Cas13 miniaturization based on the structure and AlphaFold," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    17. Rubén Barcia-Cruz & David Goudenège & Jorge A. Moura de Sousa & Damien Piel & Martial Marbouty & Eduardo P. C. Rocha & Frédérique Roux, 2024. "Phage-inducible chromosomal minimalist islands (PICMIs), a novel family of small marine satellites of virulent phages," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    18. Alena Drobiazko & Myfanwy C. Adams & Mikhail Skutel & Kristina Potekhina & Oksana Kotovskaya & Anna Trofimova & Mikhail Matlashov & Daria Yatselenko & Karen L. Maxwell & Tim R. Blower & Konstantin Sev, 2025. "Molecular basis of foreign DNA recognition by BREX anti-phage immunity system," Nature Communications, Nature, vol. 16(1), pages 1-21, December.
    19. Wenhui Li & Xianyue Jiang & Wuke Wang & Liya Hou & Runze Cai & Yongqian Li & Qiuxi Gu & Qinchang Chen & Peixiang Ma & Jin Tang & Menghao Guo & Guohui Chuai & Xingxu Huang & Jun Zhang & Qi Liu, 2024. "Discovering CRISPR-Cas system with self-processing pre-crRNA capability by foundation models," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    20. Daniela S. Aliaga Goltsman & Lisa M. Alexander & Jyun-Liang Lin & Rodrigo Fregoso Ocampo & Benjamin Freeman & Rebecca C. Lamothe & Andres Perez Rivas & Morayma M. Temoche-Diaz & Shailaja Chadha & Nata, 2022. "Compact Cas9d and HEARO enzymes for genome editing discovered from uncultivated microbes," Nature Communications, Nature, vol. 13(1), pages 1-11, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.