Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-33397-4.html

Deciphering microbial gene function using natural language processing

Author

Listed:
  • Danielle Miller

    (Tel-Aviv University)

  • Adi Stern

    (Tel-Aviv University)

  • David Burstein

    (Tel-Aviv University)

Abstract

Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model “gene semantics” based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find that out of 1369 genes associated with recently discovered defense systems, 98% are inferred correctly. We then systematically evaluate the “discovery potential” of different functional categories, pinpointing those with the most genes yet to be characterized. Finally, we demonstrate our method’s ability to discover systems associated with microbial interaction and defense. Our results highlight that combining microbial genomics and language models is a promising avenue for revealing gene functions in microbes.
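The idea of inferring a gene's function from its genomic neighbors can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes deep-learning language models trained on more than 360 million genes, whereas the example below swaps in simple co-occurrence counts and cosine similarity. The contigs, gene-family names, labels, window size, and function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus: each contig is an ordered list of gene-family tokens.
contigs = [
    ["cas1", "cas2", "cas9", "unkA"],
    ["cas1", "cas2", "unkA", "cas9"],
    ["rpoB", "rpoC", "rpsL", "unkB"],
    ["rpoB", "rpoC", "unkB", "rpsL"],
]

# Hypothetical functional categories for the characterized gene families.
labels = {
    "cas1": "defense",
    "cas2": "defense",
    "cas9": "defense",
    "rpoB": "information processing",
    "rpoC": "information processing",
    "rpsL": "information processing",
}


def context_vectors(contigs, window=2):
    """Count, for every gene family, which families occur within `window` positions."""
    vectors = defaultdict(Counter)
    for genes in contigs:
        for i, gene in enumerate(genes):
            lo, hi = max(0, i - window), min(len(genes), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[gene][genes[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_category(gene, vectors, labels):
    """Return the category whose labeled genes have the most similar contexts."""
    scores = defaultdict(float)
    for other, category in labels.items():
        if other != gene and other in vectors:
            scores[category] += cosine(vectors[gene], vectors[other])
    return max(scores, key=scores.get) if scores else None


vectors = context_vectors(contigs)
print(predict_category("unkA", vectors, labels))  # defense
print(predict_category("unkB", vectors, labels))  # information processing
```

In this sketch an uncharacterized family inherits the category of the labeled families that share its neighborhood, which is the intuition behind treating genomic context as a "sentence" and genes as "words".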

Suggested Citation

  • Danielle Miller & Adi Stern & David Burstein, 2022. "Deciphering microbial gene function using natural language processing," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-33397-4
    DOI: 10.1038/s41467-022-33397-4

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-33397-4
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-022-33397-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Florian Tesson & Alexandre Hervé & Ernest Mordret & Marie Touchon & Camille d’Humières & Jean Cury & Aude Bernheim, 2022. "Systematic and quantitative view of the antiviral arsenal of prokaryotes," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    2. David Burstein & Lucas B. Harrington & Steven C. Strutt & Alexander J. Probst & Karthik Anantharaman & Brian C. Thomas & Jennifer A. Doudna & Jillian F. Banfield, 2017. "New CRISPR–Cas systems from uncultivated microbes," Nature, Nature, vol. 542(7640), pages 237-241, February.
    3. Andrew C. Pawlowski & Wenliang Wang & Kalinka Koteva & Hazel A. Barton & Andrew G. McArthur & Gerard D. Wright, 2016. "A diverse intrinsic antibiotic resistome from a cave bacterium," Nature Communications, Nature, vol. 7(1), pages 1-10, December.
    4. Chaya M. Fridman & Kinga Keppel & Motti Gerlic & Eran Bosis & Dor Salomon, 2020. "A comparative genomics methodology reveals a widespread family of membrane-disrupting T6SS effectors," Nature Communications, Nature, vol. 11(1), pages 1-14, December.

    Citations

    Cited by:

    1. Sheri Harari & Danielle Miller & Shay Fleishon & David Burstein & Adi Stern, 2024. "Using big sequencing data to identify chronic SARS-Coronavirus-2 infections," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    2. Yunha Hwang & Andre L. Cornman & Elizabeth H. Kellogg & Sergey Ovchinnikov & Peter R. Girguis, 2024. "Genomic language model predicts protein co-regulation and function," Nature Communications, Nature, vol. 15(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Feiyu Zhao & Tao Zhang & Xiaodi Sun & Xiyun Zhang & Letong Chen & Hejun Wang & Jinze Li & Peng Fan & Liangxue Lai & Tingting Sui & Zhanjun Li, 2023. "A strategy for Cas13 miniaturization based on the structure and AlphaFold," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    2. Dimitri Boeckaerts & Michiel Stock & Celia Ferriol-González & Jesús Oteo-Iglesias & Rafael Sanjuán & Pilar Domingo-Calap & Bernard Baets & Yves Briers, 2024. "Prediction of Klebsiella phage-host specificity at the strain level," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    3. Rubén Barcia-Cruz & David Goudenège & Jorge A. Moura de Sousa & Damien Piel & Martial Marbouty & Eduardo P. C. Rocha & Frédérique Roux, 2024. "Phage-inducible chromosomal minimalist islands (PICMIs), a novel family of small marine satellites of virulent phages," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    4. Boris Kantor & Bernadette O’Donovan & Joseph Rittiner & Dellila Hodgson & Nicholas Lindner & Sophia Guerrero & Wendy Dong & Austin Zhang & Ornit Chiba-Falek, 2024. "The therapeutic implications of all-in-one AAV-delivered epigenome-editing platform in neurodegenerative disorders," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    5. Pedro Leão & Mary E. Little & Kathryn E. Appler & Daphne Sahaya & Emily Aguilar-Pine & Kathryn Currie & Ilya J. Finkelstein & Valerie Anda & Brett J. Baker, 2024. "Asgard archaea defense systems and their roles in the origin of eukaryotic immunity," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    6. Lingchen He & Laura Miguel-Romero & Jonasz B. Patkowski & Nasser Alqurainy & Eduardo P. C. Rocha & Tiago R. D. Costa & Alfred Fillol-Salom & José R. Penadés, 2024. "Tail assembly interference is a common strategy in bacterial antiviral defenses," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    7. Camila G. C. Lemes & Isabella F. Cordeiro & Camila H. de Paula & Ana K. Silva & Flávio F. do Carmo & Luciana H. Y. Kamino & Flávia M. S. Carvalho & Juan C. Caicedo & Jesus A. Ferro & Leandro M. Moreir, 2021. "Potential Bioinoculants for Sustainable Agriculture Prospected from Ferruginous Caves of the Iron Quadrangle/Brazil," Sustainability, MDPI, vol. 13(16), pages 1-23, August.
    8. Tom J. Arrowsmith & Xibing Xu & Shangze Xu & Ben Usher & Peter Stokes & Megan Guest & Agnieszka K. Bronowska & Pierre Genevaux & Tim R. Blower, 2024. "Inducible auto-phosphorylation regulates a widespread family of nucleotidyltransferase toxins," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    9. Carolien Bastiaanssen & Pilar Bobadilla Ugarte & Kijun Kim & Giada Finocchio & Yanlei Feng & Todd A. Anzelon & Stephan Köstlbacher & Daniel Tamarit & Thijs J. G. Ettema & Martin Jinek & Ian J. MacRae, 2024. "RNA-guided RNA silencing by an Asgard archaeal Argonaute," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    10. Angelina Beavogui & Auriane Lacroix & Nicolas Wiart & Julie Poulain & Tom O. Delmont & Lucas Paoli & Patrick Wincker & Pedro H. Oliveira, 2024. "The defensome of complex bacterial communities," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    11. Daniela S. Aliaga Goltsman & Lisa M. Alexander & Jyun-Liang Lin & Rodrigo Fregoso Ocampo & Benjamin Freeman & Rebecca C. Lamothe & Andres Perez Rivas & Morayma M. Temoche-Diaz & Shailaja Chadha & Nata, 2022. "Compact Cas9d and HEARO enzymes for genome editing discovered from uncultivated microbes," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    12. Shao-Ming Gao & Han-Lan Fei & Qi Li & Li-Ying Lan & Li-Nan Huang & Peng-Fei Fan, 2024. "Eco-evolutionary dynamics of gut phageome in wild gibbons (Hoolock tianxing) with seasonal diet variations," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    13. Jan D. Brüwer & Chandni Sidhu & Yanlin Zhao & Andreas Eich & Leonard Rößler & Luis H. Orellana & Bernhard M. Fuchs, 2024. "Globally occurring pelagiphage infections create ribosome-deprived cells," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    14. Natasha K. Dudek & Jesus G. Galaz-Montoya & Handuo Shi & Megan Mayer & Cristina Danita & Arianna I. Celis & Tobias Viehboeck & Gong-Her Wu & Barry Behr & Silvia Bulgheresi & Kerwyn Casey Huang & Wah C, 2023. "Previously uncharacterized rectangular bacterial structures in the dolphin mouth," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    15. Motaher Hossain & Barbaros Aslan & Asma Hatoum-Aslan, 2024. "Tandem mobilization of anti-phage defenses alongside SCCmec elements in staphylococci," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    16. Changchang Xin & Jianhang Yin & Shaopeng Yuan & Liqiong Ou & Mengzhu Liu & Weiwei Zhang & Jiazhi Hu, 2022. "Comprehensive assessment of miniature CRISPR-Cas12f nucleases for gene disruption," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    17. Xiaoguang Pan & Kunli Qu & Hao Yuan & Xi Xiang & Christian Anthon & Liubov Pashkova & Xue Liang & Peng Han & Giulia I. Corsi & Fengping Xu & Ping Liu & Jiayan Zhong & Yan Zhou & Tao Ma & Hui Jiang & J, 2022. "Massively targeted evaluation of therapeutic CRISPR off-targets in cells," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    18. Matthieu Haudiquet & Julie Bris & Amandine Nucci & Rémy A. Bonnin & Pilar Domingo-Calap & Eduardo P. C. Rocha & Olaya Rendueles, 2024. "Capsules and their traits shape phage susceptibility and plasmid conjugation efficiency," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    19. Bogna J. Smug & Krzysztof Szczepaniak & Eduardo P. C. Rocha & Stanislaw Dunin-Horkawicz & Rafał J. Mostowy, 2023. "Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    20. Yong Sheng & Hengyu Wang & Yixin Ou & Yingying Wu & Wei Ding & Meifeng Tao & Shuangjun Lin & Zixin Deng & Linquan Bai & Qianjin Kang, 2023. "Insertion sequence transposition inactivates CRISPR-Cas immunity," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
