
Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites

Author

Listed:
  • Xiaorui Wang

    (Zhejiang University
    Macau University of Science and Technology)

  • Xiaodan Yin

    (Zhejiang University
    Macau University of Science and Technology)

  • Dejun Jiang

    (Zhejiang University)

  • Huifeng Zhao

    (Zhejiang University)

  • Zhenxing Wu

    (Zhejiang University)

  • Odin Zhang

    (Zhejiang University)

  • Jike Wang

    (Zhejiang University)

  • Yuquan Li

    (Lanzhou University)

  • Yafeng Deng

    (Ltd)

  • Huanxiang Liu

    (Macao Polytechnic University)

  • Pei Luo

    (Macau University of Science and Technology)

  • Yuqiang Han

    (Chinese University of Hong Kong)

  • Tingjun Hou

    (Zhejiang University)

  • Xiaojun Yao

    (Macao Polytechnic University)

  • Chang-Yu Hsieh

    (Zhejiang University)

Abstract

Annotating active sites in enzymes is crucial for advancing multiple fields, including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical application. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from a protein language model and a 3D structural encoder, and then aligns protein-level information with knowledge of enzymatic reactions using a multi-modal cross-attention framework. EasIFA outperforms BLASTp, running 10 times faster while improving recall, precision, F1 score, and MCC by 7.57%, 13.08%, 9.68%, and 0.1012, respectively. It also surpasses an empirical-rule-based algorithm and a state-of-the-art deep learning annotation method based on PSSM features, achieving a speed increase of 650 to 1400 times while enhancing annotation quality. This makes EasIFA a suitable replacement for conventional tools in both industrial and academic settings. EasIFA can also effectively transfer knowledge gained from coarsely annotated enzyme databases to smaller, high-precision datasets, highlighting its ability to model sparse and high-quality databases. Additionally, EasIFA shows potential as a catalytic site monitoring tool for designing enzymes with desired functions beyond their natural distribution.
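
The abstract compares annotation quality using recall, precision, F1 score, and MCC (Matthews correlation coefficient). As a minimal sketch of how these four metrics are computed from per-residue binary labels (1 = active-site residue, 0 = other), the following uses the standard textbook definitions. The function names and the toy labels are hypothetical, and the evaluation protocol actually used for EasIFA (for example, how scores are averaged across enzymes) is not stated in this record and is not reproduced here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def site_metrics(y_true, y_pred):
    """Return precision, recall, F1 and MCC (hypothetical helper)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC ranges from -1 to 1 and uses all four confusion-matrix cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Toy example: an 8-residue sequence with 3 true active-site residues.
y_true = [0, 1, 1, 0, 0, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]
metrics = site_metrics(y_true, y_pred)
print(metrics)
```

Unlike the other three metrics, MCC also counts true negatives. Because active-site residues are typically a small fraction of a sequence, this makes MCC a useful complement to precision and recall. MCC is a correlation on a scale of -1 to 1, not a percentage, which is consistent with the abstract reporting its improvement as a plain number (0.1012).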

Suggested Citation

  • Xiaorui Wang & Xiaodan Yin & Dejun Jiang & Huifeng Zhao & Zhenxing Wu & Odin Zhang & Jike Wang & Yuquan Li & Yafeng Deng & Huanxiang Liu & Pei Luo & Yuqiang Han & Tingjun Hou & Xiaojun Yao & Chang-Yu Hsieh, 2024. "Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-51511-6
    DOI: 10.1038/s41467-024-51511-6

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-51511-6
    File Function: Abstract
    Download Restriction: no



