
ECNet is an evolutionary context-integrated deep learning framework for protein engineering

Author

Listed:
  • Yunan Luo

    (University of Illinois at Urbana-Champaign)

  • Guangde Jiang

    (University of Illinois at Urbana-Champaign)

  • Tianhao Yu

    (University of Illinois at Urbana-Champaign)

  • Yang Liu

    (University of Illinois at Urbana-Champaign)

  • Lam Vo

    (University of Illinois at Urbana-Champaign)

  • Hantian Ding

    (University of Illinois at Urbana-Champaign)

  • Yufeng Su

    (University of Illinois at Urbana-Champaign)

  • Wesley Wei Qian

    (University of Illinois at Urbana-Champaign)

  • Huimin Zhao

    (University of Illinois at Urbana-Champaign)

  • Jian Peng

    (University of Illinois at Urbana-Champaign)

Abstract

Machine learning is increasingly used for protein engineering. However, the accuracy of existing machine learning algorithms is limited because the general sequence contexts they capture are not specific to the protein being engineered. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. The algorithm integrates a local evolutionary context, derived from homologous sequences, that explicitly models residue-residue epistasis for the protein of interest, with a global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and generalizes from low-order mutants to higher-order mutants. Using ~50 deep mutational scanning and random mutagenesis datasets, we show that ECNet predicts the sequence-function relationship more accurately than existing machine learning algorithms. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates.
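The idea of combining a protein-specific local context with a general global context can be illustrated with a toy sketch. This is not the ECNet implementation: the abstract does not give the architecture, and every name below (`local_context`, `global_context`, `combined_representation`, the `site` and `pair` score tables) is hypothetical. The sketch assumes the local context is available as per-site and pairwise (epistatic) scores, and it uses a trivial hand-made feature vector as a stand-in for a learned global embedding.

```python
from typing import Dict, List, Tuple

# Hypothetical score tables (assumptions, not taken from the paper):
#   site[(i, a)]       -> score for amino acid a at position i
#   pair[(i, a, j, b)] -> coupling score for a at i and b at j, with i < j
SiteScores = Dict[Tuple[int, str], float]
PairScores = Dict[Tuple[int, str, int, str], float]

HYDROPHOBIC = set("AVILMFWY")


def local_context(seq: str, site: SiteScores, pair: PairScores) -> List[List[float]]:
    """Per-residue local features: a site term and the sum of pairwise terms."""
    feats = []
    for i, a in enumerate(seq):
        h = site.get((i, a), 0.0)
        j_sum = 0.0
        for j, b in enumerate(seq):
            if j == i:
                continue
            # Keys are stored with the smaller position first.
            key = (i, a, j, b) if i < j else (j, b, i, a)
            j_sum += pair.get(key, 0.0)
        feats.append([h, j_sum])
    return feats


def global_context(seq: str) -> List[List[float]]:
    """Toy stand-in for a learned embedding: hydrophobicity flag and relative position."""
    n = len(seq)
    return [
        [1.0 if a in HYDROPHOBIC else 0.0, i / max(n - 1, 1)]
        for i, a in enumerate(seq)
    ]


def combined_representation(seq: str, site: SiteScores, pair: PairScores) -> List[List[float]]:
    """Concatenate local and global features for each residue."""
    loc = local_context(seq, site, pair)
    glob = global_context(seq)
    return [l + g for l, g in zip(loc, glob)]


# Usage with made-up scores for a three-residue sequence.
site = {(0, "M"): 0.5, (1, "K"): -0.2}
pair = {(0, "M", 2, "V"): 1.0}
rep = combined_representation("MKV", site, pair)
```

In this sketch each residue ends up with a four-element vector (two local features followed by two global ones); a downstream model would then map the sequence of such vectors to a fitness value.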

Suggested Citation

  • Yunan Luo & Guangde Jiang & Tianhao Yu & Yang Liu & Lam Vo & Hantian Ding & Yufeng Su & Wesley Wei Qian & Huimin Zhao & Jian Peng, 2021. "ECNet is an evolutionary context-integrated deep learning framework for protein engineering," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-25976-8
    DOI: 10.1038/s41467-021-25976-8

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-25976-8
    File Function: Abstract
    Download Restriction: no


    Citations


    Cited by:

    1. Ziyi Zhou & Liang Zhang & Yuanxi Yu & Banghao Wu & Mingchen Li & Liang Hong & Pan Tan, 2024. "Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    2. Kerr Ding & Michael Chin & Yunlong Zhao & Wei Huang & Binh Khanh Mai & Huanan Wang & Peng Liu & Yang Yang & Yunan Luo, 2024. "Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    3. Yinghui Chen & Yunxin Xu & Di Liu & Yaoguang Xing & Haipeng Gong, 2024. "An end-to-end framework for the prediction of protein structure and fitness from single sequence," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
