Printed from https://ideas.repec.org/a/nat/natcom/v12y2021i1d10.1038_s41467-021-25975-9.html

Mapping the glycosyltransferase fold landscape using interpretable deep learning

Authors

Listed:
  • Rahil Taujale

    (Institute of Bioinformatics, University of Georgia;
    Complex Carbohydrate Research Center, University of Georgia)

  • Zhongliang Zhou

    (Department of Computer Science, University of Georgia)

  • Wayland Yeung

    (Institute of Bioinformatics, University of Georgia)

  • Kelley W. Moremen

    (Complex Carbohydrate Research Center, University of Georgia;
    Biochemistry and Molecular Biology, University of Georgia)

  • Sheng Li

    (Department of Computer Science, University of Georgia)

  • Natarajan Kannan

    (Institute of Bioinformatics, University of Georgia;
    Biochemistry and Molecular Biology, University of Georgia)

Abstract

Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.

Suggested Citation

  • Rahil Taujale & Zhongliang Zhou & Wayland Yeung & Kelley W. Moremen & Sheng Li & Natarajan Kannan, 2021. "Mapping the glycosyltransferase fold landscape using interpretable deep learning," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-25975-9
    DOI: 10.1038/s41467-021-25975-9

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-25975-9
    File Function: Abstract
    Download Restriction: no


    Citations



    Cited by:

    1. Shivesh Kumar & Yan Wang & Ye Zhou & Lucas Dillard & Fay-Wei Li & Carly A. Sciandra & Ning Sui & Rodolfo Zentella & Emily Zahn & Jeffrey Shabanowitz & Donald F. Hunt & Mario J. Borgnia & Alberto Barte, 2023. "Structure and dynamics of the Arabidopsis O-fucosyltransferase SPINDLY," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    2. Gi Bae Kim & Ji Yeon Kim & Jong An Lee & Charles J. Norsigian & Bernhard O. Palsson & Sang Yup Lee, 2023. "Functional annotation of enzyme-encoding genes using deep learning with transformer layers," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    3. Andrés Manuel González-Ramírez & Ana Sofia Grosso & Zhang Yang & Ismael Compañón & Helena Coelho & Yoshiki Narimatsu & Henrik Clausen & Filipa Marcelo & Francisco Corzana & Ramon Hurtado-Guerrero, 2022. "Structural basis for the synthesis of the core 1 structure by C1GalT1," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
