Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-53622-6.html

Dirichlet latent modelling enables effective learning and sampling of the functional protein design space

Author

Listed:
  • Evgenii Lobzaev

    (The University of Edinburgh)

  • Giovanni Stracquadanio

    (The University of Edinburgh)

Abstract

Engineering proteins with desired functions and biochemical properties is pivotal for biotechnology and drug discovery. Computational methods based on evolutionary information reduce the experimental burden by designing targeted libraries of functional variants, but they still have a low success rate when the desired protein has few or only very remote homologous sequences. Here we propose an autoregressive model, the Temporal Dirichlet Variational Autoencoder (TDVAE), which exploits the mathematical properties of the Dirichlet distribution and temporal convolution to efficiently learn high-order information from a set of functionally related, possibly only remotely similar, sequences. TDVAE is highly accurate in predicting the effects of amino acid mutations while being 90% smaller than other state-of-the-art models. We then use TDVAE to design variants of the human alpha galactosidase enzymes as a potential treatment for Fabry disease. Our model builds a library of diverse variants that retain the sequence, biochemical and structural properties of the wildtype protein, suggesting they could be suitable for enzyme replacement therapy. Taken together, our results show the importance of accurate sequence modelling and the potential of autoregressive models as protein engineering and analysis tools.
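
As an illustration only, and not the authors' implementation, the two ingredients the abstract names can be sketched in plain Python: a Dirichlet sample (here drawn by normalising independent Gamma variates, a standard construction) and a causal, or temporal, convolution whose output at position t depends only on positions up to t. The function names, the concentration values and the toy kernel are assumptions made for this sketch; how TDVAE actually parameterises and combines these pieces is not stated in the abstract.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def causal_conv1d(signal, kernel, dilation=1):
    """Causal (temporal) convolution over a 1-D sequence.

    output[t] = sum_k kernel[k] * signal[t - k * dilation], where positions
    before the start of the sequence are treated as zeros (left padding),
    so no output ever depends on a later input.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


# Hypothetical toy values, chosen only to exercise the two functions.
rng = random.Random(0)
z = sample_dirichlet([0.5, 0.5, 0.5, 0.5], rng)  # a point on the probability simplex

x = [1.0, 2.0, 3.0, 4.0]
y = causal_conv1d(x, kernel=[0.5, 0.5])  # y[t] averages x[t] and x[t-1]
```

A Dirichlet latent vector is non-negative and sums to one, so it can be read as a set of mixing proportions, while the causal convolution is what allows a model to be used autoregressively, predicting each residue from the ones before it.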

Suggested Citation

  • Evgenii Lobzaev & Giovanni Stracquadanio, 2024. "Dirichlet latent modelling enables effective learning and sampling of the functional protein design space," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-53622-6
    DOI: 10.1038/s41467-024-53622-6

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-53622-6
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-024-53622-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Ivan Anishchenko & Samuel J. Pellock & Tamuka M. Chidyausiku & Theresa A. Ramelot & Sergey Ovchinnikov & Jingzhou Hao & Khushboo Bafna & Christoffer Norn & Alex Kang & Asim K. Bera & Frank DiMaio & La, 2021. "De novo protein design by deep network hallucination," Nature, Nature, vol. 600(7889), pages 547-552, December.
    2. Po-Ssu Huang & Scott E. Boyken & David Baker, 2016. "The coming of age of de novo protein design," Nature, Nature, vol. 537(7620), pages 320-327, September.
    3. Jung-Eun Shin & Adam J. Riesselman & Aaron W. Kollasch & Conor McMahon & Elana Simon & Chris Sander & Aashish Manglik & Andrew C. Kruse & Debora S. Marks, 2021. "Protein design and variant prediction using autoregressive generative models," Nature Communications, Nature, vol. 12(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Thomas W. Linsky & Kyle Noble & Autumn R. Tobin & Rachel Crow & Lauren Carter & Jeffrey L. Urbauer & David Baker & Eva-Maria Strauch, 2022. "Sampling of structure and sequence space of small protein folds," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    2. Fatma-Elzahraa Eid & Albert T. Chen & Ken Y. Chan & Qin Huang & Qingxia Zheng & Isabelle G. Tobey & Simon Pacouret & Pamela P. Brauer & Casey Keyes & Megan Powell & Jencilin Johnston & Binhui Zhao & K, 2024. "Systematic multi-trait AAV capsid engineering for efficient gene delivery," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    3. Namrata Anand & Raphael Eguchi & Irimpan I. Mathews & Carla P. Perez & Alexander Derry & Russ B. Altman & Po-Ssu Huang, 2022. "Protein sequence design with a learned potential," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Karol Buda & Charlotte M. Miton & Nobuhiko Tokuriki, 2023. "Pervasive epistasis exposes intramolecular networks in adaptive enzyme evolution," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    5. Mireia Seuma & Ben Lehner & Benedetta Bolognesi, 2022. "An atlas of amyloid aggregation: the impact of substitutions, insertions, deletions and truncations on amyloid beta fibril nucleation," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    6. Jordan Yang & Nandita Naik & Jagdish Suresh Patel & Christopher S Wylie & Wenze Gu & Jessie Huang & F Marty Ytreberg & Mandar T Naik & Daniel M Weinreich & Brenda M Rubenstein, 2020. "Predicting the viability of beta-lactamase: How folding and binding free energies correlate with beta-lactamase fitness," PLOS ONE, Public Library of Science, vol. 15(5), pages 1-26, May.
    7. Agnese I. Curatolo & Ofer Kimchi & Carl P. Goodrich & Ryan K. Krueger & Michael P. Brenner, 2023. "A computational toolbox for the assembly yield of complex and heterogeneous structures," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    8. Biao Ruan & Yanan He & Yingwei Chen & Eun Jung Choi & Yihong Chen & Dana Motabar & Tsega Solomon & Richard Simmerman & Thomas Kauffman & D. Travis Gallagher & John Orban & Philip N. Bryan, 2023. "Design and characterization of a protein fold switching network," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    9. Noelia Ferruz & Steffen Schmidt & Birte Höcker, 2022. "ProtGPT2 is a deep unsupervised language model for protein design," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    10. Pengfei Tian & Robert B Best, 2020. "Exploring the sequence fitness landscape of a bridge between protein folds," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-19, October.
    11. Betz, Ulrich A.K. & Arora, Loukik & Assal, Reem A. & Azevedo, Hatylas & Baldwin, Jeremy & Becker, Michael S. & Bostock, Stefan & Cheng, Vinton & Egle, Tobias & Ferrari, Nicola & Schneider-Futschik, El, 2023. "Game changers in science and technology - now and beyond," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    12. Nicki Skafte Detlefsen & Søren Hauberg & Wouter Boomsma, 2022. "Learning meaningful representations of protein sequences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    13. Kevin E. Wu & Kevin K. Yang & Rianne Berg & Sarah Alamdari & James Y. Zou & Alex X. Lu & Ava P. Amini, 2024. "Protein structure generation via folding diffusion," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    14. Nathaniel R. Bennett & Brian Coventry & Inna Goreshnik & Buwei Huang & Aza Allen & Dionne Vafeados & Ying Po Peng & Justas Dauparas & Minkyung Baek & Lance Stewart & Frank DiMaio & Steven Munck & Savv, 2023. "Improving de novo protein binder design with deep learning," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
    15. Julia Skokowa & Birte Hernandez Alvarez & Murray Coles & Malte Ritter & Masoud Nasri & Jérémy Haaf & Narges Aghaallaei & Yun Xu & Perihan Mir & Ann-Christin Krahl & Katherine W. Rogers & Kateryna Maks, 2022. "A topological refactoring design strategy yields highly stable granulopoietic proteins," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    16. Emily K. Makowski & Patrick C. Kinnunen & Jie Huang & Lina Wu & Matthew D. Smith & Tiexin Wang & Alec A. Desai & Craig N. Streu & Yulei Zhang & Jennifer M. Zupancic & John S. Schardt & Jennifer J. Lin, 2022. "Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    17. Anton Kocheturov & Panos M. Pardalos & Athanasia Karakitsiou, 2019. "Massive datasets and machine learning for computational biomedicine: trends and challenges," Annals of Operations Research, Springer, vol. 276(1), pages 5-34, May.
    18. Tamuka M. Chidyausiku & Soraia R. Mendes & Jason C. Klima & Marta Nadal & Ulrich Eckhard & Jorge Roel-Touris & Scott Houliston & Tibisay Guevara & Hugh K. Haddox & Adam Moyer & Cheryl H. Arrowsmith & , 2022. "De novo design of immunoglobulin-like domains," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    19. Md Tauhidul Islam & Zixia Zhou & Hongyi Ren & Masoud Badiei Khuzani & Daniel Kapp & James Zou & Lu Tian & Joseph C. Liao & Lei Xing, 2023. "Revealing hidden patterns in deep neural network feature space continuum via manifold learning," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    20. Haohuai He & Bing He & Lei Guan & Yu Zhao & Feng Jiang & Guanxing Chen & Qingge Zhu & Calvin Yu-Chian Chen & Ting Li & Jianhua Yao, 2024. "De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model," Nature Communications, Nature, vol. 15(1), pages 1-19, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.