Printed from https://ideas.repec.org/a/nat/nature/v600y2021i7889d10.1038_s41586-021-04184-w.html

De novo protein design by deep network hallucination

Author

Listed:
  • Ivan Anishchenko

    (University of Washington)

  • Samuel J. Pellock

    (University of Washington)

  • Tamuka M. Chidyausiku

    (University of Washington)

  • Theresa A. Ramelot

    (Rensselaer Polytechnic Institute)

  • Sergey Ovchinnikov

    (Harvard University)

  • Jingzhou Hao

    (Rensselaer Polytechnic Institute)

  • Khushboo Bafna

    (Rensselaer Polytechnic Institute)

  • Christoffer Norn

    (University of Washington)

  • Alex Kang

    (University of Washington)

  • Asim K. Bera

    (University of Washington)

  • Frank DiMaio

    (University of Washington)

  • Lauren Carter

    (University of Washington)

  • Cameron M. Chow

    (University of Washington)

  • Gaetano T. Montelione

    (Rensselaer Polytechnic Institute)

  • David Baker

    (University of Washington)

Abstract

There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences [1–3]. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue–residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback–Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-‘hallucinated’ sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
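
The sampling loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' code and does not call trRosetta: `toy_predict`, the random embedding table `emb`, the bin count `N_BINS`, the uniform `background`, and the `temperature` value are all hypothetical stand-ins chosen so the example runs on its own. It only shows the general shape of the idea: start from a random sequence, propose single-residue mutations, and accept them with a Metropolis rule so that the mean Kullback–Leibler divergence between predicted and background distance distributions tends to increase.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 8  # hypothetical number of distance bins (assumption)


def toy_predict(seq, emb):
    """Hypothetical stand-in for a structure-prediction network.

    Returns an (L, L, N_BINS) array: one distance-bin distribution
    per residue pair. A real network would go here instead.
    """
    idx = np.array([AMINO_ACIDS.index(a) for a in seq])
    x = emb[idx]                                  # (L, N_BINS)
    logits = x[:, None, :] + x[None, :, :]        # (L, L, N_BINS)
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)      # softmax over bins


def mean_kl(p, q, eps=1e-9):
    """Mean KL(p || q) over all residue pairs; q is the background."""
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))


def hallucinate(length=20, steps=500, temperature=0.05, seed=0):
    """Metropolis Monte Carlo in sequence space, maximizing mean KL."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(len(AMINO_ACIDS), N_BINS))  # toy "weights"
    background = np.full(N_BINS, 1.0 / N_BINS)  # assumption: uniform
    seq = list(rng.choice(list(AMINO_ACIDS), size=length))
    score = mean_kl(toy_predict(seq, emb), background)
    start_score = score
    best_seq, best_score = seq.copy(), score
    for _ in range(steps):
        trial = seq.copy()
        trial[rng.integers(length)] = rng.choice(list(AMINO_ACIDS))
        trial_score = mean_kl(toy_predict(trial, emb), background)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(delta / temperature).
        delta = trial_score - score
        if delta >= 0 or rng.random() < np.exp(delta / temperature):
            seq, score = trial, trial_score
            if score > best_score:
                best_seq, best_score = seq.copy(), score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    designed, before, after = hallucinate()
    print(designed, round(before, 3), round(after, 3))
```

In this sketch the uniform background is only a placeholder for the "background distributions averaged over all proteins" mentioned in the abstract, and the function returns the best-scoring sequence seen during sampling rather than the final state.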

Suggested Citation

  • Ivan Anishchenko & Samuel J. Pellock & Tamuka M. Chidyausiku & Theresa A. Ramelot & Sergey Ovchinnikov & Jingzhou Hao & Khushboo Bafna & Christoffer Norn & Alex Kang & Asim K. Bera & Frank DiMaio & La, 2021. "De novo protein design by deep network hallucination," Nature, Nature, vol. 600(7889), pages 547-552, December.
  • Handle: RePEc:nat:nature:v:600:y:2021:i:7889:d:10.1038_s41586-021-04184-w
    DOI: 10.1038/s41586-021-04184-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-021-04184-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Citations



    Cited by:

    1. Namrata Anand & Raphael Eguchi & Irimpan I. Mathews & Carla P. Perez & Alexander Derry & Russ B. Altman & Po-Ssu Huang, 2022. "Protein sequence design with a learned potential," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    2. Betz, Ulrich A.K. & Arora, Loukik & Assal, Reem A. & Azevedo, Hatylas & Baldwin, Jeremy & Becker, Michael S. & Bostock, Stefan & Cheng, Vinton & Egle, Tobias & Ferrari, Nicola & Schneider-Futschik, El, 2023. "Game changers in science and technology - now and beyond," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    3. Thomas W. Linsky & Kyle Noble & Autumn R. Tobin & Rachel Crow & Lauren Carter & Jeffrey L. Urbauer & David Baker & Eva-Maria Strauch, 2022. "Sampling of structure and sequence space of small protein folds," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Jan Zrimec & Xiaozhi Fu & Azam Sheikh Muhammad & Christos Skrekas & Vykintas Jauniskis & Nora K. Speicher & Christoph S. Börlin & Vilhelm Verendel & Morteza Haghir Chehreghani & Devdatt Dubhashi & Ver, 2022. "Controlling gene expression with deep generative design of regulatory DNA," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    5. Tamuka M. Chidyausiku & Soraia R. Mendes & Jason C. Klima & Marta Nadal & Ulrich Eckhard & Jorge Roel-Touris & Scott Houliston & Tibisay Guevara & Hugh K. Haddox & Adam Moyer & Cheryl H. Arrowsmith & , 2022. "De novo design of immunoglobulin-like domains," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    6. Md Tauhidul Islam & Zixia Zhou & Hongyi Ren & Masoud Badiei Khuzani & Daniel Kapp & James Zou & Lu Tian & Joseph C. Liao & Lei Xing, 2023. "Revealing hidden patterns in deep neural network feature space continuum via manifold learning," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    7. Nathaniel R. Bennett & Brian Coventry & Inna Goreshnik & Buwei Huang & Aza Allen & Dionne Vafeados & Ying Po Peng & Justas Dauparas & Minkyung Baek & Lance Stewart & Frank DiMaio & Steven Munck & Savv, 2023. "Improving de novo protein binder design with deep learning," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
