Printed from https://ideas.repec.org/a/nat/nature/v620y2023i7973d10.1038_s41586-023-06328-6.html

Mega-scale experimental analysis of protein folding stability in biology and design

Author

Listed:
  • Kotaro Tsuboyama

    (Northwestern University Feinberg School of Medicine
    Northwestern University
    PRESTO, Japan Science and Technology Agency
    The University of Tokyo)

  • Justas Dauparas

    (University of Washington)

  • Jonathan Chen

    (Northwestern University Feinberg School of Medicine
    Northwestern University)

  • Elodie Laine

    (Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238)

  • Yasser Mohseni Behbahani

    (Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238)

  • Jonathan J. Weinstein

    (Weizmann Institute of Science)

  • Niall M. Mangan

    (Northwestern University)

  • Sergey Ovchinnikov

    (Harvard University)

  • Gabriel J. Rocklin

    (Northwestern University Feinberg School of Medicine
    Northwestern University)

Abstract

Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale [1]. However, the energetics driving folding are invisible in these structures and remain largely unknown [2]. The hidden thermodynamics of folding can drive disease [3,4], shape protein evolution [5–7] and guide protein engineering [8–10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40–72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.

Suggested Citation

  • Kotaro Tsuboyama & Justas Dauparas & Jonathan Chen & Elodie Laine & Yasser Mohseni Behbahani & Jonathan J. Weinstein & Niall M. Mangan & Sergey Ovchinnikov & Gabriel J. Rocklin, 2023. "Mega-scale experimental analysis of protein folding stability in biology and design," Nature, Nature, vol. 620(7973), pages 434-444, August.
  • Handle: RePEc:nat:nature:v:620:y:2023:i:7973:d:10.1038_s41586-023-06328-6
    DOI: 10.1038/s41586-023-06328-6

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-023-06328-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Martin Grønbæk-Thygesen & Vasileios Voutsinos & Kristoffer E. Johansson & Thea K. Schulze & Matteo Cagiada & Line Pedersen & Lene Clausen & Snehal Nariya & Rachel L. Powell & Amelie Stein & Douglas M., 2024. "Deep mutational scanning reveals a correlation between degradation and toxicity of thousands of aspartoacylase variants," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    2. Lene Clausen & Vasileios Voutsinos & Matteo Cagiada & Kristoffer E. Johansson & Martin Grønbæk-Thygesen & Snehal Nariya & Rachel L. Powell & Magnus K. N. Have & Vibe H. Oestergaard & Amelie Stein & Do, 2024. "A mutational atlas for Parkin proteostasis," Nature Communications, Nature, vol. 15(1), pages 1-17, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.