
Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models

Author

Listed:
  • Hugo Jacquin
  • Amy Gilson
  • Eugene Shakhnovich
  • Simona Cocco
  • Rémi Monasson

Abstract

Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics have so far remained untested. Here we use lattice protein (LP) models to benchmark those inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that inferred Potts Hamiltonians reproduce many important aspects of ‘true’ LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These results are remarkable because the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models, owing to the competition between folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.

Author Summary: Inverse statistical approaches, which model pairwise correlations between amino acids in the sequences of homologous proteins across many different organisms, can successfully extract protein structure (contact) information. Here, we benchmark those statistical approaches on exactly solvable models of proteins, folding on a 3D lattice, to assess the reasons underlying their success and their limitations. We show that the inferred parameters (effective pairwise interactions) of the statistical models have clear and quantitative interpretations in terms of positive (favoring the native fold) and negative (disfavoring competing folds) protein sequence design. New sequences randomly drawn from the statistical models are likely to fold into the native structures when effective pairwise interactions are accurately inferred, a performance that cannot be achieved with independent-site models.
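
To make the contrast between independent-site and pairwise (Potts) models concrete, the following is a minimal Python sketch, not the authors' code. It computes single-site and pairwise frequencies from a toy alignment and evaluates a Potts energy. The function names, the toy alignment, the two-letter alphabet, and the sign convention E = -Σ h_i(a_i) - Σ_{i<j} J_ij(a_i, a_j) are all assumptions made for illustration; actual inference of fields and couplings from an MSA is not shown.

```python
import numpy as np

def msa_frequencies(msa, q):
    """Return single-site frequencies f_i(a) and pairwise frequencies f_ij(a, b).

    msa: integer array of shape (n_sequences, length), entries in 0..q-1.
    """
    msa = np.asarray(msa)
    n_seq = msa.shape[0]
    onehot = np.eye(q)[msa]                      # shape (n_seq, length, q)
    f_i = onehot.mean(axis=0)                    # shape (length, q)
    f_ij = np.einsum("sia,sjb->ijab", onehot, onehot) / n_seq
    return f_i, f_ij

def potts_energy(seq, h, J):
    """Potts energy under the assumed convention (lower energy = more probable)."""
    seq = np.asarray(seq)
    length = len(seq)
    energy = -h[np.arange(length), seq].sum()    # field (single-site) terms
    for i in range(length):
        for j in range(i + 1, length):
            energy -= J[i, j, seq[i], seq[j]]    # coupling (pairwise) terms
    return energy

# Hypothetical toy alignment: sites 0 and 1 always agree, site 2 is independent.
q = 2
msa = np.array([[0, 0, 1],
                [1, 1, 0],
                [0, 0, 0],
                [1, 1, 1]])
f_i, f_ij = msa_frequencies(msa, q)

# Connected correlations c_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b).
c = f_ij - np.einsum("ia,jb->ijab", f_i, f_i)

# Hand-set (not inferred) parameters: a coupling rewarding agreement of sites 0 and 1.
h = np.zeros((3, q))
J = np.zeros((3, 3, q, q))
J[0, 1] = np.eye(q)

e_covarying = potts_energy([0, 0, 1], h, J)      # sites 0 and 1 agree
e_mismatch = potts_energy([0, 1, 1], h, J)       # sites 0 and 1 disagree
```

In this toy case every single-site frequency equals 0.5, so an independent-site model assigns the same probability to sequences in which sites 0 and 1 agree and to those in which they disagree. The nonzero connected correlation between sites 0 and 1, and the lower Potts energy of the agreeing sequence, carry the pairwise information that such a model cannot represent.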

Suggested Citation

  • Hugo Jacquin & Amy Gilson & Eugene Shakhnovich & Simona Cocco & Rémi Monasson, 2016. "Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-18, May.
  • Handle: RePEc:plo:pcbi00:1004889
    DOI: 10.1371/journal.pcbi.1004889

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004889
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004889&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Michael Socolich & Steve W. Lockless & William P. Russ & Heather Lee & Kevin H. Gardner & Rama Ranganathan, 2005. "Evolutionary information for specifying a protein fold," Nature, Nature, vol. 437(7058), pages 512-518, September.
    2. William P. Russ & Drew M. Lowery & Prashant Mishra & Michael B. Yaffe & Rama Ranganathan, 2005. "Natural-like function in artificial WW domains," Nature, Nature, vol. 437(7058), pages 579-583, September.
    3. Lukas Burger & Erik van Nimwegen, 2010. "Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments," PLOS Computational Biology, Public Library of Science, vol. 6(1), pages 1-18, January.

    Citations



    Cited by:

    1. Cheyenne Ziegler & Jonathan Martin & Claude Sinner & Faruck Morcos, 2023. "Latent generative landscapes as maps of functional diversity in protein sequence space," Nature Communications, Nature, vol. 14(1), pages 1-15, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yasser Roudi & Sheila Nirenberg & Peter E Latham, 2009. "Pairwise Maximum Entropy Models for Studying Large Biological Systems: When They Can Work and When They Can't," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-18, May.
    2. Erik van Nimwegen, 2016. "Inferring Contacting Residues within and between Proteins: What Do the Probabilities Mean?," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-10, May.
    3. Xu, Xiu-Lian & Shi, Jin-Xuan & Wang, Jun & Li, Wenfei, 2021. "Long-range correlation and critical fluctuations in coevolution networks of protein sequences," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
    4. Tiberiu Teşileanu & Lucy J Colwell & Stanislas Leibler, 2015. "Protein Sectors: Statistical Coupling Analysis versus Conservation," PLOS Computational Biology, Public Library of Science, vol. 11(2), pages 1-20, February.
    5. Andrea Procaccini & Bryan Lunt & Hendrik Szurmant & Terence Hwa & Martin Weigt, 2011. "Dissecting the Specificity of Protein-Protein Interaction in Bacterial Two-Component Signaling: Orphans and Crosstalks," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-9, May.
    6. Susann Vorberg & Stefan Seemayer & Johannes Söding, 2018. "Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-25, November.
    7. Shou-Wen Wang & Anne-Florence Bitbol & Ned S Wingreen, 2019. "Revealing evolutionary constraints on proteins through sequence analysis," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-16, April.
    8. Simona Cocco & Remi Monasson & Martin Weigt, 2013. "From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction," PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-17, August.
    9. Saeed Omidi & Mihaela Zavolan & Mikhail Pachkov & Jeremie Breda & Severin Berger & Erik van Nimwegen, 2017. "Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors," PLOS Computational Biology, Public Library of Science, vol. 13(7), pages 1-22, July.
    10. Carlo Baldassi & Marco Zamparo & Christoph Feinauer & Andrea Procaccini & Riccardo Zecchina & Martin Weigt & Andrea Pagnani, 2014. "Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-12, March.
    11. Yan Zeng & Wei Wang & Yong Ding & Jilin Zhang & Yongjian Ren & Guangzheng Yi, 2022. "Adaptive Distributed Parallel Training Method for a Deep Learning Model Based on Dynamic Critical Paths of DAG," Mathematics, MDPI, vol. 10(24), pages 1-21, December.
    12. Jennifer L Lahti & Adam P Silverman & Jennifer R Cochran, 2009. "Interrogating and Predicting Tolerated Sequence Diversity in Protein Folds: Application to E. elaterium Trypsin Inhibitor-II Cystine-Knot Miniprotein," PLOS Computational Biology, Public Library of Science, vol. 5(9), pages 1-15, September.
    13. Tatjana Braun & Julia Koehler Leman & Oliver F Lange, 2015. "Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(12), pages 1-20, December.
    14. Elena Facco & Andrea Pagnani & Elena Tea Russo & Alessandro Laio, 2019. "The intrinsic dimension of protein sequence evolution," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-16, April.
    15. Francisco McGee & Sandro Hauri & Quentin Novinger & Slobodan Vucetic & Ronald M. Levy & Vincenzo Carnevale & Allan Haldane, 2021. "The generative capacity of probabilistic protein sequence models," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    16. Marcin J Skwark & Daniele Raimondi & Mirco Michel & Arne Elofsson, 2014. "Improved Contact Predictions Using the Recognition of Protein Like Contact Patterns," PLOS Computational Biology, Public Library of Science, vol. 10(11), pages 1-14, November.
    17. Shunshi Kohyama & Béla P. Frohn & Leon Babl & Petra Schwille, 2024. "Machine learning-aided design and screening of an emergent protein function in synthetic cells," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Umberto Lupo & Damiano Sgarbossa & Anne-Florence Bitbol, 2022. "Protein language models trained on multiple sequence alignments learn phylogenetic relationships," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
