
From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

Author

Listed:
  • Simona Cocco
  • Remi Monasson
  • Martin Weigt

Abstract

Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.

Author Summary: Extracting functional and structural information about protein families from the covariation of residues in multiple sequence alignments is an important challenge in computational biology. Here we propose a statistical-physics inspired framework to analyze those covariations, which naturally unifies existing methods in the literature. Our approach allows us to identify statistically relevant ‘patterns’ of residues, specific to a protein family. We show that many patterns correspond to a small number of sites on the protein sequence, in close contact on the 3D fold. Hence, those patterns allow us to make accurate predictions about the contact map from sequence data only. Furthermore, we show that the dimensional reduction, which is achieved by considering only the statistically most significant patterns, avoids overfitting in small sequence alignments, and improves our capacity to extract residue contacts in this case.
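
To make the abstract's central objects concrete, the following is a minimal sketch, not the authors' implementation. It builds a residue-residue correlation matrix from a toy integer-coded alignment, extracts its eigenmodes, and scores how localized each mode is across sites with an inverse participation ratio (IPR). The function names (`one_hot`, `correlation_modes`, `site_ipr`), the `ridge` regularizer (a simple stand-in for whatever regularization a real analysis would use), the IPR as a localization score, and the random toy alignment are all assumptions made for illustration.

```python
import numpy as np


def one_hot(msa, q):
    """Encode an (N, L) integer alignment as an (N, L*q) binary matrix."""
    n, length = msa.shape
    x = np.zeros((n, length * q))
    rows = np.repeat(np.arange(n), length)
    cols = (np.arange(length) * q + msa).ravel()  # column index = site*q + state
    x[rows, cols] = 1.0
    return x


def correlation_modes(msa, q, ridge=1e-3):
    """Eigen-decompose a Pearson-like correlation matrix of the encoded alignment.

    `ridge` is an illustrative regularizer (an assumption of this sketch) that
    keeps the normalization finite for columns with zero variance.
    """
    x = one_hot(msa, q)
    c = np.cov(x, rowvar=False)            # (L*q, L*q) covariance matrix
    d = np.sqrt(np.diag(c) + ridge)
    gamma = c / np.outer(d, d)             # normalized correlation matrix
    eigvals, eigvecs = np.linalg.eigh(gamma)  # eigenvalues in ascending order
    return eigvals, eigvecs


def site_ipr(eigvecs, length, q):
    """Inverse participation ratio of each mode over sites.

    Values range from 1/L (weight spread evenly over all sites) to 1
    (all weight on a single site).
    """
    # Sum squared components over the q states of each site: shape (L, n_modes).
    w = (eigvecs ** 2).reshape(length, q, -1).sum(axis=1)
    return (w ** 2).sum(axis=0)


# Toy data (hypothetical): 200 random sequences, 6 sites, 4 states,
# with site 3 made an exact copy of site 1 to create one strong covariation.
rng = np.random.default_rng(0)
q, length = 4, 6
msa = rng.integers(0, q, size=(200, length))
msa[:, 3] = msa[:, 1]

eigvals, eigvecs = correlation_modes(msa, q)
ipr = site_ipr(eigvecs, length, q)
print("largest eigenvalue:", eigvals[-1], "site IPR:", ipr[-1])
print("smallest eigenvalue:", eigvals[0], "site IPR:", ipr[0])
```

In this sketch, PCA-style analysis would keep only the modes at the top of `eigvals`, whereas the abstract argues that modes at the bottom of the spectrum also carry structural information. Note that one-hot encoding makes the lowest eigenvalues degenerate, so the individual eigenvectors returned in that subspace are not unique.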

Suggested Citation

  • Simona Cocco & Remi Monasson & Martin Weigt, 2013. "From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction," PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-17, August.
  • Handle: RePEc:plo:pcbi00:1003176
    DOI: 10.1371/journal.pcbi.1003176

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003176
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003176&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1003176?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Elad Schneidman & Michael J. Berry & Ronen Segev & William Bialek, 2006. "Weak pairwise correlations imply strongly correlated network states in a neural population," Nature, Nature, vol. 440(7087), pages 1007-1012, April.
    2. Lukas Burger & Erik van Nimwegen, 2010. "Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments," PLOS Computational Biology, Public Library of Science, vol. 6(1), pages 1-18, January.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Cocco, S. & Monasson, R. & Posani, L. & Rosay, S. & Tubiana, J., 2018. "Statistical physics and representations in real and artificial neural networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 504(C), pages 45-76.
    2. Swetha Garimalla & Thomas Kieber-Emmons & Anastas D Pashov, 2015. "The Patterns of Coevolution in Clade B HIV Envelope's N-Glycosylation Sites," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-18, June.
    3. Shou-Wen Wang & Anne-Florence Bitbol & Ned S Wingreen, 2019. "Revealing evolutionary constraints on proteins through sequence analysis," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-16, April.
    4. Rakhi Kumari & Pradeep Bhadola & Nivedita Deo, 2024. "Statistical analysis of proteins families: a network and random matrix approach," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 97(10), pages 1-15, October.
    5. Christoph Feinauer & Marcin J Skwark & Andrea Pagnani & Erik Aurell, 2014. "Improving Contact Prediction along Three Dimensions," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrea Procaccini & Bryan Lunt & Hendrik Szurmant & Terence Hwa & Martin Weigt, 2011. "Dissecting the Specificity of Protein-Protein Interaction in Bacterial Two-Component Signaling: Orphans and Crosstalks," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-9, May.
    2. Lipovetsky, Stan, 2018. "Quantum paradigm of probability amplitude and complex utility in entangled discrete choice modeling," Journal of choice modelling, Elsevier, vol. 27(C), pages 62-73.
    3. Mark L Ioffe & Michael J Berry II, 2017. "The structured ‘low temperature’ phase of the retinal population code," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-31, October.
    4. Katarína Bod’ová & Enikő Szép & Nicholas H Barton, 2021. "Dynamic maximum entropy provides accurate approximation of structured population dynamics," PLOS Computational Biology, Public Library of Science, vol. 17(12), pages 1-22, December.
    5. MohammadReza Zahedian & Mahsa Bagherikalhor & Andrey Trufanov & G. Reza Jafari, 2022. "Financial Crisis in the Framework of Non-zero Temperature Balance Theory," Papers 2202.03198, arXiv.org.
    6. Gaëlle Desbordes & Jianzhong Jin & Chong Weng & Nicholas A Lesica & Garrett B Stanley & Jose-Manuel Alonso, 2008. "Timing Precision in Population Coding of Natural Scenes in the Early Visual System," PLOS Biology, Public Library of Science, vol. 6(12), pages 1-11, December.
    7. Yasser Roudi & Sheila Nirenberg & Peter E Latham, 2009. "Pairwise Maximum Entropy Models for Studying Large Biological Systems: When They Can Work and When They Can't," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-18, May.
    8. Maulana, Ardian & Situngkir, Hokky, 2015. "Korelasi Bebas-skala dalam Studi Geo-politik Pemilihan [Scale-free correlation within Geopolitics of Election Studies]," MPRA Paper 66351, University Library of Munich, Germany.
    9. Zhang, Qi & Deng, Ronghao & Ding, Kaixing & Li, Meizhu, 2024. "Structural analysis and the sum of nodes’ betweenness centrality in complex networks," Chaos, Solitons & Fractals, Elsevier, vol. 185(C).
    10. Hideaki Shimazaki & Shun-ichi Amari & Emery N Brown & Sonja Grün, 2012. "State-Space Analysis of Time-Varying Higher-Order Spike Correlation for Multiple Neural Spike Train Data," PLOS Computational Biology, Public Library of Science, vol. 8(3), pages 1-27, March.
    11. Timothy R Lezon & Ivet Bahar, 2010. "Using Entropy Maximization to Understand the Determinants of Structural Dynamics beyond Native Contact Topology," PLOS Computational Biology, Public Library of Science, vol. 6(6), pages 1-12, June.
    12. Xiaoyuan Liu & Hayato Ushijima-Mwesigwa & Avradip Mandal & Sarvagya Upadhyay & Ilya Safro & Arnab Roy, 2022. "Leveraging special-purpose hardware for local search heuristics," Computational Optimization and Applications, Springer, vol. 82(1), pages 1-29, May.
    13. Susann Vorberg & Stefan Seemayer & Johannes Söding, 2018. "Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-25, November.
    14. Sacha Jennifer van Albada & Moritz Helias & Markus Diesmann, 2015. "Scalability of Asynchronous Networks Is Limited by One-to-One Mapping between Effective Connectivity and Correlations," PLOS Computational Biology, Public Library of Science, vol. 11(9), pages 1-37, September.
    15. Sahar Gelfman & Quanli Wang & Yi-Fan Lu & Diana Hall & Christopher D Bostick & Ryan Dhindsa & Matt Halvorsen & K Melodi McSweeney & Ellese Cotterill & Tom Edinburgh & Michael A Beaumont & Wayne N Fran, 2018. "meaRtools: An R package for the analysis of neuronal networks recorded on microelectrode arrays," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-20, October.
    16. Jason S Prentice & Olivier Marre & Mark L Ioffe & Adrianna R Loback & Gašper Tkačik & Michael J Berry II, 2016. "Error-Robust Modes of the Retinal Population Code," PLOS Computational Biology, Public Library of Science, vol. 12(11), pages 1-32, November.
    17. Montani, Fernando & Phoka, Elena & Portesi, Mariela & Schultz, Simon R., 2013. "Statistical modelling of higher-order correlations in pools of neural activity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(14), pages 3066-3086.
    18. Jan Humplik & Gašper Tkačik, 2017. "Probabilistic models for neural populations that naturally capture global coupling and criticality," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-26, September.
    19. Richard R Stein & Debora S Marks & Chris Sander, 2015. "Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models," PLOS Computational Biology, Public Library of Science, vol. 11(7), pages 1-22, July.
    20. Ross S Williamson & Maneesh Sahani & Jonathan W Pillow, 2015. "The Equivalence of Information-Theoretic and Likelihood-Based Methods for Neural Dimensionality Reduction," PLOS Computational Biology, Public Library of Science, vol. 11(4), pages 1-31, April.
