Printed from https://ideas.repec.org/a/plo/pcbi00/1003847.html

Improving Contact Prediction along Three Dimensions

Authors
  • Christoph Feinauer
  • Marcin J Skwark
  • Andrea Pagnani
  • Erik Aurell

Abstract

Correlation patterns in multiple sequence alignments of homologous proteins can be exploited to infer information on the three-dimensional structure of their members. The typical pipeline to address this task, which we in this paper refer to as the three dimensions of contact prediction, is to (i) filter and align the raw sequence data representing the evolutionarily related proteins; (ii) choose a predictive model to describe a sequence alignment; (iii) infer the model parameters and interpret them in terms of structural properties, such as an accurate contact map. We show here that all three dimensions are important for overall prediction success. In particular, we show that it is possible to improve significantly along the second dimension by going beyond the pair-wise Potts models from statistical physics, which have hitherto been the focus of the field. These (simple) extensions are motivated by multiple sequence alignments often containing long stretches of gaps which, as a data feature, would be rather untypical for independent samples drawn from a Potts model. Using a large test set of proteins we show that the combined improvements along the three dimensions are as large as any reported to date.

Author Summary

Proteins are large molecules that living cells make by stringing together building blocks called amino acids or peptides, following their blue-prints in the DNA. Freshly made proteins are typically long, structure-less chains of peptides, but shortly afterwards most of them fold into characteristic structures. Proteins execute many functions in the cell, for which they need to have the right structure, which is therefore very important in determining what the proteins can do. The structure of a protein can be determined by X-ray diffraction and other experimental approaches which are all, to this day, somewhat labor-intensive and difficult.

On the other hand, the order of the peptides in a protein can be read off from the DNA blue-print, and such protein sequences are today routinely produced in large numbers. In this paper we show that many similar protein sequences can be used to find information about the structure. The basic approach is to construct a probabilistic model for sequence variability, and then to use the parameters of that model to predict structure in three-dimensional space. The main technical novelty compared to previous contributions in the same general direction is that we use models more directly matched to the data.
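As a rough, non-authoritative illustration of dimensions (ii) and (iii), the sketch below evaluates the energy of a pair-wise Potts model and turns a coupling tensor into a matrix of contact scores. The scoring rule (shift each coupling block to the zero-sum gauge, drop the gap state, take the Frobenius norm, then apply the average product correction) is a common choice in the direct-coupling-analysis literature and is assumed here; it is not claimed to be the exact procedure of this article. Parameter inference itself and the article's gap-aware extensions of the Potts model are not shown. The function names `potts_energy` and `contact_scores`, the array layout `h[i, a]` and `J[i, j, a, b]`, and the encoding of the gap as state 0 are all hypothetical.

```python
import numpy as np


def potts_energy(seq, h, J):
    """Energy of a pair-wise Potts model, with P(seq) proportional to exp(-H(seq)).

    H(seq) = -sum_i h[i, seq[i]] - sum_{i<j} J[i, j, seq[i], seq[j]]
    seq: length-L sequence of integer states in [0, q)
    h:   (L, q) array of fields
    J:   (L, L, q, q) array of couplings (only blocks with i < j are read)
    """
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return float(energy)


def contact_scores(J, gap_state=0):
    """Score residue pairs from couplings (assumed scheme, not the article's own).

    1. Shift each q x q block to the zero-sum gauge (double centering).
    2. Drop the gap row and column (hypothetical gap encoding) and take
       the Frobenius norm of the remaining block.
    3. Apply the average product correction (APC).
    Returns an (L, L) symmetric matrix with a zero diagonal.
    """
    L, _, q, _ = J.shape
    keep = [a for a in range(q) if a != gap_state]
    F = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            block = J[i, j]
            # Zero-sum gauge: remove terms depending on only one of the two states.
            block = (block
                     - block.mean(axis=1, keepdims=True)
                     - block.mean(axis=0, keepdims=True)
                     + block.mean())
            # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
            F[i, j] = F[j, i] = np.linalg.norm(block[np.ix_(keep, keep)])
    # APC: subtract the background estimate mean_i * mean_j / mean_all.
    col_mean = F.sum(axis=0) / (L - 1)
    total_mean = F.sum() / (L * (L - 1))
    if total_mean == 0:
        return np.zeros((L, L))
    S = F - np.outer(col_mean, col_mean) / total_mean
    np.fill_diagonal(S, 0.0)
    return S
```

Under these assumptions, the pairs with the highest off-diagonal scores would be read as predicted contacts, often after excluding positions that are close in sequence. The gauge step matters because couplings are only defined up to terms that can be absorbed into the fields; centering each block makes the score insensitive to that choice.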

Suggested Citation

  • Christoph Feinauer & Marcin J Skwark & Andrea Pagnani & Erik Aurell, 2014. "Improving Contact Prediction along Three Dimensions," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.
  • Handle: RePEc:plo:pcbi00:1003847
    DOI: 10.1371/journal.pcbi.1003847

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003847
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003847&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Simona Cocco & Remi Monasson & Martin Weigt, 2013. "From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction," PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-17, August.
    2. Andrea Procaccini & Bryan Lunt & Hendrik Szurmant & Terence Hwa & Martin Weigt, 2011. "Dissecting the Specificity of Protein-Protein Interaction in Bacterial Two-Component Signaling: Orphans and Crosstalks," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-9, May.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Elena Facco & Andrea Pagnani & Elena Tea Russo & Alessandro Laio, 2019. "The intrinsic dimension of protein sequence evolution," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-16, April.
    2. Lorenzo Asti & Guido Uguzzoni & Paolo Marcatili & Andrea Pagnani, 2016. "Maximum-Entropy Models of Sequenced Immune Repertoires Predict Antigen-Antibody Affinity," PLOS Computational Biology, Public Library of Science, vol. 12(4), pages 1-20, April.
    3. Pedro L Teixeira & Jeff L Mendenhall & Sten Heinze & Brian Weiner & Marcin J Skwark & Jens Meiler, 2017. "Membrane protein contact and structure prediction using co-evolution in conjunction with machine learning," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-24, May.
    4. Erik Aurell, 2016. "The Maximum Entropy Fallacy Redux?," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-7, May.
    5. Erik van Nimwegen, 2016. "Inferring Contacting Residues within and between Proteins: What Do the Probabilities Mean?," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-10, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Patrick Bryant & Gabriele Pozzati & Arne Elofsson, 2022. "Improved prediction of protein-protein interactions using AlphaFold2," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    2. Shou-Wen Wang & Anne-Florence Bitbol & Ned S Wingreen, 2019. "Revealing evolutionary constraints on proteins through sequence analysis," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-16, April.
    3. Omar Haq & Michael Andrec & Alexandre V Morozov & Ronald M Levy, 2012. "Correlated Electrostatic Mutations Provide a Reservoir of Stability in HIV Protease," PLOS Computational Biology, Public Library of Science, vol. 8(9), pages 1-10, September.
    4. Ross D. Jones & Yili Qian & Katherine Ilia & Benjamin Wang & Michael T. Laub & Domitilla Del Vecchio & Ron Weiss, 2022. "Robust and tunable signal processing in mammalian cells via engineered covalent modification cycles," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    5. Swetha Garimalla & Thomas Kieber-Emmons & Anastas D Pashov, 2015. "The Patterns of Coevolution in Clade B HIV Envelope's N-Glycosylation Sites," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-18, June.
    6. Cocco, S. & Monasson, R. & Posani, L. & Rosay, S. & Tubiana, J., 2018. "Statistical physics and representations in real and artificial neural networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 504(C), pages 45-76.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.