IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1006767.html

The intrinsic dimension of protein sequence evolution

Author

Listed:
  • Elena Facco
  • Andrea Pagnani
  • Elena Tea Russo
  • Alessandro Laio

Abstract

It is well known that, to preserve its structure and function, a protein cannot change its sequence at random, but only through mutations that occur preferentially at specific locations. Here we quantify how much variability is allowed in protein sequence evolution by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID measures the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family and is very similar across families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations at different sites. However, we demonstrate that correlations alone are not sufficient to explain the small ID observed in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach that accounts for correlations, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor in reproducing the natural ID is taking the phylogeny of the sequences into consideration.

Author summary: Protein sequence evolution is an extremely complex process, whose rules are ultimately determined by the need of living organisms to adapt to changes in the environment. Here we address a fundamental question about this process: in how many independent directions can a sequence evolve without compromising the protein's ability to fold and perform its function? We find that the number of these directions is surprisingly small: 10 or fewer in most of the families we considered. This property is not correctly accounted for by most of the theoretical models we considered, which predict that sequence evolution can take place in 30-40 independent directions. The only way to generate low-dimensional sequences is to take sequence phylogeny into consideration.
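
To make the notion of intrinsic dimension concrete, the sketch below estimates it with a two-nearest-neighbour (TwoNN-style) ratio estimator: for each point, the ratio of the distance to its second neighbour over the distance to its first neighbour follows a Pareto law whose exponent is the ID, so the maximum-likelihood estimate is N / sum(log ratio). This is an illustration only and is not taken from the abstract: the function name `twonn_id`, the Euclidean metric, and the synthetic data are assumptions, and applying it to real sequences would need a sequence distance (for example Hamming distance on aligned sequences) and care with tied distances.

```python
import numpy as np


def twonn_id(points):
    """Estimate intrinsic dimension from 2nd/1st nearest-neighbour distance ratios.

    Illustrative helper (hypothetical name), Euclidean metric assumed.
    """
    x = np.asarray(points, dtype=float)
    # Full pairwise distance matrix; adequate for a few hundred points.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # After partitioning at index 1, columns 0 and 1 hold the two smallest distances.
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0  # drop exact duplicates, whose ratio is undefined
    log_mu = np.log(r2[keep] / r1[keep])
    # Maximum-likelihood exponent of a Pareto law with unit scale.
    return keep.sum() / log_mu.sum()


# Synthetic check: a flat 2-dimensional sheet embedded linearly in 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(500, 2))
embedding = rng.normal(size=(2, 10))
flat = latent @ embedding
estimate = twonn_id(flat)
print(f"embedding dimension: {flat.shape[1]}, estimated ID: {estimate:.2f}")
```

The point of the toy data is the gap between the number of coordinates (10) and the number of directions the points actually vary in (2), which mirrors the abstract's contrast between the raw number of amino acids and an ID between 6 and 12.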

Suggested Citation

  • Elena Facco & Andrea Pagnani & Elena Tea Russo & Alessandro Laio, 2019. "The intrinsic dimension of protein sequence evolution," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-16, April.
  • Handle: RePEc:plo:pcbi00:1006767
    DOI: 10.1371/journal.pcbi.1006767

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006767
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006767&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    2. Christoph Feinauer & Marcin J Skwark & Andrea Pagnani & Erik Aurell, 2014. "Improving Contact Prediction along Three Dimensions," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.
    3. Lukas Burger & Erik van Nimwegen, 2010. "Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments," PLOS Computational Biology, Public Library of Science, vol. 6(1), pages 1-18, January.

    Citations

    Cited by:

    1. Pengfei Tian & Robert B Best, 2020. "Exploring the sequence fitness landscape of a bridge between protein folds," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-19, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Erik van Nimwegen, 2016. "Inferring Contacting Residues within and between Proteins: What Do the Probabilities Mean?," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-10, May.
    2. Juan Carlos Chávez & Felipe J. Fonseca & Manuel Gómez-Zaldívar, 2017. "Resoluciones de disputas comerciales y desempeño económico regional en México. (Commercial Disputes Resolution and Regional Economic Performance in Mexico)," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(1), pages 79-93, May.
    3. Chen, Ray-Bing & Chen, Ying & Härdle, Wolfgang K., 2014. "TVICA—Time varying independent component analysis and its application to financial data," Computational Statistics & Data Analysis, Elsevier, vol. 74(C), pages 95-109.
    4. Yan Yu Chen & Chun-Cheih Chao & Fu-Chen Liu & Po-Chen Hsu & Hsueh-Fen Chen & Shih-Chi Peng & Yung-Jen Chuang & Chung-Yu Lan & Wen-Ping Hsieh & David Shan Hill Wong, 2013. "Dynamic Transcript Profiling of Candida albicans Infection in Zebrafish: A Pathogen-Host Interaction Study," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.
    5. Plat, Richard, 2009. "Stochastic portfolio specific mortality and the quantification of mortality basis risk," Insurance: Mathematics and Economics, Elsevier, vol. 45(1), pages 123-132, August.
    6. Kondylis, Athanassios & Whittaker, Joe, 2008. "Spectral preconditioning of Krylov spaces: Combining PLS and PC regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(5), pages 2588-2603, January.
    7. Simplice A. Asongu & Nicholas M. Odhiambo, 2019. "Governance, capital flight and industrialisation in Africa," Journal of Economic Structures, Springer;Pan-Pacific Association of Input-Output Studies (PAPAIOS), vol. 8(1), pages 1-22, December.
    8. M. J. Aziakpono & S. Kleimeier & H. Sander, 2012. "Banking market integration in the SADC countries: evidence from interest rate analyses," Applied Economics, Taylor & Francis Journals, vol. 44(29), pages 3857-3876, October.
    9. Bianca Maria Colosimo & Luca Pagani & Marco Grasso, 2024. "Modeling spatial point processes in video-imaging via Ripley’s K-function: an application to spatter analysis in additive manufacturing," Journal of Intelligent Manufacturing, Springer, vol. 35(1), pages 429-447, January.
    10. Ouyang, Yaofu & Li, Peng, 2018. "On the nexus of financial development, economic growth, and energy consumption in China: New perspective from a GMM panel VAR approach," Energy Economics, Elsevier, vol. 71(C), pages 238-252.
    11. Fan, Cheng & Sun, Yongjun & Zhao, Yang & Song, Mengjie & Wang, Jiayuan, 2019. "Deep learning-based feature engineering methods for improved building energy prediction," Applied Energy, Elsevier, vol. 240(C), pages 35-45.
    12. Ionela Munteanu & Adriana Grigorescu & Elena Condrea & Elena Pelinescu, 2020. "Convergent Insights for Sustainable Development and Ethical Cohesion: An Empirical Study on Corporate Governance in Romanian Public Entities," Sustainability, MDPI, vol. 12(7), pages 1-17, April.
    13. Daniel Boss & Annick Hoffmann & Benjamin Rappaz & Christian Depeursinge & Pierre J Magistretti & Dimitri Van de Ville & Pierre Marquet, 2012. "Spatially-Resolved Eigenmode Decomposition of Red Blood Cells Membrane Fluctuations Questions the Role of ATP in Flickering," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-10, August.
    14. Doukas, Haris & Papadopoulou, Alexandra & Savvakis, Nikolaos & Tsoutsos, Theocharis & Psarras, John, 2012. "Assessing energy sustainability of rural communities using Principal Component Analysis," Renewable and Sustainable Energy Reviews, Elsevier, vol. 16(4), pages 1949-1957.
    15. Paschalis Arvanitidis & Athina Economou & Christos Kollias, 2016. "Terrorism’s effects on social capital in European countries," Public Choice, Springer, vol. 169(3), pages 231-250, December.
    16. Rizvi, Syed Kumail Abbas & Rahat, Birjees & Naqvi, Bushra & Umar, Muhammad, 2024. "Revolutionizing finance: The synergy of fintech, digital adoption, and innovation," Technological Forecasting and Social Change, Elsevier, vol. 200(C).
    17. Teerachai Amnuaylojaroen & Pavinee Chanvichit, 2024. "Historical Analysis of the Effects of Drought on Rice and Maize Yields in Southeast Asia," Resources, MDPI, vol. 13(3), pages 1-18, March.
    18. -, 2015. "The effects of climate change on the coasts of Latin America and the Caribbean: Climate variability, dynamics and trends," Documentos de Proyectos 39866, Naciones Unidas Comisión Económica para América Latina y el Caribe (CEPAL).
    19. Dorota Toczydlowska & Gareth W. Peters & Man Chung Fung & Pavel V. Shevchenko, 2017. "Stochastic Period and Cohort Effect State-Space Mortality Models Incorporating Demographic Factors via Probabilistic Robust Principal Components," Risks, MDPI, vol. 5(3), pages 1-77, July.
    20. Weili Duan & Bin He & Daniel Nover & Guishan Yang & Wen Chen & Huifang Meng & Shan Zou & Chuanming Liu, 2016. "Water Quality Assessment and Pollution Source Identification of the Eastern Poyang Lake Basin Using Multivariate Statistical Methods," Sustainability, MDPI, vol. 8(2), pages 1-15, January.
