
Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

Author

Listed:
  • Daniel Ting
  • Guoli Wang
  • Maxim Shapovalov
  • Rajib Mitra
  • Michael I Jordan
  • Roland L Dunbrack Jr

Abstract

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. 
The distributions are available at http://dunbrack.fccc.edu/hdp.

Author Summary: The three-dimensional structure of a protein enables it to perform its specific function, which may be catalysis, DNA binding, cell signaling, maintaining cell shape and structure, or one of many other functions. Predicting the structures of proteins is an important goal of computational biology. One way of doing this is to figure out the rules that determine protein structure from protein sequences by determining how local protein sequence is associated with local protein structure. That is, many (but not all) of the interactions that determine protein structure occur between amino acids that are a short distance away from each other in the sequence. This is particularly true in the irregular parts of protein structure, often called loops. In this work, we have performed a statistical analysis of the structure of the protein backbone in loops as a function of the protein sequence. We have determined how an amino acid bends the local backbone due to its amino acid type and the amino acid types of its neighbors. We used a recently developed statistical method that is particularly suited to this problem. The analysis shows that backbone conformation prediction can be improved using the information in the statistical distributions we have developed.
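
As a rough illustration of the information sharing that hierarchical Dirichlet process priors provide, the sketch below draws one set of global mixture weights by truncated stick-breaking and then draws neighbor-specific weights centered on them, so that every neighbor context reuses the same pool of components with different proportions. This is not the authors' implementation: the truncation level, the concentration values gamma and alpha, the three neighbor labels, and all function names are assumptions made for illustration, and the component densities on the Ramachandran map and the inference procedure are left out entirely.

```python
import random


def stick_breaking(gamma, num_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Each break v_k ~ Beta(1, gamma) takes a fraction of the stick that
    is still unassigned; the last component absorbs the leftover mass.
    """
    weights, remaining = [], 1.0
    for _ in range(num_components - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def dirichlet(alphas, rng):
    """Sample a Dirichlet vector by normalizing independent gamma draws."""
    # Floor tiny parameters so gammavariate always gets a positive shape.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def neighbor_weights(global_weights, alpha, neighbors, rng):
    """Draw one weight vector per neighbor type, centered on the global one.

    Under truncation, pi_j ~ Dirichlet(alpha * beta): a larger alpha keeps
    each neighbor-specific vector closer to the shared global weights.
    """
    scaled = [alpha * b for b in global_weights]
    return {name: dirichlet(scaled, rng) for name in neighbors}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    beta = stick_breaking(gamma=2.0, num_components=10, rng=rng)
    per_neighbor = neighbor_weights(beta, alpha=5.0,
                                    neighbors=["ALA", "GLY", "PRO"], rng=rng)
    print("global:", [round(b, 3) for b in beta])
    for name, weights in per_neighbor.items():
        print(name, [round(w, 3) for w in weights])
```

In this toy setup, components with large global weight tend to receive large weight in every neighbor context, which is the sense in which sparsely observed contexts can borrow strength from the rest of the data.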

Suggested Citation

  • Daniel Ting & Guoli Wang & Maxim Shapovalov & Rajib Mitra & Michael I Jordan & Roland L Dunbrack Jr, 2010. "Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-21, April.
  • Handle: RePEc:plo:pcbi00:1000763
    DOI: 10.1371/journal.pcbi.1000763

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000763
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000763&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Abramson, Ian S., 1982. "Arbitrariness of the pilot estimator in adaptive kernel methods," Journal of Multivariate Analysis, Elsevier, vol. 12(4), pages 562-567, December.
    2. Teh, Yee Whye & Jordan, Michael I. & Beal, Matthew J. & Blei, David M., 2006. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1566-1581, December.
    3. Lennox, Kristin P. & Dahl, David B. & Vannucci, Marina & Tsai, Jerry W., 2009. "Density Estimation for Protein Conformation Angles Using a Bivariate von Mises Distribution and Bayesian Nonparametrics," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 586-596.
    4. Pertsemlidis Alexander & Zelinka Jan & Fondon John W. & Henderson R. Keith & Otwinowski Zbyszek, 2005. "Bayesian Statistical Studies of the Ramachandran Distribution," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-18, November.

    Citations


    Cited by:

    1. Armando D Solis, 2014. "Deriving High-Resolution Protein Backbone Structure Propensities from All Crystal Data Using the Information Maximization Device," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-21, June.
    2. Fernández-Durán Juan José & Gregorio-Domínguez María Mercedes, 2014. "Modeling angles in proteins and circular genomes using multivariate angular distributions based on multiple nonnegative trigonometric sums," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 1-18, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Pewsey & Eduardo García-Portugués, 2021. "Recent advances in directional statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 1-58, March.
    2. Michelle Dietzen & Haoran Zhai & Olivia Lucas & Oriol Pich & Christopher Barrington & Wei-Ting Lu & Sophia Ward & Yanping Guo & Robert E. Hynds & Simone Zaccaria & Charles Swanton & Nicholas McGranaha, 2024. "Replication timing alterations are associated with mutation acquisition during breast and lung cancer evolution," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    3. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    4. Hu, Shuowen & Poskitt, D.S. & Zhang, Xibin, 2012. "Bayesian adaptive bandwidth kernel density estimation of irregular multivariate distributions," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 732-740.
    5. Jin, Xin & Maheu, John M., 2016. "Bayesian semiparametric modeling of realized covariance matrices," Journal of Econometrics, Elsevier, vol. 192(1), pages 19-39.
    6. Parvin Ahmadi & Iman Gholampour & Mahmoud Tabandeh, 2018. "Cluster-based sparse topical coding for topic mining and document clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 537-558, September.
    7. Jeffrey L. Furman & Florenta Teodoridis, 2020. "Automation, Research Technology, and Researchers’ Trajectories: Evidence from Computer Science and Electrical Engineering," Organization Science, INFORMS, vol. 31(2), pages 330-354, March.
    8. Xin Jin & John M. Maheu & Qiao Yang, 2019. "Bayesian parametric and semiparametric factor models for large realized covariance matrices," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(5), pages 641-660, August.
    9. Csereklyei, Zsuzsanna & Anantharama, Nandini & Kallies, Anne, 2021. "Electricity market transitions in Australia: Evidence using model-based clustering," Energy Economics, Elsevier, vol. 103(C).
    10. Shu-Ping Shi & Yong Song, 2012. "Identifying Speculative Bubbles with an Infinite Hidden Markov Model," Working Paper series 26_12, Rimini Centre for Economic Analysis.
    11. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    12. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    13. Jin, Xin & Maheu, John M. & Yang, Qiao, 2022. "Infinite Markov pooling of predictive distributions," Journal of Econometrics, Elsevier, vol. 228(2), pages 302-321.
    14. Thomas R. W. Oliver & Lia Chappell & Rashesh Sanghvi & Lauren Deighton & Naser Ansari-Pour & Stefan C. Dentro & Matthew D. Young & Tim H. H. Coorens & Hyunchul Jung & Tim Butler & Matthew D. C. Nevill, 2022. "Clonal diversification and histogenesis of malignant germ cell tumours," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    15. Gustaf Bellstam & Sanjai Bhagat & J. Anthony Cookson, 2021. "A Text-Based Analysis of Corporate Innovation," Management Science, INFORMS, vol. 67(7), pages 4004-4031, July.
    16. Michael L. Pennell & David B. Dunson, 2008. "Nonparametric Bayes Testing of Changes in a Response Distribution with an Ordinal Predictor," Biometrics, The International Biometric Society, vol. 64(2), pages 413-423, June.
    17. Bruno Scarpa & David B. Dunson, 2009. "Bayesian Hierarchical Functional Data Analysis Via Contaminated Informative Priors," Biometrics, The International Biometric Society, vol. 65(3), pages 772-780, September.
    18. Hassan Akell & Farkhondeh-Alsadat Sajadi & Iraj Kazemi, 2023. "Construction of Jointly Distributed Random Samples Drawn from the Beta Two-Parameter Process," Methodology and Computing in Applied Probability, Springer, vol. 25(3), pages 1-12, September.
    19. Hongxia Yang & Aurelie Lozano, 2015. "Multi-relational learning via hierarchical nonparametric Bayesian collective matrix factorization," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(5), pages 1133-1147, May.
    20. J. Griffin, 2011. "Bayesian clustering of distributions in stochastic frontier analysis," Journal of Productivity Analysis, Springer, vol. 36(3), pages 275-283, December.
