
Predicting Co-Complexed Protein Pairs from Heterogeneous Data

Author

Listed:
  • Jian Qiu
  • William Stafford Noble

Abstract

Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates. Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combination of complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than the sequence kernels or diffusion kernels alone. Inclusion of additional features achieves a still better ROC50 of 0.937. Assuming a negative-to-positive ratio of 600∶1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs. An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast. 
This computational method thereby provides an inexpensive method for identifying protein complexes that extends and complements high-throughput experimental data.

Author Summary

Many proteins perform their jobs as part of multi-protein units called complexes, and several technologies exist to identify these complexes and their components with varying precision and throughput. In this work, we describe and apply a computational framework for combining a variety of experimental data to identify pairs of yeast proteins that participate in a complex, so-called co-complexed protein pairs (CCPPs). The method uses machine learning to generalize from well-characterized CCPPs, making predictions of novel CCPPs on the basis of sequence similarity, tandem affinity mass spectrometry data, yeast two-hybrid data, genetic interactions, microarray expression data, ChIP-chip assays, and colocalization by fluorescence microscopy. The resulting model accurately summarizes this heterogeneous body of data: in a cross-validated test, the model achieves an estimated coverage of 89% at a false discovery rate of 10%. The final collection of predicted CCPPs is available as a public resource. These predictions, as well as the general methodology described here, provide a valuable summary of diverse yeast interaction data and generate quantitative, testable hypotheses about novel CCPPs.
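
The coverage and false discovery rate quoted above depend on the assumed 600:1 negative-to-positive ratio. The sketch below is not the authors' code; it only illustrates the standard relation FDR = r·FPR / (r·FPR + TPR), where r is the negative-to-positive ratio, TPR is the coverage, and FPR is the false positive rate. The function names `fdr_from_rates` and `max_fpr_for_fdr` are hypothetical, and the paper may estimate its false discovery rate by a different procedure.

```python
def fdr_from_rates(tpr: float, fpr: float, neg_to_pos: float) -> float:
    """Expected false discovery rate when negatives outnumber positives
    by neg_to_pos : 1 (standard relation, assumed here for illustration)."""
    false_pos = neg_to_pos * fpr  # expected false positives per positive pair
    true_pos = tpr                # expected true positives per positive pair
    return false_pos / (false_pos + true_pos)


def max_fpr_for_fdr(tpr: float, target_fdr: float, neg_to_pos: float) -> float:
    """Largest false positive rate compatible with target_fdr at a given
    coverage (tpr), obtained by solving the relation above for fpr."""
    return tpr * target_fdr / ((1.0 - target_fdr) * neg_to_pos)


# Figures taken from the abstract: 89.3% coverage, 10% FDR, 600:1 ratio.
fpr = max_fpr_for_fdr(tpr=0.893, target_fdr=0.10, neg_to_pos=600.0)
fdr = fdr_from_rates(tpr=0.893, fpr=fpr, neg_to_pos=600.0)
print(f"implied false positive rate: {fpr:.2e}")
print(f"round-trip FDR: {fdr:.3f}")
```

Under these assumptions, the classifier would need a false positive rate on the order of 10^-4 to reach the stated operating point, which shows how demanding a 10% false discovery rate is when true pairs are this rare.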

Suggested Citation

  • Jian Qiu & William Stafford Noble, 2008. "Predicting Co-Complexed Protein Pairs from Heterogeneous Data," PLOS Computational Biology, Public Library of Science, vol. 4(4), pages 1-10, April.
  • Handle: RePEc:plo:pcbi00:1000054
    DOI: 10.1371/journal.pcbi.1000054

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000054
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000054&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Oliver M Crook & Aikaterini Geladaki & Daniel J H Nightingale & Owen L Vennard & Kathryn S Lilley & Laurent Gatto & Paul D W Kirk, 2020. "A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-21, November.
    2. Julia P. Schessner & Vincent Albrecht & Alexandra K. Davies & Pavel Sinitcyn & Georg H. H. Borner, 2023. "Deep and fast label-free Dynamic Organellar Mapping," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    3. Arthur Fischbach & Angela Johns & Kara L. Schneider & Xinxin Hao & Peter Tessarz & Thomas Nyström, 2023. "Artificial Hsp104-mediated systems for re-localizing protein aggregates," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    4. Louis-François Handfield & Yolanda T Chong & Jibril Simmons & Brenda J Andrews & Alan M Moses, 2013. "Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    5. Maya Dinur-Mills & Merav Tal & Ophry Pines, 2008. "Dual Targeted Mitochondrial Proteins Are Characterized by Lower MTS Parameters and Total Net Charge," PLOS ONE, Public Library of Science, vol. 3(5), pages 1-8, May.
    6. Md. Abdulla Al Mamun & Wei Cao & Shugo Nakamura & Jun-ichi Maruyama, 2023. "Large-scale identification of genes involved in septal pore plugging in multicellular fungi," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    7. Verena Kohler & Andreas Kohler & Lisa Larsson Berglund & Xinxin Hao & Sarah Gersing & Axel Imhof & Thomas Nyström & Johanna L. Höög & Martin Ott & Claes Andréasson & Sabrina Büttner, 2024. "Nuclear Hsp104 safeguards the dormant translation machinery during quiescence," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    8. Nebojsa Jukic & Alma P. Perrino & Frédéric Humbert & Aurélien Roux & Simon Scheuring, 2022. "Snf7 spirals sense and alter membrane curvature," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    9. Jian Cui & Jinghua Liu & Yuhua Li & Tieliu Shi, 2011. "Integrative Identification of Arabidopsis Mitochondrial Proteome and Its Function Exploitation through Protein Interaction Network," PLOS ONE, Public Library of Science, vol. 6(1), pages 1-16, January.
    10. Xiaomei Wu & Erli Pang & Kui Lin & Zhen-Ming Pei, 2013. "Improving the Measurement of Semantic Similarity between Gene Ontology Terms and Gene Products: Insights from an Edge- and IC-Based Hybrid Method," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-11, May.
    11. Kiyan Shabestary & Cinzia Klemm & Benedict Carling & James Marshall & Juline Savigny & Marko Storch & Rodrigo Ledesma-Amaro, 2024. "Phenotypic heterogeneity follows a growth-viability tradeoff in response to amino acid identity," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    12. Michelle Lindström & Lihua Chen & Shan Jiang & Dan Zhang & Yuan Gao & Ju Zheng & Xinxin Hao & Xiaoxue Yang & Arpitha Kabbinale & Johannes Thoma & Lisa C. Metzger & Deyuan Y. Zhang & Xuefeng Zhu & Huis, 2022. "Lsm7 phase-separated condensates trigger stress granule formation," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    13. Joke J F A van Vugt & Martijn de Jager & Magdalena Murawska & Alexander Brehm & John van Noort & Colin Logie, 2009. "Multiple Aspects of ATP-Dependent Nucleosome Translocation by RSC and Mi-2 Are Directed by the Underlying DNA Sequence," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-14, July.
    14. Stefan A. Hoffmann & Yizhi Cai, 2024. "Engineering stringent genetic biocontainment of yeast with a protein stability switch," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    15. Alex N Nguyen Ba & Bob Strome & Jun Jie Hua & Jonathan Desmond & Isabelle Gagnon-Arsenault & Eric L Weiss & Christian R Landry & Alan M Moses, 2014. "Detecting Functional Divergence after Gene Duplication through Evolutionary Changes in Posttranslational Regulatory Sequences," PLOS Computational Biology, Public Library of Science, vol. 10(12), pages 1-15, December.
    16. Yosuke Ito & Yuhei Chadani & Tatsuya Niwa & Ayako Yamakawa & Kodai Machida & Hiroaki Imataka & Hideki Taguchi, 2022. "Nascent peptide-induced translation discontinuation in eukaryotes impacts biased amino acid usage in proteomes," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    17. Rory M Donovan & Jose-Juan Tapia & Devin P Sullivan & James R Faeder & Robert F Murphy & Markus Dittrich & Daniel M Zuckerman, 2016. "Unbiased Rare Event Sampling in Spatial Stochastic Systems Biology Models Using a Weighted Ensemble of Trajectories," PLOS Computational Biology, Public Library of Science, vol. 12(2), pages 1-25, February.
    18. Sunny Sharma & Jun Yang & Ewa Grudzien-Nogalska & Jessica Shivas & Kelvin Y. Kwan & Megerditch Kiledjian, 2022. "Xrn1 is a deNADding enzyme modulating mitochondrial NAD-capped RNA," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    19. Nicola M. Moloney & Konstantin Barylyuk & Eelco Tromer & Oliver M. Crook & Lisa M. Breckels & Kathryn S. Lilley & Ross F. Waller & Paula MacGregor, 2023. "Mapping diversity in African trypanosomes using high resolution spatial proteomics," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    20. Chung-Chi Liao & Yi-Sen Wang & Wen-Chieh Pi & Chun-Hsiung Wang & Yi-Min Wu & Wei-Yi Chen & Kuo-Chiang Hsia, 2023. "Structural convergence endows nuclear transport receptor Kap114p with a transcriptional repressor function toward TATA-binding protein," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
