Printed from https://ideas.repec.org/a/plo/pcbi00/1002110.html

Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions

Authors

  • Chuanhua Xing
  • David B Dunson

Abstract

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes, and there has been increasing interest in reconstructing PPI networks. However, several critical difficulties stand in the way of reliable predictions. Notably, false positive rates can exceed 80%. Correcting errors at each generating source can be both time-consuming and inefficient, because errors introduced at multiple levels of data processing are hard to cover within a single test. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), that lowers the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set, NBEL predicts many more PPIs than naïve Bayes, which suggests that previous studies may contain large numbers of false negatives as well as false positives. Validation on two high-quality human PPI datasets supports these observations. Our experiments demonstrate that it is feasible to predict PPIs computationally at high throughput with substantially fewer false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable predictions may provide a solid platform for other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.

Author Summary

Protein interactions are the basic units of almost all biological processes. It is therefore vitally important to reconstruct protein-protein interactions (PPIs) before we can fully understand those processes. However, critical difficulties exist. In particular, the rate of wrongly predicting PPIs to be true (the false positive rate) is extremely high in PPI prediction. Traditional approaches that correct errors at each generating source can be both time-consuming and inefficient. We propose a method that substantially reduces false positive rates by emphasizing information from more reliable data sources and de-emphasizing less reliable ones, and our extensive studies indicate that this is indeed the case. Our predictions also suggest that previous studies may contain large numbers of false negatives as well as false positives, as validated on two high-quality human PPI datasets. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Reliable predictions from our method may benefit other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.
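The following minimal Python sketch illustrates why down-weighting an unreliable source can lower the misclassification rate compared with naïve Bayes. It is not the NBEL model. NBEL is a nonparametric Bayesian ensemble that infers from the data how informative each source is. This sketch instead scales each source's log likelihood ratio by a fixed, hand-picked weight, which is a simple tempered-likelihood stand-in. The function names, the weights and the four-pair dataset are all hypothetical and synthetic.

```python
import math
from typing import List, Optional, Sequence


def posterior_log_odds(prior_odds: float,
                       likelihood_ratios: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Posterior log-odds that one protein pair interacts.

    Naive Bayes treats sources as independent and adds their log
    likelihood ratios: log O_post = log O_prior + sum_k log LR_k.
    The weighted variant (an illustrative simplification, NOT the NBEL
    model) scales each log LR_k by a reliability weight w_k in [0, 1].
    """
    if weights is None:
        weights = [1.0] * len(likelihood_ratios)  # plain naive Bayes
    if len(weights) != len(likelihood_ratios):
        raise ValueError("need exactly one weight per data source")
    return math.log(prior_odds) + sum(
        w * math.log(lr) for w, lr in zip(weights, likelihood_ratios))


def predict(pairs: Sequence[Sequence[float]], prior_odds: float,
            weights: Optional[Sequence[float]] = None) -> List[bool]:
    """Call an interaction when the posterior probability exceeds 0.5."""
    return [posterior_log_odds(prior_odds, lrs, weights) > 0.0 for lrs in pairs]


def misclassification_rate(predicted: Sequence[bool],
                           truth: Sequence[bool]) -> float:
    """(false positives + false negatives) / number of pairs."""
    if len(predicted) != len(truth):
        raise ValueError("predictions and labels differ in length")
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)


# Synthetic toy data (hypothetical): likelihood ratios from three sources
# A, B, C for four protein pairs. Source C is "contaminated": it strongly
# supports the two pairs that do NOT interact.
pairs = [
    [4.0, 3.0, 0.5],   # interacts
    [0.5, 0.4, 20.0],  # does not interact
    [3.0, 5.0, 1.0],   # interacts
    [0.3, 0.5, 15.0],  # does not interact
]
truth = [True, False, True, False]

nb_rate = misclassification_rate(predict(pairs, prior_odds=1.0), truth)
# Hand-picked weights that down-weight source C; NBEL would instead infer
# source reliability from the data.
weighted_rate = misclassification_rate(
    predict(pairs, prior_odds=1.0, weights=[1.0, 1.0, 0.1]), truth)

print(f"naive Bayes: {nb_rate:.2f}, weighted: {weighted_rate:.2f}")
```

Running it prints `naive Bayes: 0.50, weighted: 0.00`. With equal weights, the contaminated source flips both non-interacting pairs into false positives. Shrinking its weight lets the two reliable sources decide. The hard part in practice, and what the abstract attributes to NBEL, is learning such weights automatically rather than fixing them by hand.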

Suggested Citation

  • Chuanhua Xing & David B Dunson, 2011. "Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions," PLOS Computational Biology, Public Library of Science, vol. 7(7), pages 1-10, July.
  • Handle: RePEc:plo:pcbi00:1002110
    DOI: 10.1371/journal.pcbi.1002110

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002110
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002110&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1002110?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Benjamin A Shoemaker & Anna R Panchenko, 2007. "Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners," PLOS Computational Biology, Public Library of Science, vol. 3(4), pages 1-7, April.
    2. David B. Dunson & Shyamal D. Peddada, 2008. "Bayesian nonparametric inference on stochastic ordering," Biometrika, Biometrika Trust, vol. 95(4), pages 859-874.
    3. Antigoni Elefsinioti & Marit Ackermann & Andreas Beyer, 2009. "Accounting for Redundancy when Integrating Gene Interaction Databases," PLOS ONE, Public Library of Science, vol. 4(10), pages 1-9, October.
    4. Anton J. Enright & Ioannis Iliopoulos & Nikos C. Kyrpides & Christos A. Ouzounis, 1999. "Protein interaction maps for complete genomes based on gene fusion events," Nature, Nature, vol. 402(6757), pages 86-90, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Saeid Rasti & Chrysafis Vogiatzis, 2019. "A survey of computational methods in protein–protein interaction networks," Annals of Operations Research, Springer, vol. 276(1), pages 35-87, May.
    2. Vijaykumar Yogesh Muley & Akash Ranjan, 2012. "Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-13, July.
    3. Beom Seuk Hwang & Zhen Chen, 2015. "An Integrated Bayesian Nonparametric Approach for Stochastic and Variability Orders in ROC Curve Estimation: An Application to Endometriosis Diagnosis," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 923-934, September.
    4. Kelter, Riko, 2022. "Power analysis and type I and type II error rates of Bayesian nonparametric two-sample tests for location-shifts based on the Bayes factor under Cauchy priors," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
    5. Xinyi Liu & Bin Liu & Zhimin Huang & Ting Shi & Yingyi Chen & Jian Zhang, 2012. "SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-6, January.
    6. Colizza, Vittoria & Flammini, Alessandro & Maritan, Amos & Vespignani, Alessandro, 2005. "Characterization and modeling of protein–protein interaction networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 352(1), pages 1-27.
    7. Sayed Mohammad Ebrahim Sahraeian & Byung-Jun Yoon, 2012. "A Network Synthesis Model for Generating Protein Interaction Network Families," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-14, August.
    8. Saket Navlakha & Anthony Gitter & Ziv Bar-Joseph, 2012. "A Network-based Approach for Predicting Missing Pathway Interactions," PLOS Computational Biology, Public Library of Science, vol. 8(8), pages 1-13, August.
    9. Beatriz García-Jiménez & David Juan & Iakes Ezkurdia & Eduardo Andrés-León & Alfonso Valencia, 2010. "Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach," PLOS ONE, Public Library of Science, vol. 5(4), pages 1-10, April.
    10. Guilherme T Valente & Marcio L Acencio & Cesar Martins & Ney Lemke, 2013. "The Development of a Universal In Silico Predictor of Protein-Protein Interactions," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-11, May.
    11. Wei Zhang & Jia Xu & Yuanyuan Li & Xiufen Zou, 2017. "A new two-stage method for revealing missing parts of edges in protein-protein interaction networks," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-22, May.
    12. Yeonseung Chung & David Dunson, 2011. "The local Dirichlet process," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(1), pages 59-80, February.
    13. Jana Kludas & Mikko Arvas & Sandra Castillo & Tiina Pakula & Merja Oja & Céline Brouard & Jussi Jäntti & Merja Penttilä & Juho Rousu, 2016. "Machine Learning of Protein Interactions in Fungal Secretory Pathways," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-20, July.
    14. Chittibabu Guda & Brian R King & Lipika R Pal & Purnima Guda, 2009. "A Top-Down Approach to Infer and Compare Domain-Domain Interactions across Eight Model Organisms," PLOS ONE, Public Library of Science, vol. 4(3), pages 1-15, March.
    15. Bassetti, Federico & Casarin, Roberto & Leisen, Fabrizio, 2011. "Beta-product Poisson-Dirichlet Processes," DES - Working Papers. Statistics and Econometrics. WS 12160, Universidad Carlos III de Madrid. Departamento de Estadística.
    16. Hai-Bo Zhang & Xiao-Bao Ding & Jie Jin & Wen-Ping Guo & Qiao-Lei Yang & Peng-Cheng Chen & Heng Yao & Li Ruan & Yu-Tian Tao & Xin Chen, 2022. "Predicted mouse interactome and network-based interpretation of differentially expressed genes," PLOS ONE, Public Library of Science, vol. 17(4), pages 1-16, April.
    17. Tomasz Rychlik, 2019. "Sharp bounds on distribution functions and expectations of mixtures of ordered families of distributions," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 166-195, March.
    18. Zhu-Hong You & Keith C C Chan & Pengwei Hu, 2015. "Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-19, May.
    19. Benjamin A Shoemaker & Anna R Panchenko, 2007. "Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners," PLOS Computational Biology, Public Library of Science, vol. 3(4), pages 1-7, April.
    20. Xue Wang & Yuejin Wu & Rujing Wang & Yuanyuan Wei & Yuanmiao Gui, 2019. "A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-12, June.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.