Ensemble Positive Unlabeled Learning for Disease Gene Identification

Author

Listed:
  • Peng Yang
  • Xiaoli Li
  • Hon-Nian Chua
  • Chee-Keong Kwoh
  • See-Kiong Ng

Abstract

An increasing number of genes have been experimentally confirmed in recent years as causative genes for various human diseases. Machine learning methods can exploit this newly available knowledge to discover additional, as yet unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in this scenario. However, prediction from a single data source is susceptible to bias from incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias from the inherent limitations of its method. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our method first uses data from multiple biological sources to train PU learning classifiers. A novel ensemble-based PU learning method, EPU, then integrates these classifiers to achieve accurate and robust disease gene predictions. In evaluation experiments across six disease groups, EPU achieved significantly better results than various state-of-the-art prediction methods and ensemble learning classifiers. By integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors of individual data sources and machine learning algorithms and so achieve more accurate and robust disease gene predictions. In the future, EPU can serve as an effective framework for integrating additional biological and computational resources for better disease gene predictions.
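
The abstract does not spell out how EPU is built, so the following is only a minimal sketch of the general idea it describes: learn from a positive set P and an unlabeled set U without a negative set N, then combine the outputs from several data sources. It swaps in a simple bagging scheme (random subsets of U serve as pseudo-negatives) with a nearest-centroid base learner and plain score averaging. All function names, parameters, and the toy feature vectors are assumptions made for illustration and are not taken from the paper.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=25, seed=0):
    """Score each unlabeled vector in [0, 1]; higher means more positive-like.

    Illustrative stand-in for a PU learner, not the EPU method itself.
    Each round draws a random subset of U as pseudo-negatives, builds a
    nearest-centroid model, and votes only on the items left out of the draw.
    """
    if not positives or len(unlabeled) < 2:
        raise ValueError("need at least 1 positive and 2 unlabeled vectors")
    rng = random.Random(seed)
    k = min(len(positives), len(unlabeled) - 1)  # pseudo-negatives per round
    pos_c = centroid(positives)
    votes = [0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        drawn = set(rng.sample(range(len(unlabeled)), k))
        neg_c = centroid([unlabeled[i] for i in drawn])
        for i, x in enumerate(unlabeled):
            if i in drawn:
                continue  # do not score an item with a model trained on it
            counts[i] += 1
            if sq_dist(x, pos_c) < sq_dist(x, neg_c):
                votes[i] += 1
    # Items that were never left out get a neutral score of 0.5.
    return [v / c if c else 0.5 for v, c in zip(votes, counts)]


def ensemble_scores(sources, **kwargs):
    """Average per-source scores; every source lists U in the same gene order."""
    per_source = [pu_bagging_scores(p, u, **kwargs) for p, u in sources]
    return [sum(s) / len(s) for s in zip(*per_source)]


# Hypothetical toy data: two feature "views" of the same six candidate genes.
# Candidate 0 resembles the confirmed disease genes in both views.
source_a = (
    [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]],                # P, view A
    [[1.0, 0.95], [0.0, 0.0], [0.1, -0.1],
     [-0.1, 0.1], [0.05, 0.0], [0.0, 0.1]],              # U, view A
)
source_b = (
    [[5.0], [5.2], [4.8]],                               # P, view B
    [[5.1], [0.0], [0.3], [-0.2], [0.1], [0.2]],         # U, view B
)
combined = ensemble_scores([source_a, source_b])
ranking = sorted(range(len(combined)), key=lambda i: -combined[i])
```

Here `combined` holds one averaged score per candidate gene and `ranking` orders the candidates from most to least positive-like. Averaging is the simplest way to combine sources; a weighted vote or a learned combiner would be equally plausible choices under the same assumptions.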

Suggested Citation

  • Peng Yang & Xiaoli Li & Hon-Nian Chua & Chee-Keong Kwoh & See-Kiong Ng, 2014. "Ensemble Positive Unlabeled Learning for Disease Gene Identification," PLOS ONE, Public Library of Science, vol. 9(5), pages 1-11, May.
  • Handle: RePEc:plo:pone00:0097079
    DOI: 10.1371/journal.pone.0097079

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0097079
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0097079&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. MaoQiang Xie & YingJie Xu & YaoGong Zhang & TaeHyun Hwang & Rui Kuang, 2015. "Network-based Phenome-Genome Association Prediction by Bi-Random Walk," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-18, May.
    2. Ke Hu & Ju Xiang & Yun-Xia Yu & Liang Tang & Qin Xiang & Jian-Ming Li & Yong-Hong Tang & Yong-Jun Chen & Yan Zhang, 2020. "Significance-based multi-scale method for network community detection and its application in disease-gene prediction," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-24, March.
    3. T M Murali & Matthew D Dyer & David Badger & Brett M Tyler & Michael G Katze, 2011. "Network-Based Prediction and Analysis of HIV Dependency Factors," PLOS Computational Biology, Public Library of Science, vol. 7(9), pages 1-15, September.
    4. Deborah Chasman & Brandi Gancarz & Linhui Hao & Michael Ferris & Paul Ahlquist & Mark Craven, 2014. "Inferring Host Gene Subnetworks Involved in Viral Replication," PLOS Computational Biology, Public Library of Science, vol. 10(5), pages 1-22, May.
    5. Xing Chen & Jun Yin & Jia Qu & Li Huang, 2018. "MDHGI: Matrix Decomposition and Heterogeneous Graph Inference for miRNA-disease association prediction," PLOS Computational Biology, Public Library of Science, vol. 14(8), pages 1-24, August.
    6. Li-Chen Hung & Pei-Tseng Kung & Chi-Hsuan Lung & Ming-Hsui Tsai & Shih-An Liu & Li-Ting Chiu & Kuang-Hua Huang & Wen-Chen Tsai, 2020. "Assessment of the Risk of Oral Cancer Incidence in A High-Risk Population and Establishment of A Predictive Model for Oral Cancer Incidence Using A Population-Based Cohort in Taiwan," IJERPH, MDPI, vol. 17(2), pages 1-15, January.
    7. Jianhua Li & Xiaoyan Lin & Yueyang Teng & Shouliang Qi & Dayu Xiao & Jianying Zhang & Yan Kang, 2016. "A Comprehensive Evaluation of Disease Phenotype Networks for Gene Prioritization," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-18, July.
    8. Le Ou-Yang & Dao-Qing Dai & Xiao-Fei Zhang, 2013. "Protein Complex Detection via Weighted Ensemble Clustering Based on Bayesian Nonnegative Matrix Factorization," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-18, May.
    9. Elisa Salviato & Vera Djordjilović & Monica Chiogna & Chiara Romualdi, 2019. "SourceSet: A graphical model approach to identify primary genes in perturbed biological pathways," PLOS Computational Biology, Public Library of Science, vol. 15(10), pages 1-28, October.
    10. Oded Magger & Yedael Y Waldman & Eytan Ruppin & Roded Sharan, 2012. "Enhancing the Prioritization of Disease-Causing Genes through Tissue Specific Protein Interaction Networks," PLOS Computational Biology, Public Library of Science, vol. 8(9), pages 1-10, September.
    11. Cui, Ying & Cai, Meng & Stanley, H. Eugene, 2018. "Discovering disease-associated genes in weighted protein–protein interaction networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 496(C), pages 53-61.
    12. Mengyun Yang & Huimin Luo & Yaohang Li & Fang-Xiang Wu & Jianxin Wang, 2019. "Overlap matrix completion for predicting drug-associated indications," PLOS Computational Biology, Public Library of Science, vol. 15(12), pages 1-21, December.
    13. Abby Hill & Scott Gleim & Florian Kiefer & Frederic Sigoillot & Joseph Loureiro & Jeremy Jenkins & Melody K Morris, 2019. "Benchmarking network algorithms for contextualizing genes of interest," PLOS Computational Biology, Public Library of Science, vol. 15(12), pages 1-14, December.
    14. Florin Ratajczak & Mitchell Joblin & Marcel Hildebrandt & Martin Ringsquandl & Pascal Falter-Braun & Matthias Heinig, 2023. "Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    15. Daniel E Carlin & Barry Demchak & Dexter Pratt & Eric Sage & Trey Ideker, 2017. "Network propagation in the cytoscape cyberinfrastructure," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-9, October.
    16. Dorothea Emig & Alexander Ivliev & Olga Pustovalova & Lee Lancashire & Svetlana Bureeva & Yuri Nikolsky & Marina Bessarabova, 2013. "Drug Target Prediction and Repositioning Using an Integrated Network-Based Approach," PLOS ONE, Public Library of Science, vol. 8(4), pages 1-17, April.
    17. Juan J Cáceres & Alberto Paccanaro, 2019. "Disease gene prediction for molecularly uncharacterized diseases," PLOS Computational Biology, Public Library of Science, vol. 15(7), pages 1-14, July.
    18. U Martin Singh-Blom & Nagarajan Natarajan & Ambuj Tewari & John O Woods & Inderjit S Dhillon & Edward M Marcotte, 2013. "Prediction and Validation of Gene-Disease Associations Using Methods Inspired by Social Network Analyses," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-17, May.
    19. Yadong Dong & Yongqi Sun & Chao Qin, 2018. "Predicting protein complexes using a supervised learning method combined with local structural information," PLOS ONE, Public Library of Science, vol. 13(3), pages 1-23, March.
    20. Joana P Gonçalves & Alexandre P Francisco & Yves Moreau & Sara C Madeira, 2012. "Interactogeneous: Disease Gene Prioritization Using Heterogeneous Networks and Full Topology Scores," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-13, November.