
Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes

Author

Listed:
  • Christof Winter
  • Glen Kristiansen
  • Stephan Kersting
  • Janine Roy
  • Daniela Aust
  • Thomas Knösel
  • Petra Rümmele
  • Beatrix Jahnke
  • Vera Hentrich
  • Felix Rückert
  • Marco Niedergethmann
  • Wilko Weichert
  • Marcus Bahra
  • Hans J Schlitt
  • Utz Settmacher
  • Helmut Friess
  • Markus Büchler
  • Hans-Detlev Saeger
  • Michael Schroeder
  • Christian Pilarsky
  • Robert Grützmann

Abstract

Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, the state-of-the-art methods used so far have often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles that we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state-of-the-art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploiting these data for personalized cancer therapies in clinical practice.

Author Summary

Why do some people with the same type of cancer die early and some live long? Apart from influences from the environment and personal lifestyle, we believe that differences in the individual tumor genome account for different survival times. Recently, powerful methods have become available to systematically read genomic information of patient samples. The major remaining challenge is how to spot, among the thousands of changes, those few that are relevant for tumor aggressiveness and thereby affect patient survival. Here, we make use of the fact that genes and proteins in a cell never act alone, but form a network of interactions. Finding the relevant information in big networks of web documents and hyperlinks has been mastered by Google with their PageRank algorithm. Similar to PageRank, we have developed an algorithm that can identify genes that are better indicators for survival than genes found by traditional algorithms. Our method can aid the clinician in deciding if a patient should receive chemotherapy or not. Reliable prediction of survival and response to therapy based on molecular markers holds great potential to improve and personalize patient therapies in the future.
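
The abstract describes the ranking only as combining expression and network information "in a manner similar to Google's PageRank" and does not give the update rule. The following is a minimal sketch of one way such a ranking could work, assuming a personalized-PageRank-style iteration in which each gene keeps a share of its own expression-based score and receives the rest from its network neighbors. The function name `network_rank`, the damping value 0.3, and the toy genes, scores, and edges are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple


def network_rank(
    scores: Dict[str, float],
    edges: Iterable[Tuple[str, str]],
    damping: float = 0.3,      # hypothetical value, not from the paper
    iterations: int = 100,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Rank genes by mixing a per-gene score with scores of network neighbors.

    scores: per-gene relevance, e.g. the absolute correlation of expression
            with survival time (assumed input, computed elsewhere).
    edges:  undirected gene-gene relationships.
    """
    genes: List[str] = list(scores)
    total = sum(abs(s) for s in scores.values())
    if total == 0:
        raise ValueError("at least one gene needs a non-zero score")
    # Normalized expression-based prior for every gene.
    prior = {g: abs(scores[g]) / total for g in genes}

    # Symmetric adjacency, restricted to genes that have a score.
    neighbors: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in edges:
        if a in neighbors and b in neighbors and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    rank = dict(prior)
    for _ in range(iterations):
        new_rank = {}
        for g in genes:
            # Each neighbor spreads its current rank evenly over its own edges.
            inflow = sum(rank[n] / len(neighbors[n]) for n in neighbors[g])
            new_rank[g] = (1 - damping) * prior[g] + damping * inflow
        delta = max(abs(new_rank[g] - rank[g]) for g in genes)
        rank = new_rank
        if delta < tol:
            break
    # Note: genes without neighbors receive no inflow, so ranks need not sum to 1.
    return rank


# Toy example with made-up genes and scores.
toy_scores = {"A": 0.9, "B": 0.1, "C": 0.1, "D": 0.5}
toy_edges = [("A", "B"), ("A", "C"), ("B", "C")]
for gene, value in sorted(network_rank(toy_scores, toy_edges).items(),
                          key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {value:.4f}")
```

With `damping=0` the ranking reduces to the expression-based score alone, which corresponds to a purely correlation-based baseline; larger values give more weight to the network. In this sketch, genes B and C gain rank from their well-scoring neighbor A, while the isolated gene D keeps only the retained share of its own score.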

Suggested Citation

  • Christof Winter & Glen Kristiansen & Stephan Kersting & Janine Roy & Daniela Aust & Thomas Knösel & Petra Rümmele & Beatrix Jahnke & Vera Hentrich & Felix Rückert & Marco Niedergethmann & Wilko Weiche, 2012. "Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes," PLOS Computational Biology, Public Library of Science, vol. 8(5), pages 1-16, May.
  • Handle: RePEc:plo:pcbi00:1002511
    DOI: 10.1371/journal.pcbi.1002511

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002511
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002511&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Laura J. van 't Veer & Hongyue Dai & Marc J. van de Vijver & Yudong D. He & Augustinus A. M. Hart & Mao Mao & Hans L. Peterse & Karin van der Kooy & Matthew J. Marton & Anke T. Witteveen & George J. S, 2002. "Gene expression profiling predicts clinical outcome of breast cancer," Nature, Nature, vol. 415(6871), pages 530-536, January.
    2. Tibshirani Robert J. & Efron Brad, 2002. "Pre-validation and inference in microarrays," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 1(1), pages 1-20, August.
    3. Scott L. Pomeroy & Pablo Tamayo & Michelle Gaasenbeek & Lisa M. Sturla & Michael Angelo & Margaret E. McLaughlin & John Y. H. Kim & Liliana C. Goumnerova & Peter M. Black & Ching Lau & Jeffrey C. Alle, 2002. "Prediction of central nervous system embryonal tumour outcome based on gene expression," Nature, Nature, vol. 415(6870), pages 436-442, January.

    Citations



    Cited by:

    1. Yupeng Cun & Holger Fröhlich, 2013. "Network and Data Integration for Biomarker Signature Discovery via Network Smoothed T-Statistics," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-9, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lama, Nicola & Boracchi, Patrizia & Biganzoli, Elia, 2009. "Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1906-1922, March.
    2. Tibshirani Robert J., 2009. "Univariate Shrinkage in the Cox Model for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-20, April.
    3. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    4. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    5. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L 0 -Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    6. Sijia Huang & Cameron Yee & Travers Ching & Herbert Yu & Lana X Garmire, 2014. "A Novel Model to Combine Clinical and Pathway-Based Transcriptomic Information for the Prognosis Prediction of Breast Cancer," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-15, September.
    7. Lian, Heng & Du, Pang & Li, YuanZhang & Liang, Hua, 2014. "Partially linear structure identification in generalized additive models with NP-dimensionality," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 197-208.
    8. Jan, Budczies & Kosztyla, Daniel & von Törne, Christian & Stenzinger, Albrecht & Darb-Esfahani, Silvia & Dietel, Manfred & Denkert, Carsten, 2014. "cancerclass: An R Package for Development and Validation of Diagnostic Tests from High-Dimensional Molecular Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 59(i01).
    9. Zhaoliang Wang & Liugen Xue & Gaorong Li & Fei Lu, 2019. "Spline estimator for ultra-high dimensional partially linear varying coefficient models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 657-677, June.
    10. Jun Yao & Qi Zhao & Ying Yuan & Li Zhang & Xiaoming Liu & W K Alfred Yung & John N Weinstein, 2012. "Identification of Common Prognostic Gene Expression Signatures with Biological Meanings from Microarray Gene Expression Datasets," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-11, September.
    11. Lian, I.B. & Chang, C.J. & Liang, Y.J. & Yang, M.J. & Fann, C.S.J., 2007. "Identifying differentially expressed genes in dye-swapped microarray experiments of small sample size," Computational Statistics & Data Analysis, Elsevier, vol. 51(5), pages 2602-2620, February.
    12. Garrett Green & Ruben Carmona & Kaveh Zakeri & Chih-Han Lee & Saif Borgan & Zaid Marhoon & Andrew Sharabi & Loren K Mell, 2016. "Specificity of Genetic Biomarker Studies in Cancer Research: A Systematic Review," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-7, July.
    13. Grace Y. Yi & Wenqing He & Raymond. J. Carroll, 2022. "Feature screening with large‐scale and high‐dimensional survival data," Biometrics, The International Biometric Society, vol. 78(3), pages 894-907, September.
    14. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    15. Khan Md Hasinur Rahaman & Bhadra Anamika & Howlader Tamanna, 2019. "Stability selection for lasso, ridge and elastic net implemented with AFT models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(5), pages 1-14, October.
    16. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    17. Lida Qiu & Deyong Kang & Chuan Wang & Wenhui Guo & Fangmeng Fu & Qingxiang Wu & Gangqin Xi & Jiajia He & Liqin Zheng & Qingyuan Zhang & Xiaoxia Liao & Lianhuang Li & Jianxin Chen & Haohua Tu, 2022. "Intratumor graph neural network recovers hidden prognostic value of multi-biomarker spatial heterogeneity," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    18. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.
    19. Guan-Hua Huang & Su-Mei Wang & Chung-Chu Hsu, 2011. "Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses," Psychometrika, Springer;The Psychometric Society, vol. 76(4), pages 584-611, October.
    20. Foucher Yohann & Danger Richard, 2012. "Time Dependent ROC Curves for the Estimation of True Prognostic Capacity of Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(6), pages 1-22, November.
