IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0117658.html

Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival

Authors listed:
  • Richard E Neapolitan
  • Xia Jiang

Abstract

Background: Studies show that thousands of genes are associated with breast cancer prognosis. To exploit the available genetic data, researchers have tried to predict outcomes from gene expression data, and several commercial products have been developed. These products have two shortcomings: 1) They use the Cox model for prediction, although the random survival forest (RSF) model has been shown to significantly outperform the Cox model. 2) They were not tested to determine whether a complete set of clinical predictors could predict as well as the gene expression signatures.

Methodology/Findings: We address these shortcomings. The METABRIC data set covers 1,981 breast cancer tumors. Its features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model when using both the clinical data and the gene expression data with their performance when using only the clinical data. We obtain significantly better results using both clinical data and gene expression data for 5-year, 10-year, and 15-year survival prediction. When we replace the gene expression data with PAM50 subtype, the improvement is significant only for 5-year and 15-year prediction. The RSF model yields significantly better results than the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival.

Conclusions/Significance: Our results indicate that combining clinical data and gene expression data improves survival prediction compared with using clinical data alone, and that using the RSF model instead of the Cox model further improves survival prediction. These results are significant because, by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making.

Suggested Citation

  • Richard E Neapolitan & Xia Jiang, 2015. "Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-16, February.
  • Handle: RePEc:plo:pone00:0117658
    DOI: 10.1371/journal.pone.0117658

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0117658
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0117658&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0117658?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Han-Seong Kim & Jae-Soo Koh & Yong-Bock Choi & Jungsil Ro & Hyun-Kyoung Kim & Mi-Kyung Kim & Byung-Ho Nam & Kyung-Tae Kim & Vishal Chandra & Hye-Sil Seol & Woo-Chul Noh & Eun-Kyu Kim & Joobae Park & C, 2014. "Chromatin CKAP2, a New Proliferation Marker, as Independent Prognostic Indicator in Breast Cancer," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-10, June.
    2. Su, Yu-Sung & Gelman, Andrew & Hill, Jennifer & Yajima, Masanao, 2011. "Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i02).

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.

    Cited by:

    1. Xia Jiang & Jeremy Jao & Richard Neapolitan, 2015. "Learning Predictive Interactions Using Information Gain and Bayesian Network Scoring," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-23, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    2. Takashi Sugimoto & Tomohiro Shinozaki & Takashi Naruse & Yuki Miyamoto, 2014. "Who Was Concerned about Radiation, Food Safety, and Natural Disasters after the Great East Japan Earthquake and Fukushima Catastrophe? A Nationwide Cross-Sectional Survey in 2012," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-8, September.
    3. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    4. Elizabeth Duthie & Diogo Veríssimo & Aidan Keane & Andrew T Knight, 2017. "The effectiveness of celebrities in conservation marketing," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-16, July.
    5. Thomas R. Belin, 2017. "TRIVELLORE RAGHUNATHAN . Missing Data Analysis in Practice . Boca Raton : CRC Press," Biometrics, The International Biometric Society, vol. 73(3), pages 1059-1060, September.
    6. Christian Seiler, 2013. "Nonresponse in Business Tendency Surveys: Theoretical Discourse and Empirical Evidence," ifo Beiträge zur Wirtschaftsforschung, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, number 52.
    7. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    8. Rashid, S. & Mitra, R. & Steele, R.J., 2015. "Using mixtures of t densities to make inferences in the presence of missing data with a small number of multiply imputed data sets," Computational Statistics & Data Analysis, Elsevier, vol. 92(C), pages 84-96.
    9. repec:jss:jstsof:45:i01 is not listed on IDEAS
    10. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    11. Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    12. Florian Meinfelder, 2014. "Multiple Imputation: an attempt to retell the evolutionary process," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 8(4), pages 249-267, November.
    13. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    14. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.
    15. Oberski, Daniel, 2014. "lavaan.survey: An R Package for Complex Survey Analysis of Structural Equation Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 57(i01).
    16. G. Robin Gauthier & Patricia Wonch Hill & Julia McQuillan & Amy N. Spiegel & Judy Diamond, 2017. "The Potential Scientist’s Dilemma: How the Masculine Framing of Science Shapes Friendships and Science Job Aspirations," Social Sciences, MDPI, vol. 6(1), pages 1-21, February.
    17. Christos T Nakas & Narayan Schütz & Marcus Werners & Alexander B Leichtle, 2016. "Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-11, July.
    18. repec:jss:jstsof:45:i03 is not listed on IDEAS
    19. Labrecque, Jeremy A. & Kaufman, Jay S. & Balzer, Laura B. & Maclehose, Richard F. & Strumpf, Erin C. & Matijasevich, Alicia & Santos, Iná S. & Schmidt, Kelen H. & Barros, Aluísio J.D., 2018. "Effect of a conditional cash transfer program on length-for-age and weight-for-age in Brazilian infants at 24 months using doubly-robust, targeted estimation," Social Science & Medicine, Elsevier, vol. 211(C), pages 9-15.
    20. Tendeiro, Jorge N. & Meijer, Rob R. & Niessen, A. Susan M., 2016. "PerFit: An R Package for Person-Fit Analysis in IRT," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i05).
    21. Speidel, Matthias & Drechsler, Jörg & Jolani, Shahab, 2018. "R package hmi: a convenient tool for hierarchical multiple imputation and beyond," IAB-Discussion Paper 201816, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    22. Gary K Chen & Eric C Chi & John Michael O Ranola & Kenneth Lange, 2015. "Convex Clustering: An Attractive Alternative to Hierarchical Clustering," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-31, May.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0117658. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.