Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

Authors

  • Erhan Bilal
  • Janusz Dutkowski
  • Justin Guinney
  • In Sock Jang
  • Benjamin A Logsdon
  • Gaurav Pandey
  • Benjamin A Sauerwine
  • Yishai Shimoni
  • Hans Kristian Moen Vollan
  • Brigham H Mecham
  • Oscar M Rueda
  • Jorg Tost
  • Christina Curtis
  • Mariano J Alvarez
  • Vessela N Kristensen
  • Samuel Aparicio
  • Anne-Lise Børresen-Dale
  • Carlos Caldas
  • Andrea Califano
  • Stephen H Friend
  • Trey Ideker
  • Eric E Schadt
  • Gustavo A Stolovitzky
  • Adam A Margolin

Abstract

Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease, and different breast cancer subtypes are treated differently. Understanding how prognosis differs with the molecular and phenotypic features of a tumor is one avenue for improving care, because it allows the proper treatment to be matched to each molecular subtype of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis, using large datasets containing genomic and clinical information together with an online real-time leaderboard that sped feedback to the modeling team and encouraged each modeler to work towards a higher-ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies, and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.

Author Summary

We developed an extensible software framework for sharing molecular prognostic models of breast cancer survival in a transparent collaborative environment and subjecting each model to automated evaluation using objective metrics. The computational framework presented in this study, our detailed post-hoc analysis of hundreds of modeling approaches, and the use of a novel cutting-edge data resource together represent one of the largest-scale systematic studies to date assessing the factors influencing the accuracy of molecular-based prognostic models in breast cancer. Our results demonstrate the ability to infer prognostic models with accuracy on par with or greater than that of previously reported studies, with significant performance improvements from state-of-the-art machine learning approaches trained on clinical covariates. Our results also demonstrate the difficulty of incorporating molecular data to achieve substantial performance improvements over clinical covariates alone. However, improvement was achieved by combining clinical feature data with intelligent selection of important molecular features based on domain-specific prior knowledge. We observe that ensemble models aggregating the information across many diverse models achieve among the highest scores of all models and systematically outperform individual models within the ensemble, suggesting a general strategy for leveraging the wisdom of crowds to develop robust predictive models.
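
The ensemble observation can be made concrete with a small sketch. Everything in it is an illustrative assumption, not the study's actual pipeline: this page does not name the scoring metric or the aggregation rule, so the sketch uses a concordance index (a common survival-model score) and a simple average of per-model ranks, and the survival times, event flags, and model scores are made-up toy values chosen so that each model's single ranking error is cancelled by the others.

```python
from itertools import combinations


def concordance_index(times, events, risk):
    """Share of comparable patient pairs in which the patient with the
    shorter observed survival time was given the higher risk score."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times, and pairs where the earlier patient is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable


def ranks(scores):
    """Rank scores from 0 (lowest) upward; ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    out = [0] * len(scores)
    for rank, k in enumerate(order):
        out[k] = rank
    return out


def rank_average_ensemble(model_scores):
    """Average each patient's rank across models (assumed aggregation rule)."""
    per_model = [ranks(s) for s in model_scores]
    n_patients = len(model_scores[0])
    return [sum(r[k] for r in per_model) / len(per_model)
            for k in range(n_patients)]


# Hypothetical toy cohort: survival time, and whether death was observed (1)
# or the patient was censored (0).
times = [5, 10, 15, 20]
events = [1, 1, 1, 0]

# Hypothetical risk scores from three models, each with one mis-ordered pair.
model_a = [3, 4, 2, 1]
model_b = [4, 2, 3, 1]
model_c = [4, 3, 1, 2]

ensemble = rank_average_ensemble([model_a, model_b, model_c])

for name, scores in [("A", model_a), ("B", model_b), ("C", model_c),
                     ("ensemble", ensemble)]:
    print(name, round(concordance_index(times, events, scores), 3))
```

On this constructed example the rank-averaged scores order the patients better than any single model does, because the individual models make different mistakes. That is only the intuition behind the reported result, not evidence for it.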

Suggested Citation

  • Erhan Bilal & Janusz Dutkowski & Justin Guinney & In Sock Jang & Benjamin A Logsdon & Gaurav Pandey & Benjamin A Sauerwine & Yishai Shimoni & Hans Kristian Moen Vollan & Brigham H Mecham & Oscar M Rue, 2013. "Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-16, May.
  • Handle: RePEc:plo:pcbi00:1003047
    DOI: 10.1371/journal.pcbi.1003047

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003047
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003047&type=printable
    Download Restriction: no

