
Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

Author

Listed:
  • Erhan Bilal
  • Janusz Dutkowski
  • Justin Guinney
  • In Sock Jang
  • Benjamin A Logsdon
  • Gaurav Pandey
  • Benjamin A Sauerwine
  • Yishai Shimoni
  • Hans Kristian Moen Vollan
  • Brigham H Mecham
  • Oscar M Rueda
  • Jorg Tost
  • Christina Curtis
  • Mariano J Alvarez
  • Vessela N Kristensen
  • Samuel Aparicio
  • Anne-Lise Børresen-Dale
  • Carlos Caldas
  • Andrea Califano
  • Stephen H Friend
  • Trey Ideker
  • Eric E Schadt
  • Gustavo A Stolovitzky
  • Adam A Margolin

Abstract

Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.

Author Summary: We developed an extensible software framework for sharing molecular prognostic models of breast cancer survival in a transparent collaborative environment and subjecting each model to automated evaluation using objective metrics.
The computational framework presented in this study, our detailed post-hoc analysis of hundreds of modeling approaches, and the use of a novel cutting-edge data resource together represent one of the largest-scale systematic studies to date assessing the factors influencing accuracy of molecular-based prognostic models in breast cancer. Our results demonstrate the ability to infer prognostic models with accuracy on par with or greater than that of previously reported studies, with significant performance improvements by using state-of-the-art machine learning approaches trained on clinical covariates. Our results also demonstrate the difficulty in incorporating molecular data to achieve substantial performance improvements over clinical covariates alone. However, improvement was achieved by combining clinical feature data with intelligent selection of important molecular features based on domain-specific prior knowledge. We observe that ensemble models aggregating the information across many diverse models achieve among the highest scores of all models and systematically outperform individual models within the ensemble, suggesting a general strategy for leveraging the wisdom of crowds to develop robust predictive models.
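
The ensemble idea described in the summary can be illustrated with a minimal sketch. This listing does not state which scoring metric or aggregation rule the study used, so everything below is an assumption for illustration: the concordance index (a common score for survival predictions under censoring), rank averaging as the way to combine submissions, the function names, and the synthetic cohort. It is not the authors' framework and reproduces none of their results.

```python
import math
import random


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def ranks(values):
    """Zero-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    for position, index in enumerate(order):
        result[index] = float(position)
    return result


def rank_average_ensemble(submissions):
    """Combine several risk-score vectors by averaging their per-patient ranks."""
    ranked = [ranks(s) for s in submissions]
    n = len(submissions[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n)]


def simulate_cohort(n, seed=0):
    """Synthetic cohort (hypothetical): higher latent risk gives shorter survival."""
    rng = random.Random(seed)
    latent_risk = [rng.gauss(0.0, 1.0) for _ in range(n)]
    event_time = [rng.expovariate(math.exp(r)) for r in latent_risk]
    censor_time = [rng.expovariate(0.5) for _ in range(n)]
    times = [min(e, c) for e, c in zip(event_time, censor_time)]
    events = [e <= c for e, c in zip(event_time, censor_time)]
    return latent_risk, times, events


if __name__ == "__main__":
    latent_risk, times, events = simulate_cohort(200)
    rng = random.Random(1)
    # Each hypothetical "submission" is the latent risk plus independent noise.
    submissions = [[r + rng.gauss(0.0, 2.0) for r in latent_risk] for _ in range(10)]
    for k, s in enumerate(submissions):
        print(f"submission {k}: C = {concordance_index(times, events, s):.3f}")
    ensemble = rank_average_ensemble(submissions)
    print(f"rank-average ensemble: C = {concordance_index(times, events, ensemble):.3f}")
```

Averaging ranks rather than raw scores keeps submissions on different scales comparable, and independent errors across submissions tend to cancel, which is one plausible reading of why an ensemble can score above its members.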

Suggested Citation

  • Erhan Bilal & Janusz Dutkowski & Justin Guinney & In Sock Jang & Benjamin A Logsdon & Gaurav Pandey & Benjamin A Sauerwine & Yishai Shimoni & Hans Kristian Moen Vollan & Brigham H Mecham & Oscar M Rue, 2013. "Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-16, May.
  • Handle: RePEc:plo:pcbi00:1003047
    DOI: 10.1371/journal.pcbi.1003047

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003047
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003047&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1003047?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Athanasopoulos, George & Hyndman, Rob J., 2011. "The value of feedback in forecasting competitions," International Journal of Forecasting, Elsevier, vol. 27(3), pages 845-849.
    2. Charles M. Perou & Therese Sørlie & Michael B. Eisen & Matt van de Rijn & Stefanie S. Jeffrey & Christian A. Rees & Jonathan R. Pollack & Douglas T. Ross & Hilde Johnsen & Lars A. Akslen & Øystein Flu, 2000. "Molecular portraits of human breast tumours," Nature, Nature, vol. 406(6797), pages 747-752, August.

    Citations


    Cited by:

    1. Sanju Sinha & Karina Barbosa & Kuoyuan Cheng & Mark D. M. Leiserson & Prashant Jain & Anagha Deshpande & David M. Wilson & Bríd M. Ryan & Ji Luo & Ze’ev A. Ronai & Joo Sang Lee & Aniruddha J. Deshpand, 2021. "A systematic genome-wide mapping of oncogenic mutation selection during CRISPR-Cas9 genome editing," Nature Communications, Nature, vol. 12(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yang, Xi & Hoadley, Katherine A. & Hannig, Jan & Marron, J.S., 2023. "Jackstraw inference for AJIVE data integration," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    2. Egashira, Kento & Yata, Kazuyoshi & Aoshima, Makoto, 2024. "Asymptotic properties of hierarchical clustering in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
    3. María Elena Martínez & Jonathan T Unkart & Li Tao & Candyce H Kroenke & Richard Schwab & Ian Komenaka & Scarlett Lin Gomez, 2017. "Prognostic significance of marital status in breast cancer survival: A population-based study," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-14, May.
    4. Yishai Shimoni, 2018. "Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification," PLOS Computational Biology, Public Library of Science, vol. 14(2), pages 1-15, February.
    5. Yoo-Ah Kim & Stefan Wuchty & Teresa M Przytycka, 2011. "Identifying Causal Genes and Dysregulated Pathways in Complex Diseases," PLOS Computational Biology, Public Library of Science, vol. 7(3), pages 1-13, March.
    6. Radhakrishnan Nagarajan & Marco Scutari, 2013. "Impact of Noise on Molecular Network Inference," PLOS ONE, Public Library of Science, vol. 8(12), pages 1-12, December.
    7. R Joseph Bender & Feilim Mac Gabhann, 2013. "Expression of VEGF and Semaphorin Genes Define Subgroups of Triple Negative Breast Cancer," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-15, May.
    8. Deepak Poduval & Zuzana Sichmanova & Anne Hege Straume & Per Eystein Lønning & Stian Knappskog, 2020. "The novel microRNAs hsa-miR-nov7 and hsa-miR-nov3 are over-expressed in locally advanced breast cancer," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-23, April.
    9. Zhiguang Huo & Li Zhu & Tianzhou Ma & Hongcheng Liu & Song Han & Daiqing Liao & Jinying Zhao & George Tseng, 2020. "Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(1), pages 1-22, April.
    10. Markus Ringnér & Erik Fredlund & Jari Häkkinen & Åke Borg & Johan Staaf, 2011. "GOBO: Gene Expression-Based Outcome for Breast Cancer Online," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-11, March.
    11. Casey S Greene & Olga G Troyanskaya, 2012. "Chapter 2: Data-Driven View of Disease Biology," PLOS Computational Biology, Public Library of Science, vol. 8(12), pages 1-8, December.
    12. Mark Reimers, 2010. "Making Informed Choices about Microarray Data Analysis," PLOS Computational Biology, Public Library of Science, vol. 6(5), pages 1-7, May.
    13. Alan A. Arslan & Yian Zhang & Nedim Durmus & Sultan Pehlivan & Adrienne Addessi & Freya Schnabel & Yongzhao Shao & Joan Reibman, 2021. "Breast Cancer Characteristics in the Population of Survivors Participating in the World Trade Center Environmental Health Center Program 2002–2019," IJERPH, MDPI, vol. 18(14), pages 1-11, July.
    14. Sandra M. Rocha & Sílvia Socorro & Luís A. Passarinha & Cláudio J. Maia, 2022. "Comprehensive Landscape of STEAP Family Members Expression in Human Cancers: Unraveling the Potential Usefulness in Clinical Practice Using Integrated Bioinformatics Analysis," Data, MDPI, vol. 7(5), pages 1-48, May.
    15. Martin H van Vliet & Christiaan N Klijn & Lodewyk F A Wessels & Marcel J T Reinders, 2007. "Module-Based Outcome Prediction Using Breast Cancer Compendia," PLOS ONE, Public Library of Science, vol. 2(10), pages 1-10, October.
    16. Sung Gwe Ahn & Minkyung Lee & Tae Joo Jeon & Kyunghwa Han & Hak Min Lee & Seung Ah Lee & Young Hoon Ryu & Eun Ju Son & Joon Jeong, 2014. "[18F]-Fluorodeoxyglucose Positron Emission Tomography Can Contribute to Discriminate Patients with Poor Prognosis in Hormone Receptor-Positive Breast Cancer," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-7, August.
    17. Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilios, 2022. "The M5 competition: Background, organization, and implementation," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1325-1336.
    18. Maurizio Callari & Antonio Lembo & Giampaolo Bianchini & Valeria Musella & Vera Cappelletti & Luca Gianni & Maria Grazia Daidone & Paolo Provero, 2014. "Accurate Data Processing Improves the Reliability of Affymetrix Gene Expression Profiles from FFPE Samples," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-10, January.
    19. Silje Kjølle & Kenneth Finne & Even Birkeland & Vandana Ardawatia & Ingeborg Winge & Sura Aziz & Gøril Knutsvik & Elisabeth Wik & Joao A. Paulo & Heidrun Vethe & Dimitrios Kleftogiannis & Lars A. Aksl, 2023. "Hypoxia induced responses are reflected in the stromal proteome of breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    20. Hyndman, Rob J., 2020. "A brief history of forecasting competitions," International Journal of Forecasting, Elsevier, vol. 36(1), pages 7-14.
