Author
Listed:
- David G P van IJzendoorn
- Karoly Szuhai
- Inge H Briaire-de Bruijn
- Marie Kostine
- Marieke L Kuijjer
- Judith V M G Bovée
Abstract
Based on morphology, it is often challenging to distinguish between the many different soft tissue sarcoma subtypes. Moreover, outcome of disease is highly variable even between patients with the same disease. Machine learning on transcriptome sequencing data could be a valuable new tool to understand differences between and within entities. Here we used machine learning analysis to identify novel diagnostic and prognostic markers and therapeutic targets for soft tissue sarcomas. Gene expression data were obtained from the Cancer Genome Atlas, the Genotype-Tissue Expression project and the French Sarcoma Group. We identified three groups of tumors that overlap in their molecular profiles, as seen with unsupervised t-Distributed Stochastic Neighbor Embedding clustering and a deep neural network. The three groups corresponded to subtypes that are morphologically overlapping. Using a random forest algorithm, we identified novel diagnostic markers for soft tissue sarcoma that distinguished between synovial sarcoma and MPNST, and that we validated using qRT-PCR in an independent series. Next, we identified prognostic genes that are strong predictors of disease outcome when used in a k-nearest neighbor algorithm. The prognostic genes were further validated in expression data from the French Sarcoma Group. One of these, HMMR, was validated as a prognostic gene for disease-free interval in an independent series of leiomyosarcomas using immunohistochemistry on a tissue microarray. Furthermore, reconstruction of regulatory networks combined with data from the Connectivity Map showed, amongst others, that HDAC inhibitors could be a potentially effective therapy for multiple soft tissue sarcoma subtypes. A viability assay with two HDAC inhibitors confirmed that both leiomyosarcoma and synovial sarcoma are sensitive to HDAC inhibition. In this study we identified novel diagnostic markers, prognostic markers and therapeutic leads from multiple soft tissue sarcoma gene expression datasets.
Thus, machine learning algorithms are powerful new tools to improve our understanding of rare tumor entities.
Author summary
Soft-tissue sarcomas are a group of rare cancers that can be challenging to diagnose and treat. The morphology of the different soft-tissue sarcoma subtypes can overlap, and the prognosis differs significantly between, and also within, the different subtypes. Moreover, targeted therapies are often not available. In this study we used transcriptome sequencing data from The Cancer Genome Atlas, containing 206 soft-tissue sarcoma samples, which we analyzed using different machine learning algorithms to gain novel insights. When possible, we verified our findings in independent datasets or in cell lines. First, we found that both synovial sarcomas and malignant peripheral nerve sheath tumors show the largest overlap with normal tissue derived from the nervous system. This link with neural differentiation for synovial sarcoma was not well established until now. Second, genes were identified whose expression could be used to differentiate between the different soft-tissue sarcomas where the morphology overlaps. Third, novel prognostic genes were identified for separate subtypes. One gene, HMMR, which we found to be a strong prognostic gene for leiomyosarcoma, was verified with immunohistochemistry on samples from our archives. Last, using a network analysis, new potential therapies were identified. HDAC inhibitors were identified as a potentially strong therapy for sarcomas, including leiomyosarcomas, which we verified in cell lines.
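The abstract mentions using prognostic genes in a k-nearest neighbor algorithm to predict disease outcome. As a rough illustration of that general technique only (not the authors' pipeline), the sketch below classifies a sample by majority vote among its closest training samples. The expression values, the two-gene feature layout, the "good"/"poor" labels and the function name `knn_predict` are all hypothetical and chosen purely for demonstration.

```python
from collections import Counter
from math import dist


def knn_predict(train, labels, query, k=3):
    """Return the majority label among the k training samples closest to query."""
    # Rank training samples by Euclidean distance to the query sample.
    order = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    # Count the labels of the k nearest samples and take the most common one.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up expression values for two hypothetical genes per sample.
train = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
# Made-up outcome labels for the samples above.
labels = ["good", "good", "good", "poor", "poor", "poor"]

prediction_low = knn_predict(train, labels, (1.0, 1.0))
prediction_high = knn_predict(train, labels, (3.0, 3.0))
```

In practice the choice of k, the distance measure and the normalization of expression values all affect the result, and the study's own settings are not stated in this record.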
Suggested Citation
David G P van IJzendoorn & Karoly Szuhai & Inge H Briaire-de Bruijn & Marie Kostine & Marieke L Kuijjer & Judith V M G Bovée, 2019.
"Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas,"
PLOS Computational Biology, Public Library of Science, vol. 15(2), pages 1-19, February.
Handle:
RePEc:plo:pcbi00:1006826
DOI: 10.1371/journal.pcbi.1006826