Inferring tumor progression in large datasets

Author

Listed:
  • Mohammadreza Mohaghegh Neyshabouri
  • Seong-Hwan Jun
  • Jens Lagergren

Abstract

Identification of mutations of the genes that give cancer a selective advantage is an important step towards research and clinical objectives. As such, there has been a growing interest in developing methods for identification of driver genes and their temporal order within a single patient (intra-tumor) as well as across a cohort of patients (inter-tumor). In this paper, we develop a probabilistic model for tumor progression, in which the driver genes are clustered into several ordered driver pathways. We develop an efficient inference algorithm that exhibits favorable scalability to the number of genes and samples compared to a previously introduced ILP-based method. Adopting a probabilistic approach also allows principled approaches to model selection and uncertainty quantification. Using a large set of experiments on synthetic datasets, we demonstrate our superior performance compared to the ILP-based method. We also analyze two biological datasets of colorectal and glioblastoma cancers. We emphasize that while the ILP-based method puts many seemingly passenger genes in the driver pathways, our algorithm keeps focused on truly driver genes and outputs more accurate models for cancer progression.

Author summary: Cancer is a disease caused by the accumulation of somatic mutations in the genome. This process is mainly driven by mutations in certain genes that give the harboring cells some selective advantage. The rather few driver genes are usually masked amongst an abundance of so-called passenger mutations. Identification of the driver genes and the temporal order in which the mutations occur is of great importance towards research and clinical objectives. In this paper, we introduce a probabilistic model for cancer progression and devise an efficient inference algorithm to train the model. We show that our method scales favorably to large datasets and provides superior performance compared to an ILP-based counterpart on a wide set of synthetic data simulations. Our Bayesian approach also allows for systematic model selection and confidence quantification procedures in contrast to the previous non-probabilistic progression models. We also study two large datasets on colorectal and glioblastoma cancers and validate our inferred model in comparison to the ILP-based method.

Suggested Citation

  • Mohammadreza Mohaghegh Neyshabouri & Seong-Hwan Jun & Jens Lagergren, 2020. "Inferring tumor progression in large datasets," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-16, October.
  • Handle: RePEc:plo:pcbi00:1008183
    DOI: 10.1371/journal.pcbi.1008183

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008183
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008183&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Hossein Shahrabi Farahani & Jens Lagergren, 2013. "Learning Oncogenetic Networks by Reducing to Mixed Integer Linear Programming," PLOS ONE, Public Library of Science, vol. 8(6), pages 1-8, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sahand Khakabimamaghani & Dujian Ding & Oliver Snow & Martin Ester, 2019. "Uncovering the subtype-specific temporal order of cancer pathway dysregulation," PLOS Computational Biology, Public Library of Science, vol. 15(11), pages 1-19, November.
    2. Salim Akhter Chowdhury & Stanley E Shackney & Kerstin Heselmeyer-Haddad & Thomas Ried & Alejandro A Schäffer & Russell Schwartz, 2014. "Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics," PLOS Computational Biology, Public Library of Science, vol. 10(7), pages 1-19, July.
