Printed from https://ideas.repec.org/a/plo/pcbi00/1008749.html

Comprehensive analysis of cancer breakpoints reveals signatures of genetic and epigenetic contribution to cancer genome rearrangements

Authors

  • Kseniia Cheloshkina
  • Maria Poptsova

Abstract

Understanding the mechanisms of cancer breakpoint mutagenesis is difficult, and predictive models of cancer breakpoint formation have so far failed to achieve even moderate predictive power. Here we take advantage of a machine learning approach that can extract important features from big data and quantify the contribution of different factors. We performed a comprehensive analysis of almost 630,000 cancer breakpoints and quantified the contribution of genomic and epigenomic features: non-B DNA structures, chromatin organization, transcription factor binding sites and epigenetic markers. The results showed that transcription and the formation of non-B DNA structures are the two major processes responsible for cancer genome fragility. Epigenetic factors, such as chromatin organization in TADs, open/closed chromatin regions, DNA methylation and histone marks, are less informative but still contribute. As a general trend, among individual features within groups, G-quadruplexes, repeats and the transcription factors CTCF, GABPA, RXRA, SP1, MAX and NR2F2 show relatively high contributions. Overall, the cancer breakpoint landscape can be represented as well-predicted hotspots and poorly predicted individual breakpoints scattered across genomes. We demonstrated that hotspot mutagenesis has genomic and epigenomic determinants, and that not all individual cancer breakpoints are random noise: some carry a definite mutation signature. We also found that some features act on breakpoint mutagenesis over long genomic distances. By combining omics data, using cancer-specific individual feature importance and adding distant features to local ones, predictive models of cancer breakpoint formation achieved 70–90% ROC AUC across cancer types; however, precision remained low at 2% and recall did not exceed 50%.
On the one hand, the predictive power of the models strongly correlates with the size of the available cancer breakpoint and epigenomic data; on the other hand, finding strong determinants of cancer breakpoint formation remains a challenge. The strength of the predictive signal of each feature group, and of each feature within a group, can be converted into cancer-specific breakpoint mutation signatures. Overall, our results add to the understanding of cancer genome rearrangement processes.

Author summary

We analysed more than half a million breakpoints from all major cancer types and quantified the contributions of genetic and epigenetic factors to cancer breakpoint mutagenesis. The results suggest that transcription and the formation of non-B DNA structures are the two major processes responsible for cancer genome fragility. Epigenetic factors, such as chromatin organization in TADs, open/closed chromatin regions and histone marks, are less informative but still contribute. Despite these common trends, each cancer type has its own peculiarities. Breakpoint hotspots in brain cancer can be predicted by the distribution of non-B DNA structures, those in liver cancer by transcription factor binding sites, and those in blood cancer by non-B DNA structures and promoter regions. The cancer breakpoint landscape can be viewed as hotspots plus individual breakpoints scattered all over the genome. Hotspots have distinct genomic and epigenomic signatures whose relative contributions vary across cancer types. Individual cancer breakpoints are a mixture of random noise and breakpoints with a recognizable mutation signature. Quantifying the contribution of different factors to breakpoint mutagenesis in individual cancer genomes will enhance our understanding of the individual mechanisms of cancer genome rearrangement.
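The abstract reports 70–90% ROC AUC alongside precision of about 2% and recall of at most 50%. These figures are not contradictory when breakpoints are rare relative to non-breakpoint sites: ROC AUC only compares positives with negatives pairwise and is unaffected by the class ratio, whereas precision falls as negatives become more numerous. The Python sketch below illustrates this mechanism with hypothetical toy scores; the scores, the 0.5 threshold and the function names `roc_auc` and `precision_recall` are assumptions for illustration, not data or code from the paper.

```python
from itertools import product


def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def precision_recall(pos_scores, neg_scores, threshold):
    """Precision and recall when every score >= threshold is called a breakpoint."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(pos_scores)
    return precision, recall


# Hypothetical model scores for 4 breakpoint sites and 4 non-breakpoint sites.
pos = [0.9, 0.6, 0.4, 0.2]
neg = [0.7, 0.3, 0.1, 0.05]

# Balanced case, then the same negative score distribution repeated 100 times (~1% positives).
for negatives in (neg, neg * 100):
    auc = roc_auc(pos, negatives)
    prec, rec = precision_recall(pos, negatives, threshold=0.5)
    print(f"negatives={len(negatives):4d}  AUC={auc:.2f}  precision={prec:.3f}  recall={rec:.2f}")
```

In both runs the AUC stays at 0.75 and recall at 0.50, while precision drops from 2/3 to 2/102 (about 2%) once negatives outnumber positives 100 to 1. The toy numbers are chosen only to show the mechanism and do not reproduce the paper's analysis.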

Suggested Citation

  • Kseniia Cheloshkina & Maria Poptsova, 2021. "Comprehensive analysis of cancer breakpoints reveals signatures of genetic and epigenetic contribution to cancer genome rearrangements," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-23, March.
  • Handle: RePEc:plo:pcbi00:1008749
    DOI: 10.1371/journal.pcbi.1008749

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008749
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008749&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008749?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Yu Amanda Guo & Mei Mei Chang & Weitai Huang & Wen Fong Ooi & Manjie Xing & Patrick Tan & Anders Jacobsen Skanderup, 2018. "Mutation hotspots at CTCF binding sites coupled to chromosomal instability in gastrointestinal cancers," Nature Communications, Nature, vol. 9(1), pages 1-14, December.
    2. Fran Supek & Ben Lehner, 2015. "Differential DNA mismatch repair underlies mutation rate variation across the human genome," Nature, Nature, vol. 521(7550), pages 81-84, May.
    3. Paz Polak & Rosa Karlić & Amnon Koren & Robert Thurman & Richard Sandstrom & Michael S. Lawrence & Alex Reynolds & Eric Rynes & Kristian Vlahoviček & John A. Stamatoyannopoulos & Shamil R. Sunyaev, 2015. "Cell-of-origin chromatin organization shapes the mutational landscape of cancer," Nature, Nature, vol. 518(7539), pages 360-364, February.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Albert Stuart Reece & Gary Kenneth Hulse, 2022. "Epidemiology of Δ8THC-Related Carcinogenesis in USA: A Panel Regression and Causal Inferential Study," IJERPH, MDPI, vol. 19(13), pages 1-27, June.
    2. Alexander Martinez-Fundichely & Austin Dixon & Ekta Khurana, 2022. "Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers," Nature Communications, Nature, vol. 13(1), pages 1-15, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michelle Dietzen & Haoran Zhai & Olivia Lucas & Oriol Pich & Christopher Barrington & Wei-Ting Lu & Sophia Ward & Yanping Guo & Robert E. Hynds & Simone Zaccaria & Charles Swanton & Nicholas McGranaha, 2024. "Replication timing alterations are associated with mutation acquisition during breast and lung cancer evolution," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    2. Miguel M. Álvarez & Josep Biayna & Fran Supek, 2022. "TP53-dependent toxicity of CRISPR/Cas9 cuts is differential across genomic loci and can confound genetic screening," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Evelyn Kabirova & Anastasiya Ryzhkova & Varvara Lukyanchikova & Anna Khabarova & Alexey Korablev & Tatyana Shnaider & Miroslav Nuriddinov & Polina Belokopytova & Alexander Smirnov & Nikita V. Khotskin, 2024. "TAD border deletion at the Kit locus causes tissue-specific ectopic activation of a neighboring gene," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    4. Kiran Krishnamachari & Dylan Lu & Alexander Swift-Scott & Anuar Yeraliyev & Kayla Lee & Weitai Huang & Sim Ngak Leng & Anders Jacobsen Skanderup, 2022. "Accurate somatic variant detection using weakly supervised deep learning," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    5. Michael Habig & Cecile Lorrain & Alice Feurtey & Jovan Komluski & Eva H. Stukenbrock, 2021. "Epigenetic modifications affect the rate of spontaneous mutations in a pathogenic fungus," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    6. Koon-Kiu Yan & Shaoke Lou & Mark Gerstein, 2017. "MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions," PLOS Computational Biology, Public Library of Science, vol. 13(7), pages 1-22, July.
    7. Alexander Martinez-Fundichely & Austin Dixon & Ekta Khurana, 2022. "Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    8. Paola Cornejo-Páramo & Veronika Petrova & Xuan Zhang & Robert S. Young & Emily S. Wong, 2024. "Emergence of enhancers at late DNA replicating regions," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    9. Sheng Wang & Sebastian O. Klein & Sylvia Urban & Maximilian Staudt & Nicolas P. F. Barthes & Dominica Willmann & Johannes Bacher & Manuela Sum & Helena Bauer & Ling Peng & Georg A. Rennar & Christian , 2024. "Structure-guided design of a selective inhibitor of the methyltransferase KMT9 with cellular activity," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    10. Luan Nguyen & Arne Hoeck & Edwin Cuppen, 2022. "Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Gongwang Yu & Yao Liu & Zizhang Li & Shuyun Deng & Zhuoxing Wu & Xiaoyu Zhang & Wenbo Chen & Junnan Yang & Xiaoshu Chen & Jian-Rong Yang, 2023. "Genome-wide probing of eukaryotic nascent RNA structure elucidates cotranscriptional folding and its antimutagenic effect," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    12. Mischan Vali-Pour & Solip Park & Jose Espinosa-Carrasco & Daniel Ortiz-Martínez & Ben Lehner & Fran Supek, 2022. "The impact of rare germline variants on human somatic mutation processes," Nature Communications, Nature, vol. 13(1), pages 1-21, December.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1008749. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol. General contact details of the provider: https://journals.plos.org/ploscompbiol/

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.