
Epiclomal: Probabilistic clustering of sparse single-cell DNA methylation data

Author

Listed:
  • Camila P. E. de Souza
  • Mirela Andronescu
  • Tehmina Masud
  • Farhia Kabeer
  • Justina Biele
  • Emma Laks
  • Daniel Lai
  • Patricia Ye
  • Jazmine Brimhall
  • Beixi Wang
  • Edmund Su
  • Tony Hui
  • Qi Cao
  • Marcus Wong
  • Michelle Moksa
  • Richard A Moore
  • Martin Hirst
  • Samuel Aparicio
  • Sohrab P Shah

Abstract

We present Epiclomal, a probabilistic clustering method arising from a hierarchical mixture model to simultaneously cluster sparse single-cell DNA methylation data and impute missing values. Using synthetic and published single-cell CpG datasets, we show that Epiclomal outperforms non-probabilistic methods and can handle the inherent missing data characteristic that dominates single-cell CpG genome sequences. Using newly generated single-cell 5mCpG sequencing data, we show that Epiclomal discovers sub-clonal methylation patterns in aneuploid tumour genomes, thus defining epiclones that can match or transcend copy number-determined clonal lineages and opening up an important form of clonal analysis in cancer. Epiclomal is written in R and Python and is available at https://github.com/shahcompbio/Epiclomal.

Author summary: DNA methylation is an epigenetic mark that occurs when methyl groups are attached to the DNA molecule, thereby playing decisive roles in numerous biological processes. Advances in technology have allowed the generation of high-throughput DNA methylation sequencing data from single cells. One of the goals is to group cells according to their DNA methylation profiles; however, a major challenge arises due to a large amount of missing data per cell. To address this problem, we developed a novel statistical model and framework: Epiclomal. Our approach uses a hierarchical mixture model to borrow statistical strength across cells and neighboring loci to accurately define cell groups (clusters). We compare our approach to different methods on both synthetic and published datasets. We show that Epiclomal is more robust than other approaches, producing more accurate clusters of cells in the majority of experimental scenarios. We also apply Epiclomal to newly generated single-cell DNA methylation data from breast cancer xenografts. Our results show that methylation-based clusters can mirror or in some instances transcend the clusters defined by single-cell copy number analysis. This illustrates the importance of single-cell DNA methylation analysis in understanding cellular heterogeneity in cancer.
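
To make the idea of clustering and imputing at the same time concrete, here is a minimal sketch of a plain Bernoulli mixture model fitted by EM on binary methylation calls with missing entries. It is not Epiclomal's implementation: it leaves out the hierarchical sharing across neighboring loci described above, and every name in it (`fit_bernoulli_mixture`, `impute`, the farthest-first seeding, the pseudo-count smoothing) is an assumption made for illustration only.

```python
import math

def _distance(a, b):
    """Fraction of mismatches over loci observed in both cells."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.5  # no overlap: treat the pair as uninformative
    return sum(x != y for x, y in shared) / len(shared)

def _pick_seeds(data, k):
    """Deterministic farthest-first choice of k seed cells (illustrative)."""
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(data)) if i not in seeds]
        seeds.append(max(rest, key=lambda i: min(_distance(data[i], data[s])
                                                 for s in seeds)))
    return seeds

def fit_bernoulli_mixture(data, k, n_iter=30):
    """EM for a Bernoulli mixture; entries are 0, 1, or None (missing)."""
    n_cells, n_loci = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Start each cluster profile from a seed cell; missing loci start at 0.5.
    probs = [[0.5 if x is None else (0.9 if x == 1 else 0.1) for x in data[s]]
             for s in _pick_seeds(data, k)]
    resp = []
    for _ in range(n_iter):
        # E-step: cluster responsibilities computed from observed loci only.
        resp = []
        for cell in data:
            log_post = []
            for c in range(k):
                lp = math.log(weights[c])
                for j, x in enumerate(cell):
                    if x is not None:
                        p = probs[c][j]
                        lp += math.log(p if x == 1 else 1.0 - p)
                log_post.append(lp)
            top = max(log_post)
            unnorm = [math.exp(v - top) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: pseudo-counts keep every probability strictly inside (0, 1).
        for c in range(k):
            weights[c] = (sum(r[c] for r in resp) + 1.0) / (n_cells + k)
            for j in range(n_loci):
                obs = [(r[c], cell[j]) for r, cell in zip(resp, data)
                       if cell[j] is not None]
                probs[c][j] = ((1.0 + sum(w * x for w, x in obs))
                               / (2.0 + sum(w for w, _ in obs)))
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, probs

def impute(data, labels, probs):
    """Fill each missing entry with the most likely state of the cell's cluster."""
    return [[(1 if probs[labels[i]][j] >= 0.5 else 0) if x is None else x
             for j, x in enumerate(cell)]
            for i, cell in enumerate(data)]
```

Calling `fit_bernoulli_mixture(data, k)` on a list of cells, each a list of `0`, `1`, or `None`, returns one cluster label per cell plus the per-cluster methylation probabilities, and `impute` then fills the gaps from each cell's cluster profile. Skipping `None` entries in both EM steps is what lets cells with little overlap in observed loci still be grouped together.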

Suggested Citation

  • Camila P. E. de Souza & Mirela Andronescu & Tehmina Masud & Farhia Kabeer & Justina Biele & Emma Laks & Daniel Lai & Patricia Ye & Jazmine Brimhall & Beixi Wang & Edmund Su & Tony Hui & Qi Cao & Marcu, 2020. "Epiclomal: Probabilistic clustering of sparse single-cell DNA methylation data," PLOS Computational Biology, Public Library of Science, vol. 16(9), pages 1-24, September.
  • Handle: RePEc:plo:pcbi00:1008270
    DOI: 10.1371/journal.pcbi.1008270

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008270
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008270&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Adam C. Weiner & Marc J. Williams & Hongyu Shi & Ignacio Vázquez-García & Sohrab Salehi & Nicole Rusk & Samuel Aparicio & Sohrab P. Shah & Andrew McPherson, 2024. "Inferring replication timing and proliferation dynamics from single-cell DNA sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Jinhyun Kim & Sungsik Kim & Huiran Yeom & Seo Woo Song & Kyoungseob Shin & Sangwook Bae & Han Suk Ryu & Ji Young Kim & Ahyoun Choi & Sumin Lee & Taehoon Ryu & Yeongjae Choi & Hamin Kim & Okju Kim & Yu, 2023. "Barcoded multiple displacement amplification for high coverage sequencing in spatial genomics," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    3. Noushin Niknafs & Violeta Beleva-Guthrie & Daniel Q Naiman & Rachel Karchin, 2015. "SubClonal Hierarchy Inference from Somatic Mutations: Automatic Reconstruction of Cancer Evolutionary Trees from Multi-region Next Generation Sequencing," PLOS Computational Biology, Public Library of Science, vol. 11(10), pages 1-26, October.
    4. Yidong Zhou & Changjun Wang & Hanjiang Zhu & Yan Lin & Bo Pan & Xiaohui Zhang & Xin Huang & Qianqian Xu & Yali Xu & Qiang Sun, 2016. "Diagnostic Accuracy of PIK3CA Mutation Detection by Circulating Free DNA in Breast Cancer: A Meta-Analysis of Diagnostic Test Accuracy," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-15, June.
    5. Claudia Bühnemann & Simon Li & Haiyue Yu & Harriet Branford White & Karl L Schäfer & Antonio Llombart-Bosch & Isidro Machado & Piero Picci & Pancras C W Hogendoorn & Nicholas A Athanasou & J Alison No, 2014. "Quantification of the Heterogeneity of Prognostic Cellular Biomarkers in Ewing Sarcoma Using Automated Image and Random Survival Forest Analysis," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-14, September.
    6. Chunyang Bao & Richard W. Tourdot & Gregory J. Brunette & Chip Stewart & Lili Sun & Hideo Baba & Masayuki Watanabe & Agoston T. Agoston & Kunal Jajoo & Jon M. Davison & Katie S. Nason & Gad Getz & Ken, 2023. "Genomic signatures of past and present chromosomal instability in Barrett’s esophagus and early esophageal adenocarcinoma," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    7. Xian F Mallory & Mohammadamin Edrisi & Nicholas Navin & Luay Nakhleh, 2020. "Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data," PLOS Computational Biology, Public Library of Science, vol. 16(7), pages 1-24, July.
    8. Udit Singhal & Srinivas Nallandhighal & Jeffrey J. Tosoian & Kevin Hu & Trinh M. Pham & Judith Stangl-Kremser & Chia-Jen Liu & Razeen Karim & Komal R. Plouffe & Todd M. Morgan & Marcin Cieslik & Rober, 2024. "Integrative multi-region molecular profiling of primary prostate cancer in men with synchronous lymph node metastasis," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    9. Joanna Hård & Jeff E. Mold & Jesper Eisfeldt & Christian Tellgren-Roth & Susana Häggqvist & Ignas Bunikis & Orlando Contreras-Lopez & Chen-Shan Chin & Jessica Nordlund & Carl-Johan Rubin & Lars Feuk &, 2023. "Long-read whole-genome analysis of human single cells," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    10. Noa Chapal-Ilani & Yosef E Maruvka & Adam Spiro & Yitzhak Reizel & Rivka Adar & Liran I Shlush & Ehud Shapiro, 2013. "Comparing Algorithms That Reconstruct Cell Lineage Trees Utilizing Information on Microsatellite Mutations," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-17, November.
    11. Salim Akhter Chowdhury & Stanley E Shackney & Kerstin Heselmeyer-Haddad & Thomas Ried & Alejandro A Schäffer & Russell Schwartz, 2014. "Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics," PLOS Computational Biology, Public Library of Science, vol. 10(7), pages 1-19, July.
    12. Brandon Monier & Adam McDermaid & Cankun Wang & Jing Zhao & Allison Miller & Anne Fennell & Qin Ma, 2019. "IRIS-EDA: An integrated RNA-Seq interpretation system for gene expression data analysis," PLOS Computational Biology, Public Library of Science, vol. 15(2), pages 1-15, February.
