
Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo

Authors

Listed:
  • David Lähnemann

    (Helmholtz Centre for Infection Research
    Technische Universität Braunschweig
    Heinrich Heine University Düsseldorf
    University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf)

  • Johannes Köster

    (University of Duisburg-Essen
    Centrum Wiskunde & Informatica)

  • Ute Fischer

    (University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf)

  • Arndt Borkhardt

    (University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf)

  • Alice C. McHardy

    (Helmholtz Centre for Infection Research
    Technische Universität Braunschweig
    Heinrich Heine University Düsseldorf)

  • Alexander Schönhuth

    (Centrum Wiskunde & Informatica
    Bielefeld University)

Abstract

Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violates assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable—because computationally efficient—manner. This achieves a higher accuracy in calling and genotyping single nucleotide variants in single cells in comparison to state-of-the-art tools and supports imputation of insufficiently covered genotypes, when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo
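The abstract mentions false discovery rate control but gives no procedure. As an illustration only, the following minimal sketch shows one standard Bayesian way to control the expected FDR from per-site posterior probabilities, in the spirit of the Müller et al. (2004) entry among the references listed for this item. It is not taken from the ProSolo code base: the function name `select_calls_at_fdr` and the input `peps` (a hypothetical list holding, for each candidate site, the posterior probability that the variant is absent) are assumptions made for this example.

```python
from typing import List, Sequence


def select_calls_at_fdr(peps: Sequence[float], alpha: float) -> List[int]:
    """Return indices of candidate calls to keep at expected FDR <= alpha.

    peps[i] is assumed to be the posterior probability that call i is a
    false positive (posterior error probability). The expected FDR of a
    set of calls is the mean of their posterior error probabilities.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Rank calls from most to least confident.
    order = sorted(range(len(peps)), key=lambda i: peps[i])

    kept: List[int] = []
    running_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        running_sum += peps[idx]
        # The running mean of sorted values never decreases, so the
        # first violation marks the end of the largest admissible set.
        if running_sum / rank > alpha:
            break
        kept.append(idx)
    return kept


if __name__ == "__main__":
    # Hypothetical posterior error probabilities for five candidate sites.
    example_peps = [0.01, 0.5, 0.02, 0.2, 0.03]
    print(select_calls_at_fdr(example_peps, alpha=0.05))
```

Because the threshold is derived from the posterior probabilities themselves, the same set of candidate calls can be filtered at different values of `alpha` without recomputing anything upstream.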

Suggested Citation

  • David Lähnemann & Johannes Köster & Ute Fischer & Arndt Borkhardt & Alice C. McHardy & Alexander Schönhuth, 2021. "Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-26938-w
    DOI: 10.1038/s41467-021-26938-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-26938-w
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Brandon Milholland & Xiao Dong & Lei Zhang & Xiaoxiao Hao & Yousin Suh & Jan Vijg, 2017. "Differences between germline and somatic mutation rates in humans and mice," Nature Communications, Nature, vol. 8(1), pages 1-8, August.
    2. Yong Wang & Jill Waters & Marco L. Leung & Anna Unruh & Whijae Roh & Xiuqing Shi & Ken Chen & Paul Scheet & Selina Vattathil & Han Liang & Asha Multani & Hong Zhang & Rui Zhao & Franziska Michor & Fun, 2014. "Clonal evolution in breast cancer revealed by single nucleus genome sequencing," Nature, Nature, vol. 512(7513), pages 155-160, August.
    3. Salem Malikic & Katharina Jahn & Jack Kuipers & S. Cenk Sahinalp & Niko Beerenwinkel, 2019. "Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data," Nature Communications, Nature, vol. 10(1), pages 1-12, December.
    4. Gang Peng & Yu Fan & Wenyi Wang, 2014. "FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-6, October.
    5. Jochen Singer & Jack Kuipers & Katharina Jahn & Niko Beerenwinkel, 2018. "Single-cell mutation identification via phylogenetic inference," Nature Communications, Nature, vol. 9(1), pages 1-8, December.
    6. Peter Muller & Giovanni Parmigiani & Christian Robert & Judith Rousseau, 2004. "Optimal Sample Size for Multiple Testing: The Case of Gene Expression Microarrays," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 990-1001, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Haochen Zhang & Elias-Ramzey Karnoub & Shigeaki Umeda & Ronan Chaligné & Ignas Masilionis & Caitlin A. McIntyre & Palash Sashittal & Akimasa Hayashi & Amanda Zucker & Katelyn Mullen & Jungeui Hong & A, 2023. "Application of high-throughput single-nucleus DNA sequencing in pancreatic cancer," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    2. Wei Sun & Chong Jin & Jonathan A. Gelfond & Ming‐Hui Chen & Joseph G. Ibrahim, 2020. "Joint analysis of single‐cell and bulk tissue sequencing data to infer intratumor heterogeneity," Biometrics, The International Biometric Society, vol. 76(3), pages 983-994, September.
    3. Humberto Contreras-Trujillo & Jiya Eerdeng & Samir Akre & Du Jiang & Jorge Contreras & Basia Gala & Mary C. Vergel-Rodriguez & Yeachan Lee & Aparna Jorapur & Areen Andreasian & Lisa Harton & Charles S, 2021. "Deciphering intratumoral heterogeneity using integrated clonal tracking and single-cell transcriptome analyses," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    4. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    5. Gómez-Villegas Miguel A. & Sanz Luis & Salazar Isabel, 2014. "A Bayesian decision procedure for testing multiple hypotheses in DNA microarray experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 49-65, February.
    6. Zichen Ma & Shannon W. Davis & Yen‐Yi Ho, 2023. "Flexible copula model for integrating correlated multi‐omics data from single‐cell experiments," Biometrics, The International Biometric Society, vol. 79(2), pages 1559-1572, June.
    7. Xiang Ge Luo & Jack Kuipers & Niko Beerenwinkel, 2023. "Joint inference of exclusivity patterns and recurrent trajectories from tumor mutation trees," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    8. Jinhyun Kim & Sungsik Kim & Huiran Yeom & Seo Woo Song & Kyoungseob Shin & Sangwook Bae & Han Suk Ryu & Ji Young Kim & Ahyoun Choi & Sumin Lee & Taehoon Ryu & Yeongjae Choi & Hamin Kim & Okju Kim & Yu, 2023. "Barcoded multiple displacement amplification for high coverage sequencing in spatial genomics," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    9. Brian M Lang & Jack Kuipers & Benjamin Misselwitz & Niko Beerenwinkel, 2020. "Predicting colorectal cancer risk from adenoma detection via a two-type branching process model," PLOS Computational Biology, Public Library of Science, vol. 16(2), pages 1-23, February.
    10. Michele Guindani & Wesley O. Johnson, 2018. "More nonparametric Bayesian inference in applications," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(2), pages 239-251, June.
    11. Jae-Woong Min & Woo Jin Kim & Jeong A Han & Yu-Jin Jung & Kyu-Tae Kim & Woong-Yang Park & Hae-Ock Lee & Sun Shim Choi, 2015. "Identification of Distinct Tumor Subpopulations in Lung Adenocarcinoma via Single-Cell RNA-seq," PLOS ONE, Public Library of Science, vol. 10(8), pages 1-17, August.
    12. Xiaodong Liu & Ke Zhang & Neslihan A. Kaya & Zhe Jia & Dafei Wu & Tingting Chen & Zhiyuan Liu & Sinan Zhu & Axel M. Hillmer & Torsten Wuestefeld & Jin Liu & Yun Shen Chan & Zheng Hu & Liang Ma & Li Ji, 2024. "Tumor phylogeography reveals block-shaped spatial heterogeneity and the mode of evolution in Hepatocellular Carcinoma," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    13. Elisa C. J. Maria & Isabel Salazar & Luis Sanz & Miguel A. Gómez-Villegas, 2020. "Using Copula to Model Dependence When Testing Multiple Hypotheses in DNA Microarray Experiments: A Bayesian Approximation," Mathematics, MDPI, vol. 8(9), pages 1-22, September.
    14. Zhaoyang Tian & Kun Liang & Pengfei Li, 2021. "A powerful procedure that controls the false discovery rate with directional information," Biometrics, The International Biometric Society, vol. 77(1), pages 212-222, March.
    15. Zhengcheng He & Ryan Ghorayeb & Susanna Tan & Ke Chen & Amanda C. Lorentzian & Jack Bottyan & Syed Mohammed Musheer Aalam & Miguel Angel Pujana & Philipp F. Lange & Nagarajan Kannan & Connie J. Eaves , 2022. "Pathogenic BRCA1 variants disrupt PLK1-regulation of mitotic spindle orientation," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    16. Thanh Nguyen & Asim Bhatti & Samuel Yang & Saeid Nahavandi, 2016. "RNA-Seq Count Data Modelling by Grey Relational Analysis and Nonparametric Gaussian Process," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-18, October.
    17. Bickel David R., 2008. "Correcting the Estimated Level of Differential Expression for Gene Selection Bias: Application to a Microarray Study," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-27, March.
    18. Cohen, Arthur & Sackrowitz, Harold B., 2007. "More on the inadmissibility of step-up," Journal of Multivariate Analysis, Elsevier, vol. 98(3), pages 481-492, March.
    19. Willi Maurer & Frank Bretz & Xiaolei Xun, 2023. "Optimal test procedures for multiple hypotheses controlling the familywise expected loss," Biometrics, The International Biometric Society, vol. 79(4), pages 2781-2793, December.
    20. Seong-Hwan Jun & Hosein Toosi & Jeff Mold & Camilla Engblom & Xinsong Chen & Ciara O’Flanagan & Michael Hagemann-Jensen & Rickard Sandberg & Samuel Aparicio & Johan Hartman & Andrew Roth & Jens Lagerg, 2023. "Reconstructing clonal tree for phylo-phenotypic characterization of cancer using single-cell transcriptomics," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
