Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-34595-w.html

Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Authors

  • Md Tauhidul Islam (Stanford University)
  • Jen-Yeu Wang (Stanford University)
  • Hongyi Ren (Stanford University)
  • Xiaomeng Li (Stanford University)
  • Masoud Badiei Khuzani (Stanford University)
  • Shengtian Sang (Stanford University)
  • Lequan Yu (Stanford University)
  • Liyue Shen (Stanford University)
  • Wei Zhao (Stanford University)
  • Lei Xing (Stanford University)

Abstract

Single-cell RNA sequencing is a promising technique for determining the states of individual cells and classifying novel cell subtypes. In current sequence data analysis, however, genes with low expression are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values is challenging because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as the self-consistent expression recovery machine (SERM), to impute the missing expression values. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the overall expression data by imposing self-consistency on the expression matrix, thus ensuring that the expression levels are similarly distributed in different parts of the matrix. We show that SERM improves the accuracy of gene imputation, with an orders-of-magnitude improvement in computational efficiency compared with state-of-the-art imputation techniques.
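The two-stage idea in the abstract (estimate a distribution from a subset, then make every part of the matrix follow it) can be illustrated with a toy sketch. This is not the SERM implementation: the neural network is replaced here by plain empirical quantiles of a random row subset, and the self-consistency step is approximated by block-wise quantile (histogram) matching. All function names, the block size, and the simulated Poisson counts are assumptions made only for illustration.

```python
import numpy as np


def reference_quantiles(matrix, n_rows, n_q=101, seed=0):
    """Estimate a reference distribution from a random subset of rows.

    Stand-in for the learned distribution; the paper uses a neural network.
    """
    rng = np.random.default_rng(seed)
    rows = rng.choice(matrix.shape[0], size=n_rows, replace=False)
    sample = matrix[rows].ravel()
    return np.quantile(sample, np.linspace(0.0, 1.0, n_q))


def match_block(block, ref_q):
    """Map the values of one block onto the reference quantiles by rank."""
    flat = block.ravel()
    # Rank of each entry within the block (0 .. size-1), ties broken stably.
    ranks = flat.argsort(kind="stable").argsort(kind="stable")
    frac = ranks / max(flat.size - 1, 1)  # fractional rank in [0, 1]
    probs = np.linspace(0.0, 1.0, ref_q.size)
    return np.interp(frac, probs, ref_q).reshape(block.shape)


def blockwise_match(matrix, ref_q, block_rows):
    """Apply quantile matching to consecutive row blocks of the matrix."""
    out = np.empty(matrix.shape, dtype=float)
    for start in range(0, matrix.shape[0], block_rows):
        stop = start + block_rows
        out[start:stop] = match_block(matrix[start:stop], ref_q)
    return out


# Hypothetical usage on simulated counts (cells x genes).
rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(60, 20)).astype(float)
ref_q = reference_quantiles(counts, n_rows=15)
recovered = blockwise_match(counts, ref_q, block_rows=20)
```

After this step each 20-row block spans the same value range as the reference distribution, which is the sense in which the blocks are "similarly distributed". A real pipeline would need a denoised reference rather than quantiles taken from the noisy data itself.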

Suggested Citation

  • Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-34595-w
    DOI: 10.1038/s41467-022-34595-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-34595-w
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hui Li & Cory R. Brouwer & Weijun Luo, 2022. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    2. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    3. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    4. Lucy Xia & Christy Lee & Jingyi Jessica Li, 2024. "Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    5. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    6. Ming-Wen Hu & Dong Won Kim & Sheng Liu & Donald J Zack & Seth Blackshaw & Jiang Qian, 2019. "PanoView: An iterative clustering method for single-cell RNA sequencing data," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-17, August.
    7. Jing Qi & Yang Zhou & Zicen Zhao & Shuilin Jin, 2021. "SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-20, June.
    8. Xiang Lin & Tian Tian & Zhi Wei & Hakon Hakonarson, 2022. "Clustering of single-cell multi-omics data with a multimodal deep learning method," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    9. Ajita Shree & Musale Krushna Pavan & Hamim Zafar, 2023. "scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    10. George C. Linderman & Jun Zhao & Manolis Roulis & Piotr Bielecki & Richard A. Flavell & Boaz Nadler & Yuval Kluger, 2022. "Zero-preserving imputation of single-cell RNA-seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    11. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    12. Lei Xiong & Kang Tian & Yuzhe Li & Weixi Ning & Xin Gao & Qiangfeng Cliff Zhang, 2022. "Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    13. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    14. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    15. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    16. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    17. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    18. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    19. A van Giessen & K G M Moons & G A de Wit & W M M Verschuren & J M A Boer & H Koffijberg, 2015. "Tailoring the Implementation of New Biomarkers Based on Their Added Predictive Value in Subgroups of Individuals," PLOS ONE, Public Library of Science, vol. 10(1), pages 1-14, January.
    20. Ethan Bahl & Snehajyoti Chatterjee & Utsav Mukherjee & Muhammad Elsadany & Yann Vanrobaeys & Li-Chun Lin & Miriam McDonough & Jon Resch & K. Peter Giese & Ted Abel & Jacob J. Michaelson, 2024. "Using deep learning to quantify neuronal activation from single-cell and spatial transcriptomic data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
