IDEAS home Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-34595-w.html

Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Author

Listed:
  • Md Tauhidul Islam (Stanford University)
  • Jen-Yeu Wang (Stanford University)
  • Hongyi Ren (Stanford University)
  • Xiaomeng Li (Stanford University)
  • Masoud Badiei Khuzani (Stanford University)
  • Shengtian Sang (Stanford University)
  • Lequan Yu (Stanford University)
  • Liyue Shen (Stanford University)
  • Wei Zhao (Stanford University)
  • Lei Xing (Stanford University)

Abstract

Single-cell RNA sequencing is a promising technique to determine the states of individual cells and classify novel cell subtypes. In current sequence data analysis, however, genes with low expression are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values presents a challenge because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as self-consistent expression recovery machine (SERM), to impute the missing expression values. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the overall expression data by imposing self-consistency on the expression matrix, thus ensuring that the expression levels are similarly distributed in different parts of the matrix. We show that SERM improves the accuracy of gene imputation, with orders-of-magnitude gains in computational efficiency compared with state-of-the-art imputation techniques.
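
The two-step idea in the abstract (learn a reference distribution from a subset, then make every part of the matrix follow it) can be illustrated with a small sketch. This is not the authors' SERM implementation. The learning step here is replaced by plain empirical quantiles of a random row subset instead of a neural network, and the self-consistency step is approximated by rank-based quantile matching on non-overlapping blocks. All function names, the block size, and the sample size are hypothetical choices made for illustration.

```python
import numpy as np


def learn_reference_quantiles(matrix, n_sample_rows=50, n_quantiles=101, seed=0):
    """Stand-in for the learning step (assumption: empirical quantiles of a
    random row subset, not a neural network as in the paper)."""
    rng = np.random.default_rng(seed)
    n_rows = matrix.shape[0]
    rows = rng.choice(n_rows, size=min(n_sample_rows, n_rows), replace=False)
    probs = np.linspace(0.0, 1.0, n_quantiles)
    # np.quantile flattens the sampled rows and returns one value per probability
    return probs, np.quantile(matrix[rows], probs)


def match_block(block, probs, ref_quantiles):
    """Remap a block so its empirical distribution follows the reference."""
    flat = block.ravel()
    # Rank of each entry within the block (ties are broken by position,
    # which is an arbitrary choice made for this sketch).
    ranks = np.argsort(np.argsort(flat, kind="stable"), kind="stable")
    cdf = (ranks + 0.5) / flat.size
    # Look up the reference value at each entry's empirical CDF position
    return np.interp(cdf, probs, ref_quantiles).reshape(block.shape)


def impose_self_consistency(matrix, block_shape=(50, 50), **kwargs):
    """Apply the same reference distribution to every block of the matrix."""
    probs, ref_q = learn_reference_quantiles(matrix, **kwargs)
    out = np.empty_like(matrix, dtype=float)
    br, bc = block_shape
    for r in range(0, matrix.shape[0], br):
        for c in range(0, matrix.shape[1], bc):
            out[r:r + br, c:c + bc] = match_block(
                matrix[r:r + br, c:c + bc], probs, ref_q
            )
    return out
```

Calling `impose_self_consistency(counts, block_shape=(50, 40))` on a cells-by-genes array returns an array of the same shape in which every block shares the reference distribution. The published method reportedly handles the learning step and the merging of regions in more detail, so this sketch should be read only as an illustration of the self-consistency constraint.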

Suggested Citation

  • Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-34595-w
    DOI: 10.1038/s41467-022-34595-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-34595-w
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    2. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    3. Lucy Xia & Christy Lee & Jingyi Jessica Li, 2024. "Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    4. Hui Li & Cory R. Brouwer & Weijun Luo, 2022. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    5. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    6. Ming-Wen Hu & Dong Won Kim & Sheng Liu & Donald J Zack & Seth Blackshaw & Jiang Qian, 2019. "PanoView: An iterative clustering method for single-cell RNA sequencing data," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-17, August.
    7. Ajita Shree & Musale Krushna Pavan & Hamim Zafar, 2023. "scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    8. Jing Qi & Yang Zhou & Zicen Zhao & Shuilin Jin, 2021. "SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-20, June.
    9. Zhenchao Tang & Guanxing Chen & Shouzhi Chen & Jianhua Yao & Linlin You & Calvin Yu-Chian Chen, 2024. "Modal-nexus auto-encoder for multi-modality cellular data integration and imputation," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    10. George C. Linderman & Jun Zhao & Manolis Roulis & Piotr Bielecki & Richard A. Flavell & Boaz Nadler & Yuval Kluger, 2022. "Zero-preserving imputation of single-cell RNA-seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    11. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    12. Lei Xiong & Kang Tian & Yuzhe Li & Weixi Ning & Xin Gao & Qiangfeng Cliff Zhang, 2022. "Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    13. Xiang Lin & Tian Tian & Zhi Wei & Hakon Hakonarson, 2022. "Clustering of single-cell multi-omics data with a multimodal deep learning method," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    14. Jingyang Qian & Hudong Bao & Xin Shao & Yin Fang & Jie Liao & Zhuo Chen & Chengyu Li & Wenbo Guo & Yining Hu & Anyao Li & Yue Yao & Xiaohui Fan & Yiyu Cheng, 2024. "Simulating multiple variability in spatially resolved transcriptomics with scCube," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    15. Miriam Aparicio, 2021. "Resiliency and Cooperation or Regarding Social and Collective Competencies for University Achievement. An Analysis from a Systemic Perspective," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, ejser_v8_.
    16. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    17. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    18. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    19. F. Marta L. Di Lascio & Andrea Menapace & Roberta Pappadà, 2024. "A spatially‐weighted AMH copula‐based dissimilarity measure for clustering variables: An application to urban thermal efficiency," Environmetrics, John Wiley & Sons, Ltd., vol. 35(1), February.
    20. Yifan Zhu & Chongzhi Di & Ying Qing Chen, 2019. "Clustering Functional Data with Application to Electronic Medication Adherence Monitoring in HIV Prevention Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 238-261, July.

