Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-45891-y.html

Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters

Author

Listed:
  • Lucy Xia

    (Hong Kong University of Science and Technology)

  • Christy Lee

    (University of California, Los Angeles)

  • Jingyi Jessica Li

    (University of California, Los Angeles)

Abstract

Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that t-SNE and UMAP’s 2D embeddings might not reliably inform the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell’s 2D-embedding neighbors and pre-embedding neighbors, scDEED identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
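
The idea of scoring each cell by how well its 2D-embedding neighbors agree with its pre-embedding neighbors can be illustrated with a small sketch. The code below is not the scDEED implementation: it swaps in a plain k-nearest-neighbor overlap fraction as a stand-in reliability score, and the function names (`knn_indices`, `neighbor_overlap_scores`, `count_dubious`), the neighborhood size `k`, and the fixed `threshold` are all hypothetical choices made for illustration. The abstract does not specify how scDEED computes its score or decides which scores count as low, so those details are assumptions here.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest neighbors of each row, excluding the row itself."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def neighbor_overlap_scores(pre_embedding, embedding, k=10):
    """Stand-in reliability score (an assumption, not the scDEED score):
    the fraction of a cell's k pre-embedding neighbors that are also
    among its k neighbors in the 2D embedding."""
    pre = knn_indices(pre_embedding, k)
    post = knn_indices(embedding, k)
    return np.array([len(set(a) & set(b)) / k for a, b in zip(pre, post)])


def count_dubious(scores, threshold):
    """Number of cells whose score falls below a hypothetical fixed cutoff."""
    return int((scores < threshold).sum())


# Toy data: the "pre-embedding" space is a 2D layout padded with zero columns,
# so a faithful embedding preserves every neighborhood exactly.
rng = np.random.default_rng(0)
faithful = rng.normal(size=(200, 2))
pre_embedding = np.hstack([faithful, np.zeros((200, 3))])
scrambled = faithful[rng.permutation(200)]  # neighborhoods destroyed

faithful_scores = neighbor_overlap_scores(pre_embedding, faithful)
scrambled_scores = neighbor_overlap_scores(pre_embedding, scrambled)
print(count_dubious(faithful_scores, 0.5), count_dubious(scrambled_scores, 0.5))
```

Under the same assumptions, hyperparameter guidance in the spirit of the abstract would amount to running the embedding method once per candidate setting (for example, each t-SNE perplexity or UMAP neighborhood size), calling `count_dubious` on each result, and preferring the setting with the smallest count.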

Suggested Citation

  • Lucy Xia & Christy Lee & Jingyi Jessica Li, 2024. "Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-45891-y
    DOI: 10.1038/s41467-024-45891-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-45891-y
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    2. Yanchuan Li & Huamei Li & Cheng Peng & Ge Meng & Yijun Lu & Honglin Liu & Li Cui & Huan Zhou & Zhu Xu & Lingyun Sun & Lihong Liu & Qing Xiong & Beicheng Sun & Shiping Jiao, 2024. "Unraveling the spatial organization and development of human thymocytes through integration of spatial transcriptomics and single-cell multi-omics profiling," Nature Communications, Nature, vol. 15(1), pages 1-25, December.
    3. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    4. Ajita Shree & Musale Krushna Pavan & Hamim Zafar, 2023. "scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    5. Jia Li & Alan J. Simmons & Caroline V. Hawkins & Sophie Chiron & Marisol A. Ramirez-Solano & Naila Tasneem & Harsimran Kaur & Yanwen Xu & Frank Revetta & Paige N. Vega & Shunxing Bao & Can Cui & Regin, 2024. "Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn’s disease," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    6. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    7. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    8. Vidhya M. Ravi & Nicolas Neidert & Paulina Will & Kevin Joseph & Julian P. Maier & Jan Kückelhaus & Lea Vollmer & Jonathan M. Goeldner & Simon P. Behringer & Florian Scherer & Melanie Boerries & Marie, 2022. "T-cell dysfunction in the glioblastoma microenvironment is mediated by myeloid cells releasing interleukin-10," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    9. Andrea Riba & Attila Oravecz & Matej Durik & Sara Jiménez & Violaine Alunni & Marie Cerciat & Matthieu Jung & Céline Keime & William M. Keyes & Nacho Molina, 2022. "Cell cycle gene regulation dynamics revealed by RNA velocity and deep-learning," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    10. Dalia Hassan & Jichao Chen, 2024. "CEBPA restricts alveolar type 2 cell plasticity during development and injury-repair," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    11. Xinyi Zhang & Xiao Wang & G. V. Shivashankar & Caroline Uhler, 2022. "Graph-based autoencoder integrates spatial transcriptomics with chromatin images and identifies joint biomarkers for Alzheimer’s disease," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    12. Xiang Lin & Tian Tian & Zhi Wei & Hakon Hakonarson, 2022. "Clustering of single-cell multi-omics data with a multimodal deep learning method," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    13. Yuanyuan Chen & Reka Toth & Sara Chocarro & Dieter Weichenhan & Joschka Hey & Pavlo Lutsik & Stefan Sawall & Georgios T. Stathopoulos & Christoph Plass & Rocio Sotillo, 2022. "Club cells employ regeneration mechanisms during lung tumorigenesis," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    14. Hui Li & Cory R. Brouwer & Weijun Luo, 2022. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    15. Christopher W. Murray & Jennifer J. Brady & Mingqi Han & Hongchen Cai & Min K. Tsai & Sarah E. Pierce & Ran Cheng & Janos Demeter & David M. Feldser & Peter K. Jackson & David B. Shackelford & Monte M, 2022. "LKB1 drives stasis and C/EBP-mediated reprogramming to an alveolar type II fate in lung cancer," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    16. Leila R. Martins & Lina Sieverling & Michelle Michelhans & Chiara Schiller & Cihan Erkut & Thomas G. P. Grünewald & Sergio Triana & Stefan Fröhling & Lars Velten & Hanno Glimm & Claudia Scholl, 2024. "Single-cell division tracing and transcriptomics reveal cell types and differentiation paths in the regenerating lung," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    17. Miriam Aparicio, 2021. "Resiliency and Cooperation or Regarding Social and Collective Competencies for University Achievement. An Analysis from a Systemic Perspective," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, ejser_v8_.
    18. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    19. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    20. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
