Printed from https://ideas.repec.org/a/nat/natcom/v14y2023i1d10.1038_s41467-023-39923-2.html

Atlas-scale single-cell multi-sample multi-condition data integration using scMerge2

Author

Listed:
  • Yingxin Lin

    (The University of Sydney
    Laboratory of Data Discovery for Health Limited (D24H))

  • Yue Cao

    (The University of Sydney
    Laboratory of Data Discovery for Health Limited (D24H))

  • Elijah Willie

    (The University of Sydney)

  • Ellis Patrick

    (The University of Sydney
    Laboratory of Data Discovery for Health Limited (D24H))

  • Jean Y. H. Yang

    (The University of Sydney
    Laboratory of Data Discovery for Health Limited (D24H))

Abstract

The recent emergence of multi-sample multi-condition single-cell multi-cohort studies allows researchers to investigate different cell states. The effective integration of multiple large-cohort studies promises biological insights into cells under different conditions that individual studies cannot provide. Here, we present scMerge2, a scalable algorithm that allows data integration of atlas-scale multi-sample multi-condition single-cell studies. We have generalized scMerge2 to enable the merging of millions of cells from single-cell studies generated by various single-cell technologies. Using a large COVID-19 data collection with over five million cells from 1000+ individuals, we demonstrate that scMerge2 enables multi-sample multi-condition scRNA-seq data integration from multiple cohorts and reveals signatures derived from cell-type expression that are more accurate in discriminating disease progression. Further, we demonstrate that scMerge2 can remove dataset variability in CyTOF, imaging mass cytometry and CITE-seq experiments, demonstrating its applicability to a broad spectrum of single-cell profiling technologies.

Suggested Citation

  • Yingxin Lin & Yue Cao & Elijah Willie & Ellis Patrick & Jean Y. H. Yang, 2023. "Atlas-scale single-cell multi-sample multi-condition data integration using scMerge2," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-39923-2
    DOI: 10.1038/s41467-023-39923-2

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-39923-2
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ajita Shree & Musale Krushna Pavan & Hamim Zafar, 2023. "scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    2. Qihuang Zhang & Shunzhou Jiang & Amelia Schroeder & Jian Hu & Kejie Li & Baohong Zhang & David Dai & Edward B. Lee & Rui Xiao & Mingyao Li, 2023. "Leveraging spatial transcriptomics data to recover cell locations in single-cell RNA-seq with CeLEry," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    3. Christoph Ziegenhain & Rickard Sandberg, 2021. "BAMboozle removes genetic variation from human sequence data for open data sharing," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    4. Katharina T. Schmid & Barbara Höllbacher & Cristiana Cruceanu & Anika Böttcher & Heiko Lickert & Elisabeth B. Binder & Fabian J. Theis & Matthias Heinig, 2021. "scPower accelerates and optimizes the design of multi-sample single cell transcriptomic studies," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    5. Kaichen Xu & Yan Lu & Suyang Hou & Kainan Liu & Yihang Du & Mengqian Huang & Hao Feng & Hao Wu & Xiaobo Sun, 2024. "Detecting anomalous anatomic regions in spatial transcriptomics with STANDS," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    6. Aiko Sekita & Hiroshi Kawasaki & Ayano Fukushima-Nomura & Kiyoshi Yashiro & Keiji Tanese & Susumu Toshima & Koichi Ashizaki & Tomohiro Miyai & Junshi Yazaki & Atsuo Kobayashi & Shinichi Namba & Tatsuh, 2023. "Multifaceted analysis of cross-tissue transcriptomes reveals phenotype–endotype associations in atopic dermatitis," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    7. Yongjie Deng & Yao Yao & Yanni Wang & Tiantian Yu & Wenhao Cai & Dingli Zhou & Feng Yin & Wanli Liu & Yuying Liu & Chuanbo Xie & Jian Guan & Yumin Hu & Peng Huang & Weizhong Li, 2024. "An end-to-end deep learning method for mass spectrometry data analysis to reveal disease-specific metabolic profiles," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    8. John Arevalo & Ellen Su & Jessica D. Ewald & Robert Dijk & Anne E. Carpenter & Shantanu Singh, 2024. "Evaluating batch correction methods for image-based cell profiling," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    9. Luisa Santus & Maria Sopena-Rios & Raquel García-Pérez & Aaron E. Lin & Gordon C. Adams & Kayla G. Barnes & Katherine J. Siddle & Shirlee Wohl & Ferran Reverter & John L. Rinn & Richard S. Bennett & L, 2023. "Single-cell profiling of lncRNA expression during Ebola virus infection in rhesus macaques," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    10. Xiaokang Yu & Xinyi Xu & Jingxiao Zhang & Xiangjie Li, 2023. "Batch alignment of single-cell transcriptomics data using deep metric learning," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    11. Rongbo Shen & Lin Liu & Zihan Wu & Ying Zhang & Zhiyuan Yuan & Junfu Guo & Fan Yang & Chao Zhang & Bichao Chen & Wanwan Feng & Chao Liu & Jing Guo & Guozhen Fan & Yong Zhang & Yuxiang Li & Xun Xu & Ji, 2022. "Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    12. Licai Huang & Paul Little & Jeroen R. Huyghe & Qian Shi & Tabitha A. Harrison & Greg Yothers & Thomas J. George & Ulrike Peters & Andrew T. Chan & Polly A. Newcomb & Wei Sun, 2021. "A Statistical Method for Association Analysis of Cell Type Compositions," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(3), pages 373-385, December.
    13. Ruofei Lin & Xiaoli Hu & Shijun Chen & Junpei Huang, 2022. "Sports Participation and Anti-Epidemic: Empirical Evidence on the Influence of Regular Physical Activity on the COVID-19 Pandemic in Mainland China," IJERPH, MDPI, vol. 19(17), pages 1-13, August.
    14. Andrew Jones & Diana Cai & Didong Li & Barbara E. Engelhardt, 2024. "Optimizing the design of spatial genomic studies," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    15. Jingyang Qian & Jie Liao & Ziqi Liu & Ying Chi & Yin Fang & Yanrong Zheng & Xin Shao & Bingqi Liu & Yongjin Cui & Wenbo Guo & Yining Hu & Hudong Bao & Penghui Yang & Qian Chen & Mingxiao Li & Bing Zha, 2023. "Reconstruction of the cell pseudo-space from single-cell RNA sequencing data with scSpace," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    16. Lindsay A. Rutter & Henry Cope & Matthew J. MacKay & Raúl Herranz & Saswati Das & Sergey A. Ponomarev & Sylvain V. Costes & Amber M. Paul & Richard Barker & Deanne M. Taylor & Daniela Bezdan & Nathani, 2024. "Astronaut omics and the impact of space on the human body at scale," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    17. Ronja Mothes & Anna Pascual-Reguant & Ralf Koehler & Juliane Liebeskind & Alina Liebheit & Sandy Bauherr & Lars Philipsen & Carsten Dittmayer & Michael Laue & Regina Manitius & Sefer Elezkurtaj & Pawe, 2023. "Distinct tissue niches direct lung immunopathology via CCL18 and CCL21 in severe COVID-19," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    18. Joyce B. Kang & Aparna Nathan & Kathryn Weinand & Fan Zhang & Nghia Millard & Laurie Rumker & D. Branch Moody & Ilya Korsunsky & Soumya Raychaudhuri, 2021. "Efficient and precise single-cell reference atlas mapping with Symphony," Nature Communications, Nature, vol. 12(1), pages 1-21, December.
    19. Erping Long & Montserrat García-Closas & Stephen J. Chanock & M. Constanza Camargo & Nicholas E. Banovich & Jiyeon Choi, 2022. "The case for increasing diversity in tissue-based functional genomics datasets to understand human disease susceptibility," Nature Communications, Nature, vol. 13(1), pages 1-4, December.
    20. Yasa Baig & Helena R. Ma & Helen Xu & Lingchong You, 2023. "Autoencoder neural networks enable low dimensional structure analyses of microbial growth dynamics," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
