
Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks

Author

Listed:
  • Ye Henry Li
  • Dangna Li
  • Nikolay Samusik
  • Xiaowei Wang
  • Leying Guan
  • Garry P Nolan
  • Wing Hung Wong

Abstract

Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurements of 50 markers for hundreds of thousands of cells. Current methods do not adequately address the issues that arise when combining multiple samples for subpopulation discovery, and these issues are quickly and dramatically amplified as the number of samples increases. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data that closely match those found by expert manual discovery, and for alignments between subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and cancer studies that monitor various tissue samples for each subject.

Author summary: Recently, the cytometry field has experienced rapid advancement in the development of mass cytometry (CyTOF). CyTOF enables a significant increase in the ability to monitor 50 or more cellular markers for millions of cells at the single-cell level. Initial studies with CyTOF focused on a few samples, for which expert manual discovery of cell types was acceptable. As the technology matures, it is now feasible to collect more samples, which enables systematic studies of cell types across multiple samples. However, the statistical and computational issues surrounding multi-sample analysis have not previously been examined in detail. Furthermore, it was not clear how the data analysis could be scaled to hundreds of samples, such as those in clinical studies. In this work, we present a scalable analysis pipeline that is grounded in a strong statistical foundation. Partition-Assisted Clustering (PAC) offers fast and accurate clustering, and Multiple Alignments of Networks (MAN) uses network structures learned from each homogeneous cluster to organize the data into dataset-level clusters. PAC-MAN thus enables the analysis of a large CyTOF dataset that was previously too large to be analyzed systematically; this pipeline can be extended to the analysis of similarly large or larger datasets.

Suggested Citation

  • Ye Henry Li & Dangna Li & Nikolay Samusik & Xiaowei Wang & Leying Guan & Garry P Nolan & Wing Hung Wong, 2017. "Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks," PLOS Computational Biology, Public Library of Science, vol. 13(12), pages 1-37, December.
  • Handle: RePEc:plo:pcbi00:1005875
    DOI: 10.1371/journal.pcbi.1005875

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005875
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005875&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1005875?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mammen, Enno & Martínez Miranda, María Dolores & Nielsen, Jens Perch, 2015. "In-sample forecasting applied to reserving and mesothelioma mortality," Insurance: Mathematics and Economics, Elsevier, vol. 61(C), pages 76-86.
    2. Kirschenmann, T.H. & Damien, P. & Walker, S.G., 2015. "A note on the e–a histogram," Statistics & Probability Letters, Elsevier, vol. 103(C), pages 105-109.
    3. Siong Thye Goh & Lesia Semenova & Cynthia Rudin, 2024. "Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms," INFORMS Journal on Data Science, INFORMS, vol. 3(1), pages 28-48, April.
