Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples

Author

Listed:
  • Andrew Cron
  • Cécile Gouttefangeas
  • Jacob Frelinger
  • Lin Lin
  • Satwinder K Singh
  • Cedrik M Britten
  • Marij J P Welters
  • Sjoerd H van der Burg
  • Mike West
  • Cliburn Chan

Abstract

Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU-acceleration to perform the numerically demanding computations. We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.

Author Summary

The use of flow cytometry to count antigen-specific T cells is essential for vaccine development, monitoring of immune-based therapies and immune biomarker discovery. Analysis of such data is challenging because antigen-specific cells are often present in frequencies of less than 1 in 1,000 peripheral blood mononuclear cells (PBMC). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. Consequently, there is intense interest in automated approaches for cell subset identification. One popular class of such automated approaches is the use of statistical mixture models. We propose a hierarchical extension of statistical mixture models that has two advantages over standard mixture models. First, it increases the ability to detect extremely rare event clusters that are present in multiple samples. Second, it enables direct comparison of cell subsets by aligning clusters across multiple samples in a natural way arising from the hierarchical formulation. We demonstrate the algorithm on clinically relevant reference PBMC samples with known frequencies of CD8 T cells engineered to express T cell receptors specific for the cancer-testis antigen (NY-ESO-1) and compare its performance with other popular automated analysis approaches.
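
Illustrative sketch (not from the article)

The alignment idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' HDPGMM and not their software: it has no Dirichlet process prior, no posterior sampling, and it does not learn the component parameters. It assumes a one-dimensional, two-component Gaussian mixture whose components are shared by every sample, and it estimates only the per-sample mixture weights by expectation-maximisation. All names and numbers (`COMPONENTS`, `simulate_sample`, `estimate_weights`, the frequencies and event counts) are hypothetical choices made for illustration. Because component index 1 denotes the same population in every sample, the estimated weights are directly comparable across samples without any post hoc cluster matching.

```python
import math
import random

# Hypothetical shared components (mean, sd), reused for every sample:
# index 0 = abundant background, index 1 = rare subset.
COMPONENTS = [(0.0, 1.0), (6.0, 0.5)]


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_sample(rare_freq, n_events, rng):
    """Draw 1-D events and return them with the realised rare fraction."""
    events, n_rare = [], 0
    for _ in range(n_events):
        is_rare = rng.random() < rare_freq
        n_rare += is_rare
        mean, sd = COMPONENTS[1] if is_rare else COMPONENTS[0]
        events.append(rng.gauss(mean, sd))
    return events, n_rare / n_events


def estimate_weights(events, n_iter=20):
    """EM for one sample's mixture weights, shared components held fixed."""
    k = len(COMPONENTS)
    # The components never change, so evaluate their densities once.
    dens = [[normal_pdf(x, m, s) for (m, s) in COMPONENTS] for x in events]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        soft_counts = [0.0] * k
        for d in dens:
            # E-step: responsibility of each component for this event.
            joint = [w * p for w, p in zip(weights, d)]
            total = sum(joint)
            for c in range(k):
                soft_counts[c] += joint[c] / total
        # M-step: weights are the average responsibilities.
        weights = [n / len(events) for n in soft_counts]
    return weights


rng = random.Random(0)
true_freqs = [0.001, 0.002, 0.004]  # hypothetical rare-subset frequencies
results = []
for freq in true_freqs:
    events, realised = simulate_sample(freq, 10000, rng)
    weights = estimate_weights(events)
    results.append((realised, weights[1]))

for sample_id, (realised, estimated) in enumerate(results):
    print(f"sample {sample_id}: realised {realised:.4f}, estimated {estimated:.4f}")
```

In the article's setting the shared components are themselves inferred from all samples jointly, which is where the gain in sensitivity to rare subsets is reported to come from; the sketch fixes them only to keep the example short.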

Suggested Citation

  • Andrew Cron & Cécile Gouttefangeas & Jacob Frelinger & Lin Lin & Satwinder K Singh & Cedrik M Britten & Marij J P Welters & Sjoerd H van der Burg & Mike West & Cliburn Chan, 2013. "Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples," PLOS Computational Biology, Public Library of Science, vol. 9(7), pages 1-14, July.
  • Handle: RePEc:plo:pcbi00:1003130
    DOI: 10.1371/journal.pcbi.1003130

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003130
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003130&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Teh, Yee Whye & Jordan, Michael I. & Beal, Matthew J. & Blei, David M., 2006. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1566-1581, December.
    2. Peter Müller & Fernando Quintana & Gary Rosner, 2004. "A method for combining inference across related nonparametric Bayesian models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(3), pages 735-749, August.
    3. Cron, Andrew J. & West, Mike, 2011. "Efficient Classification-Based Relabeling in Mixture Models," The American Statistician, American Statistical Association, vol. 65(1), pages 16-20.

    Citations

    Cited by:

    1. Gunther Glehr & Paloma Riquelme & Katharina Kronenberg & Robert Lohmayer & Víctor J. López-Madrona & Michael Kapinsky & Hans J. Schlitt & Edward K. Geissler & Rainer Spang & Sebastian Haferkamp & Jame, 2024. "Restricting datasets to classifiable samples augments discovery of immune disease biomarkers," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    2. Yuan Qi & Youhan Fang & David R Sinclair & Shangqin Guo & Meritxell Alberich-Jorda & Jun Lu & Daniel G Tenen & Michael G Kharas & Saumyadipta Pyne, 2020. "High-speed automatic characterization of rare events in flow cytometric data," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-18, February.
    3. Greg Finak & Jacob Frelinger & Wenxin Jiang & Evan W Newell & John Ramey & Mark M Davis & Spyros A Kalams & Stephen C De Rosa & Raphael Gottardo, 2014. "OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-12, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bassetti, Federico & Casarin, Roberto & Leisen, Fabrizio, 2014. "Beta-product dependent Pitman–Yor processes for Bayesian inference," Journal of Econometrics, Elsevier, vol. 180(1), pages 49-72.
    2. Bassetti, Federico & Casarin, Roberto & Leisen, Fabrizio, 2011. "Beta-product Poisson-Dirichlet Processes," DES - Working Papers. Statistics and Econometrics. WS 12160, Universidad Carlos III de Madrid. Departamento de Estadística.
    3. Rodrigues, G.S. & Nott, David J. & Sisson, S.A., 2016. "Functional regression approximate Bayesian computation for Gaussian process density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 229-241.
    4. Michelle Dietzen & Haoran Zhai & Olivia Lucas & Oriol Pich & Christopher Barrington & Wei-Ting Lu & Sophia Ward & Yanping Guo & Robert E. Hynds & Simone Zaccaria & Charles Swanton & Nicholas McGranaha, 2024. "Replication timing alterations are associated with mutation acquisition during breast and lung cancer evolution," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    5. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    6. Jin, Xin & Maheu, John M., 2016. "Bayesian semiparametric modeling of realized covariance matrices," Journal of Econometrics, Elsevier, vol. 192(1), pages 19-39.
    7. Parvin Ahmadi & Iman Gholampour & Mahmoud Tabandeh, 2018. "Cluster-based sparse topical coding for topic mining and document clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 537-558, September.
    8. Mahsa Samsami & Ralf Wagner, 2021. "Investment Decisions with Endogeneity: A Dirichlet Tree Analysis," JRFM, MDPI, vol. 14(7), pages 1-19, July.
    9. Jeffrey L. Furman & Florenta Teodoridis, 2020. "Automation, Research Technology, and Researchers’ Trajectories: Evidence from Computer Science and Electrical Engineering," Organization Science, INFORMS, vol. 31(2), pages 330-354, March.
    10. Xin Jin & John M. Maheu & Qiao Yang, 2019. "Bayesian parametric and semiparametric factor models for large realized covariance matrices," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(5), pages 641-660, August.
    11. Csereklyei, Zsuzsanna & Anantharama, Nandini & Kallies, Anne, 2021. "Electricity market transitions in Australia: Evidence using model-based clustering," Energy Economics, Elsevier, vol. 103(C).
    12. Shu-Ping Shi & Yong Song, 2012. "Identifying Speculative Bubbles with an Infinite Hidden Markov Model," Working Paper series 26_12, Rimini Centre for Economic Analysis.
    13. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    14. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    15. Jin, Xin & Maheu, John M. & Yang, Qiao, 2022. "Infinite Markov pooling of predictive distributions," Journal of Econometrics, Elsevier, vol. 228(2), pages 302-321.
    16. Thomas R. W. Oliver & Lia Chappell & Rashesh Sanghvi & Lauren Deighton & Naser Ansari-Pour & Stefan C. Dentro & Matthew D. Young & Tim H. H. Coorens & Hyunchul Jung & Tim Butler & Matthew D. C. Nevill, 2022. "Clonal diversification and histogenesis of malignant germ cell tumours," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    17. Gustaf Bellstam & Sanjai Bhagat & J. Anthony Cookson, 2021. "A Text-Based Analysis of Corporate Innovation," Management Science, INFORMS, vol. 67(7), pages 4004-4031, July.
    18. Michael L. Pennell & David B. Dunson, 2008. "Nonparametric Bayes Testing of Changes in a Response Distribution with an Ordinal Predictor," Biometrics, The International Biometric Society, vol. 64(2), pages 413-423, June.
    19. Billio, Monica & Casarin, Roberto & Rossini, Luca, 2019. "Bayesian nonparametric sparse VAR models," Journal of Econometrics, Elsevier, vol. 212(1), pages 97-115.
    20. Bruno Scarpa & David B. Dunson, 2009. "Bayesian Hierarchical Functional Data Analysis Via Contaminated Informative Priors," Biometrics, The International Biometric Society, vol. 65(3), pages 772-780, September.
