
Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing

Authors

  • Xiaotian Wu (Brown University)
  • Hao Wu (Emory University)
  • Zhijin Wu (Brown University)

Abstract

Single-cell RNA sequencing (scRNA-seq) quantifies RNA transcripts at the level of individual cells, providing cellular resolution of gene expression variation. scRNA-seq data are counts of RNA transcripts for all genes in a species' genome; they are very high-dimensional and contain an excess of zero counts. To reduce the data dimension and extract robust, interpretable biological information, we develop a penalized Latent Dirichlet Allocation (pLDA) model for scRNA-seq data. The method is adapted from LDA, a generative probabilistic model that originated in natural language processing. pLDA models scRNA-seq data by treating genes as words, cells as documents, and latent biological functions as topics. It imposes a penalty reflecting the expectation that, in scRNA-seq data, only a small subset of genes is topic-specific, which increases the robustness of the estimation and the interpretability of the results. We apply pLDA to scRNA-seq datasets from both Drop-seq and SMARTer v1 technologies and demonstrate improved performance in cell-type classification. The topics identified by pLDA can be interpreted in terms of biological functions.
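The genes-as-words, cells-as-documents analogy can be made concrete with a small simulation of the standard (unpenalized) LDA generative process. This is only an illustrative sketch: the abstract does not give the form of the pLDA penalty or its estimation procedure, so neither is shown here. The function names, the Dirichlet concentration values (0.5), and the matrix sizes are all assumptions chosen for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_counts(n_cells, n_genes, n_topics, depth, seed=0):
    """Simulate a cells-by-genes count matrix from plain LDA (hypothetical helper).

    Each topic is a distribution over genes (a "latent biological function");
    each cell mixes topics, just as a document mixes topics over words.
    """
    rng = random.Random(seed)
    # Topic-by-gene distributions; the concentration 0.5 is an assumed value.
    topics = [sample_dirichlet([0.5] * n_genes, rng) for _ in range(n_topics)]
    counts, memberships = [], []
    for _ in range(n_cells):
        # Per-cell topic proportions (the cell's grade of membership).
        theta = sample_dirichlet([0.5] * n_topics, rng)
        row = [0] * n_genes
        for _ in range(depth):
            # Each transcript picks a topic, then a gene from that topic.
            k = rng.choices(range(n_topics), weights=theta)[0]
            g = rng.choices(range(n_genes), weights=topics[k])[0]
            row[g] += 1
        counts.append(row)
        memberships.append(theta)
    return counts, memberships, topics


# Usage: a toy matrix of 5 cells by 20 genes with 100 transcripts per cell.
counts, memberships, topics = simulate_counts(5, 20, 3, 100, seed=1)
zero_fraction = sum(c == 0 for row in counts for c in row) / (5 * 20)
print("fraction of zero counts:", zero_fraction)
```

Fitting reverses this process: given only the count matrix, one estimates the per-cell topic proportions and the per-topic gene distributions. According to the abstract, pLDA adds a penalty at that estimation stage so that only a small subset of genes ends up topic-specific.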

Suggested Citation

  • Xiaotian Wu & Hao Wu & Zhijin Wu, 2021. "Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(3), pages 543-562, December.
  • Handle: RePEc:spr:stabio:v:13:y:2021:i:3:d:10.1007_s12561-021-09304-8
    DOI: 10.1007/s12561-021-09304-8



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lucia Taraborrelli & Yasin Şenbabaoğlu & Lifen Wang & Junghyun Lim & Kerrigan Blake & Noelyn Kljavin & Sarah Gierke & Alexis Scherl & James Ziai & Erin McNamara & Mark Owyong & Shilpa Rao & Aslihan Ka, 2023. "Tumor-intrinsic expression of the autophagy gene Atg16l1 suppresses anti-tumor immunity in colorectal cancer," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    2. Seymour Picciotto & Nicholas DeVita & Chiaowen Joyce Hsiao & Christopher Honan & Sze-Wah Tse & Mychael Nguyen & Joseph D. Ferrari & Wei Zheng & Brian T. Wipke & Eric Huang, 2022. "Selective activation and expansion of regulatory T cells using lipid encapsulated mRNA encoding a long-acting IL-2 mutein," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Malarvizhi Gurusamy & Denise Tischner & Jingchen Shao & Stephan Klatt & Sven Zukunft & Remy Bonnavion & Stefan Günther & Kai Siebenbrodt & Roxane-Isabelle Kestner & Tanja Kuhlmann & Ingrid Fleming & S, 2021. "G-protein-coupled receptor P2Y10 facilitates chemokine-induced CD4 T cell migration through autocrine/paracrine mediators," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    4. Yue Cao & Pengyi Yang & Jean Yee Hwa Yang, 2021. "A benchmark study of simulation methods for single-cell RNA sequencing data," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    5. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    6. Qi Liu & Charles A Herring & Quanhu Sheng & Jie Ping & Alan J Simmons & Bob Chen & Amrita Banerjee & Wei Li & Guoqiang Gu & Robert J Coffey & Yu Shyr & Ken S Lau, 2018. "Quantitative assessment of cell population diversity in single-cell landscapes," PLOS Biology, Public Library of Science, vol. 16(10), pages 1-29, October.
    7. Michael Greenacre & Patrick J. F Groenen & Trevor Hastie & Alfonso Iodice d’Enza & Angelos Markos & Elena Tuzhilina, 2023. "Principal component analysis," Economics Working Papers 1856, Department of Economics and Business, Universitat Pompeu Fabra.
    8. Chachrit Khunsriraksakul & Daniel McGuire & Renan Sauteraud & Fang Chen & Lina Yang & Lida Wang & Jordan Hughey & Scott Eckert & J. Dylan Weissenkampen & Ganesh Shenoy & Olivia Marx & Laura Carrel & B, 2022. "Integrating 3D genomic and epigenomic data to enhance target gene discovery and drug repurposing in transcriptome-wide association studies," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    9. Brendan F. Miller & Feiyang Huang & Lyla Atta & Arpan Sahoo & Jean Fan, 2022. "Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
