
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge

Author

Listed:
  • Florian Wagner

Abstract

Method: Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping.

Results: I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
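The general idea of combining PCA with gene set enrichment, as described in the abstract, can be illustrated with a minimal sketch. This is not the author's implementation: the abstract specifies a nonparametric GO enrichment test, whereas the sketch below substitutes a plain hypergeometric test on a fixed number of top-ranked genes to stay short. The function names (`hypergeom_sf`, `go_pca_sketch`), the parameters `top` and `alpha`, the toy expression matrix, and the gene sets `term_A` and `term_B` are all hypothetical and chosen only for illustration.

```python
from math import comb

import numpy as np


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which are annotated."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def go_pca_sketch(X, genes, gene_sets, n_pcs=1, top=4, alpha=0.05):
    """Toy PCA-plus-enrichment loop (simplified; not the published algorithm).

    X is a genes-by-samples matrix. Every gene set is assumed to contain
    only genes that are present in `genes`.
    """
    Xc = X - X.mean(axis=1, keepdims=True)            # center each gene
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)  # columns of U: gene loadings
    signatures = []
    for pc in range(n_pcs):
        loadings = U[:, pc]
        for sign in (1, -1):                          # test both ends of the ranking
            order = np.argsort(-sign * loadings)[:top]
            top_genes = {genes[i] for i in order}
            for term, members in gene_sets.items():
                hits = sorted(top_genes & members)
                p = hypergeom_sf(len(hits), len(genes), len(members), top)
                if hits and p <= alpha:
                    rows = [genes.index(g) for g in hits]
                    profile = Xc[rows].mean(axis=0)   # signature = mean centered expression
                    signatures.append((term, pc + 1, hits, profile))
    return signatures


# Hypothetical toy data: g1-g4 are co-expressed, g5-g6 show the opposite
# pattern, and g7-g8 are essentially flat.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
X = np.array([
    [8.0, 9.0, 8.5, 1.0, 1.5, 0.5],
    [7.5, 8.0, 9.0, 0.5, 1.0, 1.5],
    [9.0, 8.5, 8.0, 1.5, 0.5, 1.0],
    [8.5, 7.5, 8.0, 1.0, 1.0, 1.0],
    [2.0, 2.5, 2.0, 5.0, 5.5, 5.0],
    [1.5, 2.0, 2.5, 5.5, 5.0, 4.5],
    [4.0, 4.1, 3.9, 4.0, 4.1, 3.9],
    [3.0, 2.9, 3.1, 3.0, 3.1, 2.9],
])
gene_sets = {
    "term_A": {"g1", "g2", "g3", "g4"},   # functionally related and co-expressed
    "term_B": {"g2", "g6", "g7", "g8"},   # annotated together but not co-expressed
}

signatures = go_pca_sketch(X, genes, gene_sets)
for term, pc, hits, profile in signatures:
    print(term, "PC", pc, hits, np.round(profile, 2))
```

In this toy setting, a gene set is reported only when its members are both concentrated at one end of a principal component's loading ranking and annotated with the same term, which mirrors the "strongly correlated and closely functionally related" criterion stated in the abstract. The fixed cutoff `top` is a simplification introduced here.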

Suggested Citation

  • Florian Wagner, 2015. "GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-26, November.
  • Handle: RePEc:plo:pone00:0143196
    DOI: 10.1371/journal.pone.0143196

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0143196
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0143196&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0143196?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations



    Cited by:

    1. Inbal Vaknin & Or Willinger & Jonathan Mandl & Hadar Heuberger & Dan Ben-Ami & Yi Zeng & Sarah Goldberg & Yaron Orenstein & Roee Amit, 2024. "A universal system for boosting gene expression in eukaryotic cell-lines," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    2. Miaomiao Li & Tao Yao & Wanru Lin & Will E. Hinckley & Mary Galli & Wellington Muchero & Andrea Gallavotti & Jin-Gui Chen & Shao-shan Carol Huang, 2023. "Double DAP-seq uncovered synergistic DNA binding of interacting bZIP transcription factors," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    3. Yoon Keun Cho & Young Cheol Yoon & Hyeonyeong Im & Yeonho Son & Minsu Kim & Abhirup Saha & Cheoljun Choi & Jaewon Lee & Sumin Lee & Jae Hyun Kim & Yun Pyo Kang & Young-Suk Jung & Hong Koo Ha & Je Kyun, 2022. "Adipocyte lysoplasmalogenase TMEM86A regulates plasmalogen homeostasis and protein kinase A-dependent energy metabolism," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
