Authors
- Yang Li
- Alexis A Jourdain
- Sarah E Calvo
- Jun S Liu
- Vamsi K Mootha
Abstract
In recent years, there has been a huge rise in the number of publicly available transcriptional profiling datasets. These massive compendia comprise billions of measurements and provide a special opportunity to predict the function of unstudied genes based on co-expression to well-studied pathways. Such analyses can be very challenging, however, since biological pathways are modular and may exhibit co-expression only in specific contexts. To overcome these challenges we introduce CLIC, CLustering by Inferred Co-expression. CLIC accepts as input a pathway consisting of two or more genes. It then uses a Bayesian partition model to simultaneously partition the input gene set into coherent co-expressed modules (CEMs), while assigning the posterior probability for each dataset in support of each CEM. CLIC then expands each CEM by scanning the transcriptome for additional co-expressed genes, quantified by an integrated log-likelihood ratio (LLR) score weighted for each dataset. As a byproduct, CLIC automatically learns the conditions (datasets) within which a CEM is operative. We implemented CLIC using a compendium of 1774 mouse microarray datasets (28628 microarrays) or 1887 human microarray datasets (45158 microarrays). CLIC analysis reveals that of 910 canonical biological pathways, 30% consist of strongly co-expressed gene modules for which new members are predicted. For example, CLIC predicts a functional connection between protein C7orf55 (FMC1) and the mitochondrial ATP synthase complex that we have experimentally validated. CLIC is freely available at www.gene-clic.org. We anticipate that CLIC will be valuable for revealing both new components of biological pathways and the conditions in which they are active.
Author summary
A major challenge in modern genomics research is to link the thousands of unstudied genes to the pathways and complexes within which they operate.
A popular strategy to infer the function of an unstudied gene is to search for co-expressing genes of known function using a single transcriptional profiling dataset. Today, there are thousands of transcriptional profiling datasets, and a special opportunity lies in querying entire compendia for co-expression in order to more reliably expand pathway membership. Such analyses can be challenging, however, as pathways can be highly modular, and different datasets can conflict in terms of providing evidence of co-expression. To overcome these challenges, we introduce a tool called CLIC, CLustering by Inferred Co-expression. CLIC accepts a pathway of interest, simultaneously partitioning it into modules of genes that exhibit striking co-expression patterns while also learning the number of modules. It then expands each module with new members, based on an integrated weighted co-expression score across the datasets. Three key innovations within CLIC (partitioning, background correction, and integration) distinguish it from other methods. A side benefit of CLIC is that it spotlights the datasets that support the co-expression of a given module. Our software is freely available and should be useful for identifying new genes in biological pathways while also identifying the datasets within which the pathways are active.
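The idea of an integrated, dataset-weighted co-expression score can be illustrated with a small sketch. This is not CLIC's Bayesian partition model or its LLR score; it is a simplified stand-in that assumes Pearson correlation as the co-expression measure and uses a hypothetical weight (the mean pairwise correlation of the module's genes in a dataset, floored at zero). The function names, gene labels, and toy values are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dataset_weight(dataset, module):
    """Hypothetical weight: mean pairwise correlation of the module genes, floored at 0."""
    genes = [g for g in module if g in dataset]
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    mean_r = sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)
    return max(0.0, mean_r)


def weighted_coexpression(datasets, module, candidate):
    """Sum over datasets of (weight) x (mean correlation of candidate with module genes)."""
    score = 0.0
    for ds in datasets:
        if candidate not in ds:
            continue
        genes = [g for g in module if g in ds and g != candidate]
        if not genes:
            continue
        mean_r = sum(pearson(ds[candidate], ds[g]) for g in genes) / len(genes)
        score += dataset_weight(ds, module) * mean_r
    return score


# Toy data (made up): each dataset maps a gene label to its expression values.
# In ds1 the module genes A and B are co-expressed; in ds2 they are not,
# so ds2 receives zero weight and does not influence the candidate scores.
ds1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "X": [1, 2, 3, 5], "Y": [4, 1, 3, 2]}
ds2 = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "X": [1, 3, 2, 4], "Y": [1, 2, 3, 4]}
datasets = [ds1, ds2]
module = ["A", "B"]

score_x = weighted_coexpression(datasets, module, "X")
score_y = weighted_coexpression(datasets, module, "Y")
print(f"X: {score_x:.3f}  Y: {score_y:.3f}")
```

In this toy setting, candidate X ranks above candidate Y because only the dataset in which the module itself is co-expressed contributes to the score, which mirrors the summary's point that datasets supporting a module should count for more than those that do not.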
Suggested Citation
Yang Li & Alexis A Jourdain & Sarah E Calvo & Jun S Liu & Vamsi K Mootha, 2017.
"CLIC, a tool for expanding biological pathways based on co-expression across thousands of datasets,"
PLOS Computational Biology, Public Library of Science, vol. 13(7), pages 1-29, July.
Handle: RePEc:plo:pcbi00:1005653
DOI: 10.1371/journal.pcbi.1005653