IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1004791.html

Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering

Author

Listed:
  • Chuan Gao
  • Ian C McDowell
  • Shiwen Zhao
  • Christopher D Brown
  • Barbara E Engelhardt

Abstract

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Compared with state-of-the-art biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a principled method to recover context specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue specific gene networks. We validate these findings by using our tissue specific networks to identify trans-eQTLs specific to one of four primary tissues.

Author Summary

Recovering gene co-expression networks from high-throughput experiments that measure gene expression levels is essential for understanding the genetic regulation of complex traits. It is often assumed for simplicity that gene co-expression networks are static across different contexts such as drug exposure, genotype, tissue, age, and sex. The biological reality is that, along with differences in gene expression levels, there are differences in gene interactions across contexts.
In this work, we describe a model for Bayesian biclustering, that is, for recovering non-disjoint clusters of co-expressed genes in subsets of samples from gene expression level data. Using results from our biclustering model, we build gene co-expression networks jointly across all genes by computing the full regularized covariance matrix between all pairs of genes instead of testing each possible edge separately. Because biclustering recovers structure in subsets of the samples, we are able to recover gene co-expression networks that occur across all samples, that are differential across contexts (e.g., up-regulated in males and down-regulated in females), and that are unique to a context (e.g., only co-expressed in lung tissue). We illustrate the robustness of our approach and biologically validate the networks recovered from three different gene expression data sets.
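
The idea of building a context specific network from sparse biclustering output can be illustrated with a small sketch. This is not the BicMix implementation: it assumes the biclustering result is available as a genes-by-components loading matrix and a components-by-samples factor matrix, and the function name `context_network`, the rule for selecting components, the diagonal noise term, and the partial-correlation threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def context_network(loadings, factors, context_mask, noise_var,
                    edge_threshold=0.1, tol=1e-8):
    """Illustrative sketch only (assumed inputs, not the paper's code).

    loadings:     genes x components sparse loading matrix
    factors:      components x samples sparse factor matrix
    context_mask: boolean vector marking the samples in the context
    noise_var:    per-gene residual variances
    Returns a boolean genes x genes adjacency matrix.
    """
    loadings = np.asarray(loadings, dtype=float)
    factors = np.asarray(factors, dtype=float)
    context_mask = np.asarray(context_mask, dtype=bool)

    # A component is "active" in a sample when its factor value is nonzero.
    active = np.abs(factors) > tol
    in_context = active[:, context_mask].any(axis=1)
    out_of_context = active[:, ~context_mask].any(axis=1)
    # Assumed rule: keep components that are active only inside the context.
    keep = in_context & ~out_of_context

    lam = loadings[:, keep]
    x = factors[keep][:, context_mask]

    # Gene covariance implied by the kept components plus diagonal noise,
    # computed jointly for all gene pairs.
    x_centered = x - x.mean(axis=1, keepdims=True)
    factor_cov = x_centered @ x_centered.T / max(x.shape[1] - 1, 1)
    gene_cov = lam @ factor_cov @ lam.T + np.diag(noise_var)

    # Partial correlations from the precision matrix define the edges.
    precision = np.linalg.inv(gene_cov)
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial_corr, 0.0)
    return np.abs(partial_corr) > edge_threshold


# Toy data: component 0 covers all samples (genes 0 and 1),
# component 1 is active only in samples 4-7 (genes 2 and 3).
loadings = np.zeros((6, 2))
loadings[[0, 1], 0] = 1.0
loadings[[2, 3], 1] = 1.0
factors = np.zeros((2, 8))
factors[0, :] = np.linspace(-1.0, 1.0, 8)
factors[1, 4:] = [1.0, -1.0, 2.0, -2.0]
in_context = np.arange(8) >= 4

adj = context_network(loadings, factors, in_context, noise_var=np.full(6, 0.1))
edges = [tuple(int(v) for v in e) for e in np.argwhere(np.triu(adj))]
print(edges)
```

In this toy setting only the pair of genes loaded on the context-restricted component is connected, because the component shared by all samples is excluded before the covariance is formed. A real analysis would also need a regularized covariance estimate and a principled edge-selection step rather than a fixed threshold.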

Suggested Citation

  • Chuan Gao & Ian C McDowell & Shiwen Zhao & Christopher D Brown & Barbara E Engelhardt, 2016. "Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-39, July.
  • Handle: RePEc:plo:pcbi00:1004791
    DOI: 10.1371/journal.pcbi.1004791

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004791
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004791&type=printable
    Download Restriction: no



    Citations


    Cited by:

    1. Aaditya V Rangan & Caroline C McGrouther & John Kelsoe & Nicholas Schork & Eli Stahl & Qian Zhu & Arjun Krishnan & Vicky Yao & Olga Troyanskaya & Seda Bilaloglu & Preeti Raghavan & Sarah Bergen & Ande, 2018. "A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data," PLOS Computational Biology, Public Library of Science, vol. 14(5), pages 1-29, May.
    2. Kaijie Xu & Yixi Wang, 2024. "A Novel Fuzzy Bi-Clustering Algorithm with Axiomatic Fuzzy Set for Identification of Co-Regulated Genes," Mathematics, MDPI, vol. 12(11), pages 1-11, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blum Yuna & Houée-Bigot Magalie & Causeur David, 2016. "Sparse factor model for co-expression networks with an application using prior biological knowledge," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(3), pages 253-272, June.
    2. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    3. Nicoló Fusi & Oliver Stegle & Neil D Lawrence, 2012. "Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies," PLOS Computational Biology, Public Library of Science, vol. 8(1), pages 1-9, January.
    4. Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
    5. Jing Zhou & Anirban Bhattacharya & Amy H. Herring & David B. Dunson, 2015. "Bayesian Factorizations of Big Sparse Tensors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1562-1576, December.
    6. Sylvia Fruhwirth-Schnatter, 2023. "Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis," Papers 2303.00473, arXiv.org.
    7. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    8. Joshua Chan, 2023. "BVARs and Stochastic Volatility," Papers 2310.14438, arXiv.org.
    9. Hauber, Philipp, 2022. "Real-time nowcasting with sparse factor models," EconStor Preprints 251551, ZBW - Leibniz Information Centre for Economics.
    10. Simon Beyeler & Sylvia Kaufmann, 2021. "Reduced‐form factor augmented VAR—Exploiting sparsity to include meaningful factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 989-1012, November.
    11. Lingxue Zhang & Seyoung Kim, 2014. "Learning Gene Networks under SNP Perturbations Using eQTL Datasets," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-20, February.
    12. Sahra Uygun & Cheng Peng & Melissa D Lehti-Shiu & Robert L Last & Shin-Han Shiu, 2016. "Utility and Limitations of Using Gene Expression Data to Identify Functional Associations," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-27, December.
    13. Seungchul Baek & Yen‐Yi Ho & Yanyuan Ma, 2020. "Using sufficient direction factor model to analyze latent activities associated with breast cancer survival," Biometrics, The International Biometric Society, vol. 76(4), pages 1340-1350, December.
    14. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    15. Niko Hauzenberger & Florian Huber & Karin Klieber & Massimiliano Marcellino, 2022. "Bayesian Neural Networks for Macroeconomic Analysis," Papers 2211.04752, arXiv.org, revised Apr 2024.
    16. Jaejoon Lee & Seongil Jo & Jaeyong Lee, 2022. "Robust sparse Bayesian infinite factor models," Computational Statistics, Springer, vol. 37(5), pages 2693-2715, November.
    17. Xiaodong Cai & Juan Andrés Bazerque & Georgios B Giannakis, 2013. "Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-13, May.
    18. Kaufmann, Sylvia & Schumacher, Christian, 2019. "Bayesian estimation of sparse dynamic factor models with order-independent and ex-post mode identification," Journal of Econometrics, Elsevier, vol. 210(1), pages 116-134.
    19. Sung, Bongjung & Lee, Jaeyong, 2023. "Covariance structure estimation with Laplace approximation," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    20. Faisal Shahla & Tutz Gerhard, 2017. "Missing value imputation for gene expression data by tailored nearest neighbors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(2), pages 95-106, April.
