Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies

Author

Listed:
  • Nicoló Fusi
  • Oliver Stegle
  • Neil D Lawrence

Abstract

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/.

Author Summary: The computational analysis of genetical genomics studies is challenged by confounding variation that is unrelated to the genetic factors of interest. Several approaches to account for these confounding factors have been proposed, greatly increasing the sensitivity in recovering direct genetic (cis) associations between variable genetic loci and the expression levels of individual genes. Crucially, these existing techniques largely rely on the true association signals being orthogonal to the confounding variation. Here, we show that when studying indirect (trans) genetic effects, for example from master regulators, their association signals can overlap with confounding factors estimated using existing methods. This technical overlap can lead to overcorrection, erroneously explaining away true associations as confounders. To address these shortcomings, we propose PANAMA, a model that jointly learns hidden factors while accounting for the effect of selected genetic regulators. In applications to several studies, PANAMA is more accurate than existing methods in recovering the hidden confounding factors. As a result, we find an increase in the statistical power for direct (cis) and indirect (trans) associations. Most strikingly, in yeast, PANAMA not only finds additional associations but also identifies master regulators that can be better reproduced between independent studies.
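The overcorrection problem described in the Author Summary can be illustrated with a small simulation. The sketch below is not PANAMA itself: it replaces the probabilistic model with plain PCA and least squares, and every name in it (`simulate`, `naive_correct`, `joint_correct`, the sample sizes and noise level) is a hypothetical choice made for illustration. It simulates one master regulator (a binary genotype affecting all genes in trans) and one hidden confounder, then compares removing the top principal components directly against estimating the factors only after accounting for the regulator.

```python
import numpy as np


def simulate(n=300, g=50, sigma=0.3, seed=0):
    """Toy data: one trans regulator s, one hidden confounder c (all values assumed)."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=n).astype(float)   # binary genotype of the regulator
    c = rng.normal(size=n)                         # unobserved confounder
    w_s = rng.normal(size=g)                       # trans effect on every gene
    w_c = rng.normal(size=g)                       # confounder effect on every gene
    Y = np.outer(s, w_s) + np.outer(c, w_c) + sigma * rng.normal(size=(n, g))
    return s, Y


def top_factors(Y, k):
    """First k left singular vectors of the column-centred matrix."""
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k]


def naive_correct(Y, k):
    """Remove the top k principal components without regard to genotype."""
    Yc = Y - Y.mean(axis=0)
    F = top_factors(Y, k)
    return Yc - F @ (F.T @ Yc)


def joint_correct(Y, s, k):
    """Estimate factors from what the regulator does not explain, then remove only those.

    The factors are orthonormal and orthogonal to [1, s], so F.T @ Yc equals their
    coefficients in a joint least-squares fit of Yc on [1, s, F].
    """
    Yc = Y - Y.mean(axis=0)
    X = np.column_stack([np.ones(len(s)), s])
    beta, *_ = np.linalg.lstsq(X, Yc, rcond=None)
    F = top_factors(Yc - X @ beta, k)
    return Yc - F @ (F.T @ Yc)


def mean_abs_corr(s, Y):
    """Mean absolute Pearson correlation between s and each column of Y."""
    sc = s - s.mean()
    Yc = Y - Y.mean(axis=0)
    r = (sc @ Yc) / (np.linalg.norm(sc) * np.linalg.norm(Yc, axis=0))
    return float(np.mean(np.abs(r)))


if __name__ == "__main__":
    s, Y = simulate()
    print("naive PCA correction:", mean_abs_corr(s, naive_correct(Y, k=2)))
    print("joint correction:    ", mean_abs_corr(s, joint_correct(Y, s, k=2)))
```

In this toy setting the naive correction absorbs the regulator's broad effect into the removed components, so little association with the genotype remains, while the joint variant retains it. The real method also has to select which regulators to include, which this sketch assumes is already known.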

Suggested Citation

  • Nicoló Fusi & Oliver Stegle & Neil D Lawrence, 2012. "Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies," PLOS Computational Biology, Public Library of Science, vol. 8(1), pages 1-9, January.
  • Handle: RePEc:plo:pcbi00:1002330
    DOI: 10.1371/journal.pcbi.1002330

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002330
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002330&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1002330?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
    2. Leonardo Bottolo & Marco Banterle & Sylvia Richardson & Mika Ala‐Korpela & Marjo‐Riitta Järvelin & Alex Lewin, 2021. "A computationally efficient Bayesian seemingly unrelated regressions model for high‐dimensional quantitative trait loci discovery," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 886-908, August.
    3. Marttinen Pekka & Gillberg Jussi & Havulinna Aki & Corander Jukka & Kaski Samuel, 2013. "Genome-wide association studies with high-dimensional phenotypes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 413-431, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chuan Gao & Ian C McDowell & Shiwen Zhao & Christopher D Brown & Barbara E Engelhardt, 2016. "Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-39, July.
    2. Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
    3. Kaido Lepik & Tarmo Annilo & Viktorija Kukuškina & eQTLGen Consortium & Kai Kisand & Zoltán Kutalik & Pärt Peterson & Hedi Peterson, 2017. "C-reactive protein upregulates the whole blood expression of CD59 - an integrative analysis," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-20, September.
    4. Zari Dastani & Marie-France Hivert & Nicholas Timpson & John R B Perry & Xin Yuan & Robert A Scott & Peter Henneman & Iris M Heid & Jorge R Kizer & Leo-Pekka Lyytikäinen & Christian Fuchsberger & Tosh, 2012. "Novel Loci for Adiponectin Levels and Their Influence on Type 2 Diabetes and Metabolic Traits: A Multi-Ethnic Meta-Analysis of 45,891 Individuals," PLOS Genetics, Public Library of Science, vol. 8(3), pages 1-23, March.
    5. repec:jss:jstsof:40:i14 is not listed on IDEAS
    6. Seong Kyu Han & Michelle T. McNulty & Christopher J. Benway & Pei Wen & Anya Greenberg & Ana C. Onuchic-Whitford & Dongkeun Jang & Jason Flannick & Noël P. Burtt & Parker C. Wilson & Benjamin D. Humph, 2023. "Mapping genomic regulation of kidney disease and traits through high-resolution and interpretable eQTLs," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    7. Emanuele Aliverti & Kristian Lum & James E. Johndrow & David B. Dunson, 2021. "Removing the influence of group variables in high‐dimensional predictive modelling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 791-811, July.
    8. Seungchul Baek & Yen‐Yi Ho & Yanyuan Ma, 2020. "Using sufficient direction factor model to analyze latent activities associated with breast cancer survival," Biometrics, The International Biometric Society, vol. 76(4), pages 1340-1350, December.
    9. Griffin, Maryclare & Hoff, Peter D., 2019. "Lasso ANOVA decompositions for matrix and tensor data," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 181-194.
    10. Pingting Ying & Can Chen & Zequn Lu & Shuoni Chen & Ming Zhang & Yimin Cai & Fuwei Zhang & Jinyu Huang & Linyun Fan & Caibo Ning & Yanmin Li & Wenzhuo Wang & Hui Geng & Yizhuo Liu & Wen Tian & Zhiyong, 2023. "Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    11. Satria P. Sajuthi & Jamie L. Everman & Nathan D. Jackson & Benjamin Saef & Cydney L. Rios & Camille M. Moore & Angel C. Y. Mak & Celeste Eng & Ana Fairbanks-Mahnke & Sandra Salazar & Jennifer Elhawary, 2022. "Nasal airway transcriptome-wide association study of asthma reveals genetically driven mucus pathobiology," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    12. Barbara E Stranger & Stephen B Montgomery & Antigone S Dimas & Leopold Parts & Oliver Stegle & Catherine E Ingle & Magda Sekowska & George Davey Smith & David Evans & Maria Gutierrez-Arcelus & Alkes P, 2012. "Patterns of Cis Regulatory Variation in Diverse Human Populations," PLOS Genetics, Public Library of Science, vol. 8(4), pages 1-13, April.
    13. Chee Ho H’ng & Shanika L. Amarasinghe & Boya Zhang & Hojin Chang & Xinli Qu & David R. Powell & Alberto Rosello-Diez, 2024. "Compensatory growth and recovery of cartilage cytoarchitecture after transient cell death in fetal mouse limbs," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    14. Mark Reimers, 2010. "Making Informed Choices about Microarray Data Analysis," PLOS Computational Biology, Public Library of Science, vol. 6(5), pages 1-7, May.
    15. Leek Jeffrey T & Storey John D., 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    16. Bin Wang, 2020. "A Zipf-plot based normalization method for high-throughput RNA-seq data," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-15, April.
    17. Yuto Hasegawa & Juhyun Kim & Gianluca Ursini & Yan Jouroukhin & Xiaolei Zhu & Yu Miyahara & Feiyi Xiong & Samskruthi Madireddy & Mizuho Obayashi & Beat Lutz & Akira Sawa & Solange P. Brown & Mikhail V, 2023. "Microglial cannabinoid receptor type 1 mediates social memory deficits in mice produced by adolescent THC exposure and 16p11.2 duplication," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    18. Sudhir Varma, 2020. "Blind estimation and correction of microarray batch effect," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-15, April.
    19. Friguet, Chloé & Causeur, David, 2011. "Estimation of the proportion of true null hypotheses in high-dimensional data under dependence," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2665-2676, September.
    20. Xuemeng Zhou & Tsz Wing Sam & Ah Young Lee & Danny Leung, 2021. "Mouse strain-specific polymorphic provirus functions as cis-regulatory element leading to epigenomic and transcriptomic variations," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    21. Michael W Nagle & Jeanne C Latourelle & Adam Labadorf & Alexandra Dumitriu & Tiffany C Hadzi & Thomas G Beach & Richard H Myers, 2016. "The 4p16.3 Parkinson Disease Risk Locus Is Associated with GAK Expression and Genes Involved with the Synaptic Vesicle Membrane," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-14, August.