Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies

Author

Listed:
  • Helian Feng
  • Nicholas Mancuso
  • Alexander Gusev
  • Arunabha Majumdar
  • Megan Major
  • Bogdan Pasaniuc
  • Peter Kraft

Abstract

Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when the expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome and the GWAS sample size is 20,000, the average power advantage of the ACAT-combined test of sCCA and single-tissue features over single-tissue tests combined with the Generalized Berk-Jones (GBJ) method, single-tissue tests combined with S-MultiXcan, UTMOST, and cross-tissue expression patterns summarized by Principal Component Analysis (PCA) was 5%, 8%, 5% and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test increased the number of testable genes and identified, on average, 400 additional gene-trait associations that single-tissue TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling the false positive rate.

Author summary: Transcriptome-wide association studies (TWAS) can improve the statistical power of genetic association studies by leveraging the relationship between genetically predicted transcript expression levels and an outcome. We propose a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. We generate cross-tissue expression features using sparse canonical correlation analysis and then combine evidence for expression-outcome association across cross- and single-tissue features using the aggregate Cauchy association test. We show that this approach has substantially higher power than traditional single-tissue TWAS methods. Application of these methods to publicly available summary statistics for ten complex traits also identifies associations missed by single-tissue methods.
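
The combination step named in the abstract can be illustrated with a short sketch of the Cauchy combination rule that underlies ACAT. This is not the authors' pipeline: the function name `acat_combine`, the equal default weights, and the small-p and large-statistic thresholds are assumptions made here for illustration. Each p-value is mapped to a standard Cauchy variate, the variates are averaged with weights, and the average is mapped back to a single p-value.

```python
import math


def acat_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule (illustrative sketch).

    p_values: p-values strictly between 0 and 1, e.g. one per sCCA or
              single-tissue feature for a gene (hypothetical usage).
    weights:  optional non-negative weights; equal weights are assumed
              when none are given.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    stat = 0.0
    for p, w in zip(p_values, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        if p < 1e-15:
            # Numerical safeguard (assumed threshold): tan((0.5 - p) * pi)
            # behaves like 1 / (p * pi) for very small p.
            cauchy = 1.0 / (p * math.pi)
        else:
            cauchy = math.tan((0.5 - p) * math.pi)
        stat += (w / total) * cauchy

    if stat > 1e15:
        # Tail approximation for very large statistics (assumed threshold).
        return 1.0 / (stat * math.pi)
    # Upper tail probability of a standard Cauchy distribution at stat.
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values for one gene across several expression features.
    print(acat_combine([0.001, 0.5, 0.2, 0.7]))
```

A single p-value is returned unchanged by this rule, and one small p-value among many large ones still yields a small combined p-value, which is why this kind of test suits settings where only one or a few features carry signal.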

Suggested Citation

  • Helian Feng & Nicholas Mancuso & Alexander Gusev & Arunabha Majumdar & Megan Major & Bogdan Pasaniuc & Peter Kraft, 2021. "Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies," PLOS Genetics, Public Library of Science, vol. 17(4), pages 1-21, April.
  • Handle: RePEc:plo:pgen00:1008973
    DOI: 10.1371/journal.pgen.1008973

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008973
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1008973&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Ryan Sun & Shirley Hui & Gary D Bader & Xihong Lin & Peter Kraft, 2019. "Powerful gene set analysis in GWAS with the Generalized Berk-Jones statistic," PLOS Genetics, Public Library of Science, vol. 15(3), pages 1-27, March.
    3. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    4. Alvaro N Barbeira & Milton Pividori & Jiamao Zheng & Heather E Wheeler & Dan L Nicolae & Hae Kyung Im, 2019. "Integrating predicted transcriptome from multiple tissues improves association detection," PLOS Genetics, Public Library of Science, vol. 15(1), pages 1-20, January.

    Citations

    Cited by:

    1. Gianluca Ursini & Pasquale Di Carlo & Sreya Mukherjee & Qiang Chen & Shizhong Han & Jiyoung Kim & Maya Deyssenroth & Carmen J. Marsit & Jia Chen & Ke Hao & Giovanna Punzi & Daniel R. Weinberger, 2023. "Prioritization of potential causative genes for schizophrenia in placenta," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    2. Chachrit Khunsriraksakul & Daniel McGuire & Renan Sauteraud & Fang Chen & Lina Yang & Lida Wang & Jordan Hughey & Scott Eckert & J. Dylan Weissenkampen & Ganesh Shenoy & Olivia Marx & Laura Carrel & B, 2022. "Integrating 3D genomic and epigenomic data to enhance target gene discovery and drug repurposing in transcriptome-wide association studies," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    3. Qile Dai & Geyu Zhou & Hongyu Zhao & Urmo Võsa & Lude Franke & Alexis Battle & Alexander Teumer & Terho Lehtimäki & Olli T. Raitakari & Tõnu Esko & Michael P. Epstein & Jingjing Yang, 2023. "OTTERS: a powerful TWAS framework leveraging summary-level reference data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    4. Xiaoyu Song & Jiayi Ji & Joseph H. Rothstein & Stacey E. Alexeeff & Lori C. Sakoda & Adriana Sistig & Ninah Achacoso & Eric Jorgenson & Alice S. Whittemore & Robert J. Klein & Laurel A. Habel & Pei Wa, 2023. "MiXcan: a framework for cell-type-aware transcriptome-wide association studies with an application to breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    5. Lucas A. Mavromatis & Daniel B. Rosoff & Andrew S. Bell & Jeesun Jung & Josephin Wagner & Falk W. Lohoff, 2023. "Multi-omic underpinnings of epigenetic aging and human longevity," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    6. Diptavo Dutta & Yuan He & Ashis Saha & Marios Arvanitis & Alexis Battle & Nilanjan Chatterjee, 2022. "Aggregative trans-eQTL analysis detects trait-specific target gene sets in whole blood," Nature Communications, Nature, vol. 13(1), pages 1-14, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    2. Ida Kubiszewski & Kenneth Mulder & Diane Jarvis & Robert Costanza, 2022. "Toward better measurement of sustainable development and wellbeing: A small number of SDG indicators reliably predict life satisfaction," Sustainable Development, John Wiley & Sons, Ltd., vol. 30(1), pages 139-148, February.
    3. Christopher Kath & Florian Ziel, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Papers 1811.08604, arXiv.org.
    4. Kath, Christopher & Ziel, Florian, 2021. "Conformal prediction interval estimation and applications to day-ahead and intraday power markets," International Journal of Forecasting, Elsevier, vol. 37(2), pages 777-799.
    5. Dindaroglu, Burak & Ertac, Seda, 2024. "An empirical study of sequential offer bargaining during the Festival of Sacrifice," Journal of Economic Psychology, Elsevier, vol. 101(C).
    6. Kath, Christopher & Ziel, Florian, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Energy Economics, Elsevier, vol. 76(C), pages 411-423.
    7. Andreas Groll & Gerhard Tutz, 2017. "Variable selection in discrete survival models including heterogeneity," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(2), pages 305-338, April.
    8. Brunori, Paolo & Salas-Rojo, Pedro & Verme, Paolo, 2022. "Estimating Inequality with Missing Incomes," GLO Discussion Paper Series 1138, Global Labor Organization (GLO).
    9. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 1-40, March.
    10. Danhyang Lee & Jae Kwang Kim, 2022. "Semiparametric imputation using conditional Gaussian mixture models under item nonresponse," Biometrics, The International Biometric Society, vol. 78(1), pages 227-237, March.
    11. Winn-Nuñez, Emily T. & Griffin, Maryclare & Crawford, Lorin, 2024. "A simple approach for local and global variable importance in nonlinear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    12. Jacqueline K Kueper & Daniel J Lizotte & Manuel Montero-Odasso & Mark Speechley & for the Alzheimer’s Disease Neuroimaging Initiative, 2020. "Cognition and motor function: The gait and cognition pooled index," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-16, September.
    13. Andree,Bo Pieter Johannes, 2021. "Estimating Food Price Inflation from Partial Surveys," Policy Research Working Paper Series 9886, The World Bank.
    14. Ting‐Huei Chen & Hanaa Boughal, 2021. "A penalized structural equation modeling method accounting for secondary phenotypes for variable selection on genetically regulated expression from PrediXcan for Alzheimer's disease," Biometrics, The International Biometric Society, vol. 77(1), pages 362-371, March.
    15. Halewijn M. Drent & Barbara van den Hoofdakker & Jan K. Buitelaar & Pieter J. Hoekstra & Andrea Dietrich, 2022. "Factors Related to Perceived Stigma in Parents of Children and Adolescents in Outpatient Mental Healthcare," IJERPH, MDPI, vol. 19(19), pages 1-14, October.
    16. Monica E. Ellwood-Lowe & Susan Whitfield-Gabrieli & Silvia A. Bunge, 2021. "Brain network coupling associated with cognitive performance varies as a function of a child’s environment in the ABCD study," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    17. Miriam Levi & Giulia Cereda & Francesco Cipriani & Fabio Voller & Michela Baccini, 2023. "Case-Control Study on the Routes of Transmission of SARS-CoV-2 after the Third Pandemic Wave in Tuscany, Central Italy," IJERPH, MDPI, vol. 20(3), pages 1-13, January.
    18. Elizabeth Weigensberg & Derekh Cornwell & Lindsey Leininger & Matthew Stagner & Sarah LeBarron & Jonathan Gellar & Sophie MacIntyre & Richard Chapman & Erin J. Maher & Peter J. Pecora & Kirk O’Brien, "undated". "Superutilization of Child Welfare, Medicaid, and Other Services," Mathematica Policy Research Reports caaff77fa722452aa241ace4b, Mathematica Policy Research.
    19. Phung Khanh Lam & Tran Van Ngoc & Truong Thi Thu Thuy & Nguyen Thi Hong Van & Tran Thi Nhu Thuy & Dong Thi Hoai Tam & Nguyen Minh Dung & Nguyen Thi Hanh Tien & Nguyen Tan Thanh Kieu & Cameron Simmons , 2017. "The value of daily platelet counts for predicting dengue shock syndrome: Results from a prospective observational study of 2301 Vietnamese children with dengue," PLOS Neglected Tropical Diseases, Public Library of Science, vol. 11(4), pages 1-20, April.
    20. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
