Genome-wide association studies with high-dimensional phenotypes

Authors

Marttinen Pekka
(Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland)
Gillberg Jussi
(Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland)
Havulinna Aki
(National Institute for Health and Welfare, Department of Chronic Disease Prevention, P.O. Box 30, FI-00271 Helsinki, Finland)
Corander Jukka
(Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, P.O. Box 68, FI-00014 Helsinki, Finland)
Kaski Samuel
(Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland; Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 Helsinki, Finland)

Abstract

High-dimensional phenotypes hold promise for richer findings in association studies, but testing several phenotype traits aggravates the grand challenge of association studies, namely multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional phenotype vector, with the prospect of increased power to detect small effects that would be missed if the traits were tested individually. However, these methods have rarely been compared thoroughly enough to assess their relative merits or to establish guidelines on which method to use and how to use it. We compare the methods on simulated data and on a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires dividing the problem into manageable parts that can be processed in parallel, for example parts corresponding to individual genetic variants, pathways, or genes. Here we use a straightforward formulation in which the genome is divided into blocks of nearby correlated genetic markers, and each block is tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods combine information over several correlated variables not only on the phenotype side but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than the alternative methods while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient relative to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
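The blockwise formulation described in the abstract can be illustrated with a minimal Python sketch. It assumes NumPy and SciPy, a genotype matrix `G` (samples × SNPs, columns in genomic order) and a phenotype matrix `Y` (samples × traits). The helper names `snp_blocks` and `cca_test` are hypothetical. The block rule used here (start a new block when the absolute correlation with the preceding SNP drops below `r_min`, or when the block reaches `max_size`) is an illustrative assumption, not necessarily the rule used in the paper. Each block is tested against all phenotypes with Wilks' lambda and Bartlett's chi-square approximation, a standard likelihood-ratio test for zero canonical correlations; the paper's exact testing procedure may differ.

```python
import numpy as np
from scipy.stats import chi2


def _abs_corr(a, b):
    """Absolute Pearson correlation; returns 0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else abs(a @ b) / denom


def snp_blocks(G, r_min=0.5, max_size=10):
    """Hypothetical greedy rule: group adjacent SNP columns into blocks.

    A new block starts when |r| with the previous SNP is below r_min
    or when the current block already holds max_size SNPs.
    """
    blocks, current = [], [0]
    for j in range(1, G.shape[1]):
        if _abs_corr(G[:, j - 1], G[:, j]) >= r_min and len(current) < max_size:
            current.append(j)
        else:
            blocks.append(current)
            current = [j]
    blocks.append(current)
    return blocks


def cca_test(X, Y):
    """Test H0: all canonical correlations between X (n x p) and Y (n x q) are 0.

    Uses Wilks' lambda, prod(1 - r_i^2), with Bartlett's approximation
    -(n - 1 - (p + q + 1) / 2) * ln(lambda) ~ chi-square with p * q df.
    Returns (canonical correlations, statistic, p-value).
    """
    n, p = X.shape
    q = Y.shape[1]
    if n <= p + q + 1:
        raise ValueError("need more samples than p + q + 1")
    # Orthonormal bases of the centred column spaces (assumes full column rank).
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Canonical correlations are the singular values of Qx^T Qy (descending).
    r = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    r = np.clip(r, 0.0, 1.0 - 1e-12)  # guard against log(0)
    log_wilks = np.sum(np.log1p(-r ** 2))
    stat = -(n - 1 - (p + q + 1) / 2) * log_wilks
    return r, stat, chi2.sf(stat, p * q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Simulated genotypes: 10 latent SNPs, each duplicated with ~10% of
    # entries resampled, so adjacent pairs are strongly correlated.
    latent = rng.binomial(2, 0.3, size=(n, 10)).astype(float)
    G = np.repeat(latent, 2, axis=1)
    mask = rng.random(G.shape) < 0.1
    G[mask] = rng.binomial(2, 0.3, size=mask.sum())
    # Five correlated phenotypes sharing a weak effect of SNP 4.
    Y = rng.normal(size=(n, 5)) + 0.5 * rng.normal(size=(n, 1))
    Y += 0.2 * G[:, [4]]
    for block in snp_blocks(G):
        r, stat, pval = cca_test(G[:, block], Y)
        print(block, round(r[0], 3), f"{pval:.2e}")
```

The `n <= p + q + 1` check mirrors the abstract's caveat that canonical correlation analysis needs enough samples relative to the numbers of genotype and phenotype variables. The per-block p-values would still need a genome-wide multiple-testing correction, for example Bonferroni over the number of blocks; that choice is an assumption here.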

Suggested Citation

Marttinen Pekka & Gillberg Jussi & Havulinna Aki & Corander Jukka & Kaski Samuel, 2013. "Genome-wide association studies with high-dimensional phenotypes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 413-431, August.

Handle: RePEc:bpj:sagmbi:v:12:y:2013:i:4:p:413-431:n:1
DOI: 10.1515/sagmb-2012-0032

References listed on IDEAS

Peter Donnelly, 2008. "Progress and challenges in genome-wide association studies in humans," Nature, Nature, vol. 456(7223), pages 728-731, December.
Lê Cao Kim-Anh & Rossouw Debra & Robert-Granié Christèle & Besse Philippe, 2008. "A Sparse PLS for Variable Selection when Integrating Omics Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-32, November.
Nicoló Fusi & Oliver Stegle & Neil D Lawrence, 2012. "Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies," PLOS Computational Biology, Public Library of Science, vol. 8(1), pages 1-9, January.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Daniele, Bertolozzi-Caredio & Barbara, Soriano & Isabel, Bardaji & Alberto, Garrido, 2022. "Analysis of perceived robustness, adaptability and transformability of Spanish extensive livestock farms under alternative challenging scenarios," Agricultural Systems, Elsevier, vol. 202(C).
Yang Wu & Huizhong Fan & Yanhui Wang & Lupei Zhang & Xue Gao & Yan Chen & Junya Li & HongYan Ren & Huijiang Gao, 2014. "Genome-Wide Association Studies Using Haplotypes and Individual SNPs in Simmental Cattle," PLOS ONE, Public Library of Science, vol. 9(10), pages 1-11, October.
Minji Lee & Zhihua Su, 2020. "A Review of Envelope Models," International Statistical Review, International Statistical Institute, vol. 88(3), pages 658-676, December.
Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
Charles-Elie Rabier & Philippe Barre & Torben Asp & Gilles Charmet & Brigitte Mangin, 2016. "On the Accuracy of Genomic Selection," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-23, June.
Cemal Erdem & Sean M. Gross & Laura M. Heiser & Marc R. Birtwistle, 2023. "MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
Joanna L Davies & Jean-Baptiste Cazier & Malcolm G Dunlop & Richard S Houlston & Ian P Tomlinson & Chris C Holmes, 2012. "A Novel Test for Gene-Ancestry Interactions in Genome-Wide Association Data," PLOS ONE, Public Library of Science, vol. 7(12), pages 1-9, December.
Feuerriegel, Stefan & Gordon, Julius, 2019. "News-based forecasts of macroeconomic indicators: A semantic path model for interpretable predictions," European Journal of Operational Research, Elsevier, vol. 272(1), pages 162-175.
Hernandez Roig, Harold Antonio & Aguilera Morillo, María del Carmen & Aguilera, Ana M. & Preda, Cristian, 2023. "Penalized function-on-function partial least-squares regression," DES - Working Papers. Statistics and Econometrics. WS 37758, Universidad Carlos III de Madrid. Departamento de Estadística.
Bergersen Linn Cecilie & Glad Ingrid K. & Lyng Heidi, 2011. "Weighted Lasso with Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-29, August.
Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
Michael Gutkin & Ron Shamir & Gideon Dror, 2009. "SlimPLS: A Method for Feature Selection in Gene Expression-Based Disease Classification," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-12, July.
Leonardo Bottolo & Marco Banterle & Sylvia Richardson & Mika Ala‐Korpela & Marjo‐Riitta Järvelin & Alex Lewin, 2021. "A computationally efficient Bayesian seemingly unrelated regressions model for high‐dimensional quantitative trait loci discovery," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 886-908, August.
Zhang Fan & Miecznikowski Jeffrey C. & Tritchler David L., 2020. "Identification of supervised and sparse functional genomic pathways," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-27, February.
Paul Bergeron & Morgane Dos Santos & Lisa Sitterle & Georges Tarlet & Jeremy Lavigne & Winchygn Liu & Marine Gerbé de Thoré & Céline Clémenson & Lydia Meziani & Cathyanne Schott & Giulia Mazzaschi & K, 2024. "Non-homogenous intratumor ionizing radiation doses synergize with PD1 and CXCR2 blockade," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
Xavier Bry & Ndèye Niang & Thomas Verron & Stéphanie Bougeard, 2023. "Clusterwise elastic-net regression based on a combined information criterion," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 75-107, March.
Marc Schoeler & Sandrine Ellero-Simatos & Till Birkner & Jordi Mayneris-Perxachs & Lisa Olsson & Harald Brolin & Ulrike Loeber & Jamie D. Kraft & Arnaud Polizzi & Marian Martí-Navas & Josep Puig & Ant, 2023. "The interplay between dietary fatty acids and gut microbiota influences host metabolism and hepatic steatosis," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
Jain Yashita & Ding Shanshan & Qiu Jing, 2019. "Sliced inverse regression for integrative multi-omics data analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(1), pages 1-13, February.
Chung Dongjun & Keles Sunduz, 2010. "Sparse Partial Least Squares Classification for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-32, March.
David Shotton & Katie Portwin & Graham Klyne & Alistair Miles, 2009. "Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article," PLOS Computational Biology, Public Library of Science, vol. 5(4), pages 1-17, April.

More about this item

Keywords

metabolomics; genome-wide association study; canonical correlation analysis; regularized regression; latent confounding factors;