Printed from https://ideas.repec.org/a/bpj/sagmbi/v12y2013i4p413-431n1.html

Genome-wide association studies with high-dimensional phenotypes

Authors
  • Marttinen Pekka

    (Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland)

  • Gillberg Jussi

    (Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland)

  • Havulinna Aki

    (National Institute for Health and Welfare, Department of Chronic Disease Prevention, P.O. Box 30, FI-00271 Helsinki, Finland)

  • Corander Jukka

    (Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, P.O. Box 68, FI-00014 Helsinki, Finland)

  • Kaski Samuel

    (Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland; Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 Helsinki, Finland)

Abstract

High-dimensional phenotypes hold promise for richer findings in association studies, but testing several phenotype traits aggravates the grand challenge of association studies: multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional vector of phenotypes, with the prospect of increased power to detect small effects that would be missed if the traits were tested individually. However, the methods have rarely been compared thoroughly enough to assess their relative merits and to set up guidelines on which method to use and how to use it. We compare the methods on simulated data and on a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires dividing the problem into manageable parts that can be processed in parallel; such parts may correspond to individual genetic variants, pathways, or genes, for example. Here we use a straightforward formulation in which the genome is divided into blocks of nearby correlated genetic markers, and each block is tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods combine information over several correlated variables not only on the phenotype side but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than the alternative methods while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
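
The block-wise formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed block size, the simulated genotypes and phenotypes, the effect size, and the use of Bartlett's chi-squared approximation to Wilks' lambda as the test statistic are all assumptions made here for illustration, and the function names are hypothetical. The sketch also assumes that each genotype block and the phenotype matrix have full column rank and that the number of samples comfortably exceeds the number of variables tested.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    Uses the QR/SVD approach: after centering, the singular values of
    Qx.T @ Qy are the canonical correlations (assumes full column rank).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

def bartlett_statistic(X, Y):
    """Bartlett's approximation for testing that all canonical correlations are zero.

    Returns the statistic and its degrees of freedom (p * q); under the null
    the statistic is approximately chi-squared distributed.
    """
    n, p = X.shape
    q = Y.shape[1]
    rho = canonical_correlations(X, Y)
    wilks_lambda = np.prod(1.0 - rho**2)
    stat = -(n - 1 - (p + q + 1) / 2.0) * np.log(wilks_lambda)
    return stat, p * q

# Simulated data (illustrative assumption, not the paper's data set).
rng = np.random.default_rng(0)
n, n_snps, n_traits, block_size = 500, 20, 10, 5
G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)  # genotypes coded 0/1/2
causal = 7                                                # index of the one causal SNP
Y = 0.3 * G[:, [causal]] + rng.standard_normal((n, n_traits))

# Divide the markers into consecutive blocks and test each block jointly
# against all phenotypes: one test per block instead of one per SNP-trait pair.
stats = []
for start in range(0, n_snps, block_size):
    stat, df = bartlett_statistic(G[:, start:start + block_size], Y)
    stats.append(stat)

best_block = int(np.argmax(stats))
print("statistics per block:", np.round(stats, 1), "df per test:", df)
print("block with the strongest association:", best_block)
```

In this sketch each block yields a single joint test, so 20 SNPs and 10 traits give 4 tests rather than 200 pairwise ones. Real genotype blocks would be defined by linkage disequilibrium rather than a fixed size, and highly correlated markers within a block may need pruning or regularization before the full-rank assumption holds.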

Suggested Citation

  • Marttinen Pekka & Gillberg Jussi & Havulinna Aki & Corander Jukka & Kaski Samuel, 2013. "Genome-wide association studies with high-dimensional phenotypes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 413-431, August.
  • Handle: RePEc:bpj:sagmbi:v:12:y:2013:i:4:p:413-431:n:1
    DOI: 10.1515/sagmb-2012-0032


