
Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies

Authors

  • Clive J Hoggart
  • John C Whittaker
  • Maria De Iorio
  • David J Balding

Abstract

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes and covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500K SNPs, a real genome-wide data set of 300K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.

Author Summary: Tests of association with disease status are normally conducted one SNP at a time, ignoring the effects of all other genotyped SNPs. We developed a computationally efficient method to simultaneously analyse all SNPs, either in a genome-wide association (GWA) study, or a fine-mapping study based on re-sequencing and/or imputation.
The method selects a subset of SNPs that best predicts disease status, while controlling the type-I error of the selected SNPs. This brings many advantages over standard single-SNP approaches, because the signal from a particular SNP can be more clearly assessed when other SNPs associated with disease status are already included in the model. Thus, in comparison with single-SNP analyses, power is increased and the false positive rate is reduced because of reduced residual variation. Localisation is also greatly improved. We demonstrate these advantages over the widely used single-SNP Armitage Trend Test using GWA simulation studies, a real GWA dataset, and a sequence-based fine-mapping simulation study.
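The baseline against which the method is compared is the single-SNP Armitage trend test, and the abstract notes that each SNP can contribute additive, dominant or recessive effects. The following is a minimal sketch of that single-SNP baseline only, not of the authors' penalised multi-SNP method. The names `CODINGS` and `armitage_trend_test` are hypothetical, and the three genotype score vectors are the usual textbook codings, assumed here rather than taken from the paper. The test works from a 2-by-3 table of case and control counts for 0, 1 or 2 copies of an allele.

```python
import math
from typing import Sequence, Tuple

# Genotype scores for 0, 1 or 2 copies of the putative risk allele.
# These are common textbook codings, assumed here for illustration.
CODINGS = {
    "additive": (0, 1, 2),   # risk scales with allele count
    "dominant": (0, 1, 1),   # one copy is enough
    "recessive": (0, 0, 1),  # two copies are needed
}


def armitage_trend_test(
    cases: Sequence[int],
    controls: Sequence[int],
    weights: Sequence[float] = CODINGS["additive"],
) -> Tuple[float, float]:
    """Cochran-Armitage trend test for a single SNP.

    cases[i] and controls[i] count the cases and controls carrying
    i copies of the allele (i = 0, 1, 2). Returns the 1-df chi-square
    statistic and its asymptotic p-value.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    # Score statistic: weighted contrast of case versus control counts.
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    # Null variance: (R*S/N) * [N * sum(t^2 n) - (sum(t n))^2].
    sum_tn = sum(t * k for t, k in zip(weights, n))
    sum_t2n = sum(t * t * k for t, k in zip(weights, n))
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    if var_T == 0:
        raise ValueError("no variation in genotype scores")
    chi2 = T * T / var_T
    # Upper tail of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP (0, 1, 2 copies).
    cases = (10, 20, 20)
    controls = (20, 20, 10)
    for name, w in CODINGS.items():
        chi2, p = armitage_trend_test(cases, controls, w)
        print(f"{name}: chi2={chi2:.3f}, p={p:.4f}")
```

Because the statistic equals N times the squared correlation between genotype score and case status, changing the weight vector is all that is needed to switch between the additive, dominant and recessive versions of the test. Each SNP is still tested in isolation, which is the limitation the simultaneous analysis described above is designed to overcome.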

Suggested Citation

  • Clive J Hoggart & John C Whittaker & Maria De Iorio & David J Balding, 2008. "Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies," PLOS Genetics, Public Library of Science, vol. 4(7), pages 1-8, July.
  • Handle: RePEc:plo:pgen00:1000130
    DOI: 10.1371/journal.pgen.1000130

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000130
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1000130&type=printable
    Download Restriction: no



    Citations

    Cited by:

    1. Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
    2. Hai-Yan Lü & Xiao-Fen Liu & Shi-Ping Wei & Yuan-Ming Zhang, 2011. "Epistatic Association Mapping in Homozygous Crop Cultivars," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-10, March.
    3. Anthony Lee & François Caron & Arnaud Doucet & Chris Holmes, 2012. "Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-31, January.
    4. Gao Wang & Abhishek Sarkar & Peter Carbonetto & Matthew Stephens, 2020. "A simple new approach to variable selection in regression, with application to genetic fine mapping," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1273-1300, December.
    5. Ismaïl Ahmed & Anna-Liisa Hartikainen & Marjo-Riitta Järvelin & Sylvia Richardson, 2011. "False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-20, November.
    6. Elena Szefer & Donghuan Lu & Farouk Nathoo & Mirza Faisal Beg & Jinko Graham, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 367-386, December.
    7. Gabriel E Hoffman & Benjamin A Logsdon & Jason G Mezey, 2013. "PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    8. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 1-40, March.
    9. Laura N Anderson & Laurent Briollais & Helen C Atkinson & Julie A Marsh & Jingxiong Xu & Kristin L Connor & Stephen G Matthews & Craig E Pennell & Stephen J Lye, 2014. "Investigation of Genetic Variants, Birthweight and Hypothalamic-Pituitary-Adrenal Axis Function Suggests a Genetic Variant in the SERPINA6 Gene Is Associated with Corticosteroid Binding Globulin in th," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-8, April.
    10. Castro, Bruno M. & Lemes, Renan B. & Cesar, Jonatas & Hünemeier, Tábita & Leonardi, Florencia, 2018. "A model selection approach for multiple sequence segmentation and dimensionality reduction," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 319-330.
    11. Gerhard Moser & Sang Hong Lee & Ben J Hayes & Michael E Goddard & Naomi R Wray & Peter M Visscher, 2015. "Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model," PLOS Genetics, Public Library of Science, vol. 11(4), pages 1-22, April.
    12. Tomi Peltola & Pekka Marttinen & Aki Vehtari, 2012. "Finite Adaptation and Multistep Moves in the Metropolis-Hastings Algorithm for Variable Selection in Genome-Wide Association Analysis," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-11, November.
    13. Matt Silver & Giovanni Montana & Alzheimer's Disease Neuroimaging Initiative, 2012. "Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-43, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ander Wilson & Brian J. Reich, 2014. "Confounder selection via penalized credible regions," Biometrics, The International Biometric Society, vol. 70(4), pages 852-861, December.
    2. Naijun Sha & Marina Vannucci & Mahlet G. Tadesse & Philip J. Brown & Ilaria Dragoni & Nick Davies & Tracy C. Roberts & Andrea Contestabile & Mike Salmon & Chris Buckley & Francesco Falciani, 2004. "Bayesian Variable Selection in Multinomial Probit Models to Identify Molecular Signatures of Disease Stage," Biometrics, The International Biometric Society, vol. 60(3), pages 812-819, September.
    3. Theo S. Eicher & Chris Papageorgiou & Adrian E. Raftery, 2011. "Default priors and predictive performance in Bayesian model averaging, with application to growth determinants," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 26(1), pages 30-55, January/F.
    4. Dimitris Korobilis, 2008. "Forecasting in vector autoregressions with many predictors," Advances in Econometrics, in: Bayesian Econometrics, pages 403-431, Emerald Group Publishing Limited.
    5. Annalisa Cadonna & Sylvia Frühwirth-Schnatter & Peter Knaus, 2020. "Triple the Gamma—A Unifying Shrinkage Prior for Variance and Variable Selection in Sparse State Space and TVP Models," Econometrics, MDPI, vol. 8(2), pages 1-36, May.
    6. Andrés Ramírez-Hassan, 2020. "Dynamic variable selection in dynamic logistic regression: an application to Internet subscription," Empirical Economics, Springer, vol. 59(2), pages 909-932, August.
    7. Haowen Bao & Zongwu Cai & Yuying Sun & Shouyang Wang, 2023. "Penalized Model Averaging for High Dimensional Quantile Regressions," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202302, University of Kansas, Department of Economics, revised Jan 2023.
    8. Naijun Sha & Benard Owusu Dechi, 2019. "A Bayes Inference for Ordinal Response with Latent Variable Approach," Stats, MDPI, vol. 2(2), pages 1-11, June.
    9. Yang, Yandong & Hong, Weijun & Li, Shufang, 2019. "Deep ensemble learning based probabilistic load forecasting in smart grids," Energy, Elsevier, vol. 189(C).
    10. Simila, Timo & Tikka, Jarkko, 2007. "Input selection and shrinkage in multiresponse linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 406-422, September.
    11. Davide Fiaschi & Angela Parenti, 2013. "An Estimate of the Degree of Interconnectedness between European Regions: A Bayesian Model Averaging Approach," Discussion Papers 2013/171, Dipartimento di Economia e Management (DEM), University of Pisa, Pisa, Italy.
    12. Anastasia Dimiski, 2020. "Factors that affect Students’ performance in Science: An application using Gini-BMA methodology in PISA 2015 dataset," Working Papers 2004, University of Guelph, Department of Economics and Finance.
    13. Nott, David J. & Leng, Chenlei, 2010. "Bayesian projection approaches to variable selection in generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3227-3241, December.
    14. Ram C. Kafle & Netra Khanal & Chris P. Tsokos, 2014. "Bayesian age-stratified joinpoint regression model: an application to lung and brain cancer mortality," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(12), pages 2727-2742, December.
    15. D. Fouskakis & I. Ntzoufras & D. Draper, 2009. "Population‐based reversible jump Markov chain Monte Carlo methods for Bayesian variable selection and evaluation under cost limit restrictions," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 58(3), pages 383-403, July.
    16. ter Braak, Cajo J.F., 2006. "Bayesian sigmoid shrinkage with improper variance priors and an application to wavelet denoising," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 1232-1242, November.
    17. Ouysse, Rachida & Kohn, Robert, 2010. "Bayesian variable selection and model averaging in the arbitrage pricing theory model," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3249-3268, December.
    18. Hongxiao Zhu & Marina Vannucci & Dennis D. Cox, 2010. "A Bayesian Hierarchical Model for Classification with Selection of Functional Predictors," Biometrics, The International Biometric Society, vol. 66(2), pages 463-473, June.
    19. Leonardo Bottolo & Marco Banterle & Sylvia Richardson & Mika Ala‐Korpela & Marjo‐Riitta Järvelin & Alex Lewin, 2021. "A computationally efficient Bayesian seemingly unrelated regressions model for high‐dimensional quantitative trait loci discovery," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 886-908, August.
    20. Kathrin Plankensteiner & Olivia Bluder & Jürgen Pilz, 2015. "Bayesian Network Model with Application to Smart Power Semiconductor Lifetime Data," Risk Analysis, John Wiley & Sons, vol. 35(9), pages 1623-1639, September.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.