
Modified versions of Bayesian Information Criterion for genome-wide association studies

Authors
  • Frommlet, Florian
  • Ruhaltinger, Felix
  • Twaróg, Piotr
  • Bogdan, Małgorzata

Abstract

For the vast majority of genome-wide association studies (GWAS), statistical analysis was performed by testing markers individually. Elementary statistical considerations clearly show that, for complex traits, an approach based on multiple regression or generalized linear models is preferable to testing single markers. A model selection approach to GWAS can be based on modifications of the Bayesian Information Criterion (BIC), combined with search strategies to cope with the huge number of potential models. Comprehensive simulations based on real SNP data confirm that model selection has greater power than single-marker tests to detect causal SNPs in complex models. Furthermore, testing single markers leads to substantial problems with the proper ranking of causal SNPs and tends to detect a certain number of false positive SNPs that are not linked to any of the causal mutations. This behavior of single-marker tests is typical in GWAS for complex traits and can be explained by the aggregated influence of many small random sample correlations between the genotypes of the SNP under investigation and those of other causal SNPs. These findings might at least partially explain the low power and nonreplicability of results in GWAS. A real data analysis illustrates the advantages of model selection in practice: publicly available gene expression data, used as traits for individuals from the HapMap project, are reanalyzed.
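The idea of model selection with a modified BIC plus a search strategy can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the commonly cited forms mBIC = n log(RSS) + k log(n) + 2k log(p/c − 1) with c = 4, and mBIC2 = mBIC − 2 log(k!), where n is the number of individuals, p the number of SNPs and k the number of SNPs in the model. The helper names `rss`, `mbic`, `mbic2` and `forward_select` are hypothetical, and the plain greedy forward search is only a stand-in; the search strategy used in the paper may differ.

```python
import math


def mbic(rss, n, p, k, c=4.0):
    # Assumed form of mBIC: n*log(RSS) + k*log(n) + 2k*log(p/c - 1).
    # Requires RSS > 0 and p > c so that both logarithms are defined.
    return n * math.log(rss) + k * math.log(n) + 2 * k * math.log(p / c - 1)


def mbic2(rss, n, p, k, c=4.0):
    # Assumed form of mBIC2: mBIC minus 2*log(k!); lgamma(k + 1) == log(k!).
    return mbic(rss, n, p, k, c) - 2 * math.lgamma(k + 1)


def rss(y, columns):
    # Residual sum of squares of an OLS fit of y on an intercept plus `columns`,
    # computed by Gram-Schmidt orthogonalization (pure Python, no NumPy).
    n = len(y)
    basis = []
    for col in [[1.0] * n] + columns:
        v = [float(a) for a in col]
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip columns that are (numerically) collinear
            basis.append([a / norm for a in v])
    r = [float(a) for a in y]
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def forward_select(y, X, crit=mbic2):
    # Hypothetical greedy search: X is a list of n rows of p genotype codes.
    # At each step, add the SNP that lowers the criterion most; stop when none does.
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    selected = []
    best = crit(rss(y, []), n, p, 0)
    while True:
        candidates = [
            (crit(rss(y, [cols[i] for i in selected + [j]]), n, p, len(selected) + 1), j)
            for j in range(p)
            if j not in selected
        ]
        if not candidates:
            break
        score, j = min(candidates)
        if score >= best:
            break
        best = score
        selected.append(j)
    return selected
```

For example, with genotypes coded 0/1/2 in a list of rows `X` and a quantitative trait `y`, `forward_select(y, X)` returns the column indices of the selected SNPs. Under these assumed forms, subtracting 2 log(k!) lowers the penalty for each additional SNP as the model grows, so mBIC2 is less conservative than mBIC.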

Suggested Citation

  • Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
  • Handle: RePEc:eee:csdana:v:56:y:2012:i:5:p:1038-1051
    DOI: 10.1016/j.csda.2011.05.005

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794731100171X
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Erhardt Vinzenz & Bogdan Malgorzata & Czado Claudia, 2010. "Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-27, June.
    2. Małgorzata Bogdan & Florian Frommlet & Przemysław Biecek & Riyan Cheng & Jayanta K. Ghosh & R.W. Doerge, 2008. "Extending the Modified Bayesian Information Criterion (mBIC) to Dense Markers and Multiple Interval Mapping," Biometrics, The International Biometric Society, vol. 64(4), pages 1162-1169, December.
    3. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    4. Clive J Hoggart & John C Whittaker & Maria De Iorio & David J Balding, 2008. "Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies," PLOS Genetics, Public Library of Science, vol. 4(7), pages 1-8, July.
    5. Baierl, Andreas & Futschik, Andreas & Bogdan, Malgorzata & Biecek, Przemyslaw, 2007. "Locating multiple interacting quantitative trait loci using robust model selection," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6423-6434, August.
    6. Florian Frommlet, 2010. "Tag SNP selection based on clustering according to dominant sets found using replicator dynamics," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(1), pages 65-83, April.
    7. Karl W. Broman & Terence P. Speed, 2002. "A model selection approach for the identification of quantitative trait loci in experimental crosses," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 641-656, October.

    Citations



    Cited by:

    1. Mielniczuk, Jan & Teisseyre, Paweł, 2014. "Using random subspace method for prediction and variable importance assessment in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 725-742.
    2. Frommlet Florian & Ljubic Ivana & Arnardóttir Helga Björk & Bogdan Malgorzata, 2012. "QTL Mapping Using a Memetic Algorithm with Modifications of BIC as Fitness Function," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-26, May.
    3. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 1-40, March.
    4. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for ℓ0 regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zak-Szatkowska, Malgorzata & Bogdan, Malgorzata, 2011. "Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2908-2924, November.
    2. Frommlet Florian & Ljubic Ivana & Arnardóttir Helga Björk & Bogdan Malgorzata, 2012. "QTL Mapping Using a Memetic Algorithm with Modifications of BIC as Fitness Function," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-26, May.
    3. Erhardt Vinzenz & Bogdan Malgorzata & Czado Claudia, 2010. "Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-27, June.
    4. Ryan A. Peterson & Joseph E. Cavanaugh, 2022. "Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(3), pages 427-454, September.
    5. Gabriel E Hoffman & Benjamin A Logsdon & Jason G Mezey, 2013. "PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    6. Yawei He & Zehua Chen, 2016. "The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 68(1), pages 155-180, February.
    7. Chun Wang, 2021. "Using Penalized EM Algorithm to Infer Learning Trajectories in Latent Transition CDM," Psychometrika, Springer;The Psychometric Society, vol. 86(1), pages 167-189, March.
    8. Wang, Tao & Zhu, Lixing, 2011. "Consistent tuning parameter selection in high dimensional sparse linear regression," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1141-1151, August.
    9. Shan Luo & Jinfeng Xu & Zehua Chen, 2015. "Extended Bayesian information criterion in the Cox model with a high-dimensional feature space," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(2), pages 287-311, April.
    10. McLeod, A. Ian & Zhang, Ying, 2008. "Improved Subset Autoregression: With R Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i02).
    11. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    12. Xiaotong Shen & Wei Pan & Yunzhang Zhu & Hui Zhou, 2013. "On constrained and regularized high-dimensional regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 65(5), pages 807-832, October.
    13. Ahmed Ismaïl & Hartikainen Anna-Liisa & Järvelin Marjo-Riitta & Richardson Sylvia, 2011. "False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-20, November.
    14. Szefer Elena & Lu Donghuan & Nathoo Farouk & Beg Mirza Faisal & Graham Jinko, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 367-386, December.
    15. Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
    16. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    17. Lu Tang & Ling Zhou & Peter X. K. Song, 2019. "Fusion learning algorithm to combine partially heterogeneous Cox models," Computational Statistics, Springer, vol. 34(1), pages 395-414, March.
    18. Lian, Heng & Du, Pang & Li, YuanZhang & Liang, Hua, 2014. "Partially linear structure identification in generalized additive models with NP-dimensionality," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 197-208.
    19. Molly C. Klanderman & Kathryn B. Newhart & Tzahi Y. Cath & Amanda S. Hering, 2020. "Fault isolation for a complex decentralized waste water treatment facility," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 931-951, August.
    20. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
