
An Improved Expectation–Maximization Bayesian Algorithm for GWAS

Author

Listed:
  • Ganwen Zhang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China
    These authors contributed equally to this work.)

  • Jianini Zhao

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China
    These authors contributed equally to this work.)

  • Jieru Wang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Guo Lin

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Lin Li

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Fengfei Ban

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Meiting Zhu

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Yangjun Wen

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Jin Zhang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

Abstract

Genome-wide association studies (GWASs) are flexible and comprehensive tools for identifying single nucleotide polymorphisms (SNPs) associated with complex traits or diseases. Whole-genome Bayesian models are an effective way of incorporating important prior information into modeling, and Bayesian methods have been widely used in association analysis. However, Bayesian analysis is often not feasible because of the high-throughput genotype data and large sample sizes involved. In this study, we propose a new Bayesian algorithm under the mixed linear model framework: the expectation and maximization BayesB Improved algorithm (emBBI). The emBBI algorithm first corrects for polygenic and environmental noise and reduces the dimensionality of the marker set; it then estimates marker effects using emBayesB and tests them using the LOD test. We conducted two simulation experiments and analyzed a real dataset on flowering time in Arabidopsis to validate the new algorithm. In the simulation studies, the emBBI algorithm is more flexible and accurate than established methods, and it performs well under complex genetic backgrounds. The analysis of the real Arabidopsis dataset further illustrates the advantages of the emBBI algorithm for GWAS by detecting known genes. Furthermore, 12 candidate genes are identified in the neighborhood of the significant flowering-related quantitative trait nucleotides (QTNs) in Arabidopsis. In addition, we performed enrichment analysis and tissue expression analysis of the candidate genes, which will help us better understand the genetic basis of flowering-related traits in Arabidopsis.
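
The abstract names a LOD test for marker effects but does not spell out the statistic. As a rough illustration only, the sketch below computes a LOD score for a single marker under a plain Gaussian linear regression, using the standard relation LOD = LRT / (2 ln 10). It is not the emBBI procedure: the function name `lod_single_marker`, the 0/1 genotype coding, and the toy data are assumptions made for this example, and the noise correction, dimension reduction, and emBayesB estimation steps are omitted.

```python
import math


def lod_single_marker(genotypes, phenotypes):
    """Return (effect, LOD) for one marker under simple linear regression.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n = len(phenotypes)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matching genotype/phenotype vectors with n >= 3")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic; its effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    # Null model (intercept only): residual sum of squares is the total sum of squares.
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    # Alternative model (intercept + marker effect), fitted by least squares.
    effect = sxy / sxx
    rss_alt = rss_null - effect * sxy
    if rss_alt <= 0:
        return effect, math.inf  # perfect fit: the likelihood ratio is unbounded
    # Likelihood-ratio statistic for Gaussian errors, converted to base-10 log units.
    lrt = n * math.log(rss_null / rss_alt)
    return effect, lrt / (2.0 * math.log(10.0))


# Toy data (assumed): genotype coded 0/1, phenotype in arbitrary units.
toy_genotypes = [0, 0, 0, 1, 1, 1, 0, 1]
toy_phenotypes = [1.0, 1.2, 0.9, 2.1, 1.8, 2.0, 1.1, 1.9]
effect, lod = lod_single_marker(toy_genotypes, toy_phenotypes)
print(f"estimated effect = {effect:.3f}, LOD = {lod:.3f}")
```

In a multi-locus setting the same ratio would be computed with the other selected markers kept in both the null and the alternative model; the single-marker form is used here only to keep the arithmetic visible.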

Suggested Citation

  • Ganwen Zhang & Jianini Zhao & Jieru Wang & Guo Lin & Lin Li & Fengfei Ban & Meiting Zhu & Yangjun Wen & Jin Zhang, 2024. "An Improved Expectation–Maximization Bayesian Algorithm for GWAS," Mathematics, MDPI, vol. 12(13), pages 1-14, June.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1944-:d:1420475

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/1944/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/1944/
    Download Restriction: no


