
An Improved Expectation–Maximization Bayesian Algorithm for GWAS

Author

Listed:
  • Ganwen Zhang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China; these authors contributed equally to this work.)

  • Jianini Zhao

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China; these authors contributed equally to this work.)

  • Jieru Wang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Guo Lin

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Lin Li

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Fengfei Ban

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Meiting Zhu

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Yangjun Wen

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Jin Zhang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

Abstract

Genome-wide association studies (GWASs) are flexible and comprehensive tools for identifying single nucleotide polymorphisms (SNPs) associated with complex traits or diseases. Whole-genome Bayesian models are an effective way of incorporating important prior information into modeling, and Bayesian methods have been widely used in association analysis. However, Bayesian analysis is often not feasible because of the high-throughput genotype data and large sample sizes involved. In this study, we propose a new Bayesian algorithm under the mixed linear model framework: the expectation and maximization BayesB Improved algorithm (emBBI). The emBBI algorithm first corrects for polygenic and environmental noise and reduces the dimensionality of the marker set; it then estimates marker effects using emBayesB and tests them using the LOD test. We conducted two simulation experiments and analyzed a real dataset on flowering time in Arabidopsis to validate the new algorithm. In the simulation studies, the emBBI algorithm is more flexible and accurate than established methods, and it performs well under complex genetic backgrounds. The analysis of the real Arabidopsis dataset further illustrates the advantages of the emBBI algorithm for GWAS by detecting known genes. Furthermore, 12 candidate genes are identified in the neighborhood of the significant flowering-related quantitative trait nucleotides (QTNs) in Arabidopsis. We also performed enrichment analysis and tissue expression analysis of the candidate genes, which will help us better understand the genetic basis of flowering-related traits in Arabidopsis.
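
To make the testing step mentioned in the abstract more concrete, the following is a minimal sketch of a single-marker LOD score under an ordinary normal linear model. It is not the emBBI procedure itself: the abstract does not give the authors' formulas, so the function name `single_marker_lod`, the additive 0/1/2 genotype coding, the toy data, and the threshold of 3.0 are all assumptions made for illustration. The sketch relies only on the standard identity that, for a normal linear model, the log10 likelihood ratio equals (n/2) * log10(RSS_null / RSS_alt).

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """LOD score for one marker under a simple normal linear model.

    Illustrative only: compares an intercept-only model (null) with an
    intercept-plus-marker model (alternative) fitted by least squares.
    """
    n = len(phenotypes)
    if n != len(genotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_mean = sum(genotypes) / n
    y_mean = sum(phenotypes) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - x_mean) ** 2 for x in genotypes)
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(genotypes, phenotypes))
    rss_null = sum((y - y_mean) ** 2 for y in phenotypes)

    # A monomorphic marker or a constant phenotype carries no signal.
    if sxx == 0.0 or rss_null == 0.0:
        return 0.0

    # Residual sum of squares after fitting the marker effect.
    rss_alt = rss_null - sxy * sxy / sxx
    if rss_alt <= 0.0:
        return math.inf  # perfect fit

    return (n / 2.0) * math.log10(rss_null / rss_alt)


# Toy data (hypothetical): additive genotype codes and a phenotype.
phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
associated_marker = [0, 0, 1, 1, 2, 2]
unrelated_marker = [0, 1, 2, 2, 1, 0]

LOD_THRESHOLD = 3.0  # assumed cutoff, not taken from the paper

for name, marker in [("associated", associated_marker),
                     ("unrelated", unrelated_marker)]:
    lod = single_marker_lod(marker, phenotype)
    print(name, round(lod, 3), "significant" if lod >= LOD_THRESHOLD else "not significant")
```

In a multi-locus setting such as the one the abstract describes, a test of this kind would be applied to the markers that survive the screening and effect-estimation steps rather than to every SNP independently.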

Suggested Citation

  • Ganwen Zhang & Jianini Zhao & Jieru Wang & Guo Lin & Lin Li & Fengfei Ban & Meiting Zhu & Yangjun Wen & Jin Zhang, 2024. "An Improved Expectation–Maximization Bayesian Algorithm for GWAS," Mathematics, MDPI, vol. 12(13), pages 1-14, June.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1944-:d:1420475

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/1944/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/1944/
    Download Restriction: no
