
An Improved Expectation–Maximization Bayesian Algorithm for GWAS

Author

Listed:
  • Ganwen Zhang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China; these authors contributed equally to this work.)

  • Jianini Zhao

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China; these authors contributed equally to this work.)

  • Jieru Wang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Guo Lin

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Lin Li

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Fengfei Ban

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Meiting Zhu

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Yangjun Wen

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

  • Jin Zhang

    (College of Science, Nanjing Agricultural University, Nanjing 210095, China)

Abstract

Genome-wide association studies (GWASs) are flexible and comprehensive tools for identifying single nucleotide polymorphisms (SNPs) associated with complex traits or diseases. Whole-genome Bayesian models are an effective way of incorporating important prior information into modeling, and Bayesian methods have been widely used in association analysis. However, Bayesian analysis is often not feasible because of the high-throughput genotype data and large sample sizes involved. In this study, we propose a new Bayesian algorithm under the mixed linear model framework: the expectation and maximization BayesB Improved algorithm (emBBI). The emBBI algorithm first corrects for polygenic and environmental noise and reduces dimensionality; it then estimates marker effects using emBayesB and tests them with the LOD test. We conducted two simulation experiments and analyzed a real dataset on flowering time in Arabidopsis to validate the new algorithm. In the simulation studies, the emBBI algorithm is more flexible and accurate than established methods, and it performs well under complex genetic backgrounds. The analysis of the real Arabidopsis dataset further illustrates the advantages of the emBBI algorithm for GWAS by detecting known genes. Furthermore, 12 candidate genes are identified in the neighborhood of the significant flowering-related quantitative trait nucleotides (QTNs) in Arabidopsis. In addition, we performed enrichment analysis and tissue expression analysis of the candidate genes, which will help us better understand the genetic basis of flowering-related traits in Arabidopsis.
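
The abstract names the LOD test as the step that decides whether a marker effect is significant. As a rough illustration only, the sketch below computes a LOD score for a single marker under a plain linear model with Gaussian errors, where LOD = log10(L1/L0) = (n/2) * log10(RSS0/RSS1). This is not the emBBI procedure itself: the function name lod_score, the single-marker model, and the toy data are assumptions made for this example, and the polygenic correction and emBayesB estimation described in the abstract are omitted.

```python
import math


def lod_score(genotypes, phenotypes):
    """LOD score for one marker under y = mu + x*b + e (hypothetical helper).

    Compares the maximized Gaussian likelihood of the model with a marker
    effect against the intercept-only model:
        LOD = (n / 2) * log10(RSS_null / RSS_alt)
    """
    n = len(phenotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)

    # A monomorphic marker or a constant phenotype carries no signal.
    if sxx == 0.0 or rss_null == 0.0:
        return 0.0

    # Least-squares slope and the residual sum of squares it leaves behind.
    slope = sxy / sxx
    rss_alt = rss_null - slope * sxy
    if rss_alt <= 0.0:
        return math.inf  # perfect fit: the likelihood ratio is unbounded

    return (n / 2.0) * math.log10(rss_null / rss_alt)


# Toy data (invented for illustration): four lines, one biallelic marker.
toy_genotypes = [0, 0, 1, 1]
toy_phenotypes = [1.0, 2.0, 3.0, 4.0]
toy_lod = lod_score(toy_genotypes, toy_phenotypes)
```

A marker would be declared significant when its LOD score exceeds a chosen critical value; the value used by the authors is not stated in this record.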

Suggested Citation

  • Ganwen Zhang & Jianini Zhao & Jieru Wang & Guo Lin & Lin Li & Fengfei Ban & Meiting Zhu & Yangjun Wen & Jin Zhang, 2024. "An Improved Expectation–Maximization Bayesian Algorithm for GWAS," Mathematics, MDPI, vol. 12(13), pages 1-14, June.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1944-:d:1420475

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/1944/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/1944/
    Download Restriction: no


