
Leveraging functional annotations in genetic risk prediction for human complex diseases

Author

Listed:
  • Yiming Hu
  • Qiongshi Lu
  • Ryan Powles
  • Xinwei Yao
  • Can Yang
  • Fang Fang
  • Xinran Xu
  • Hongyu Zhao

Abstract

Genetic risk prediction is an important goal in human genetics research and precision medicine. Accurate prediction models will have a great impact on both disease prevention and early treatment strategies. Despite the identification of thousands of disease-associated genetic variants through genome-wide association studies (GWAS), genetic risk prediction accuracy remains moderate for most diseases, largely because it is difficult both to identify all the functionally relevant variants and to estimate their effect sizes accurately in the presence of linkage disequilibrium. In this paper, we introduce AnnoPred, a principled framework that leverages diverse types of genomic and epigenomic functional annotations in genetic risk prediction for complex diseases. AnnoPred is trained using GWAS summary statistics in a Bayesian framework in which we explicitly model various functional annotations and allow for linkage disequilibrium estimated from reference genotype data. Compared with state-of-the-art risk prediction methods, AnnoPred achieves consistently improved prediction accuracy in both extensive simulations and real data.

Author summary

Genetic risk prediction plays a significant role in precision medicine. Accurate prediction models could have a great impact on disease prevention and early treatment strategies. For example, mutations in BRCA1 and BRCA2 have been used to evaluate women’s breast cancer risk and as a guideline for early screening. However, genetic risk prediction models also present important challenges, including extremely high dimensionality, limited access to individual-level genotype data, and a lack of efficient computational methods for such data. To make use of rich GWAS summary statistics, we propose a novel method that addresses these challenges by integrating genomic functional annotations, which have been successfully applied in GWAS to generate biological insights. We demonstrate the improvement in accuracy in both extensive simulation studies and real data analysis of breast cancer, Crohn’s disease, celiac disease, rheumatoid arthritis and type-II diabetes.
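
The central idea described above, letting functional annotations influence how much each variant's estimated effect is trusted, can be illustrated with a deliberately simplified sketch. This is not the AnnoPred implementation. It assumes independent SNPs (no linkage disequilibrium), standardized genotypes and phenotype, and a Gaussian prior whose per-SNP variance is chosen by hand to stand in for an annotation-derived prior. The function names and all toy numbers are hypothetical.

```python
from typing import List, Sequence


def shrink_effects(
    beta_hat: Sequence[float],
    prior_var: Sequence[float],
    n_samples: int,
) -> List[float]:
    """Posterior-mean effect sizes under independent Gaussian priors.

    Simplifying assumptions (not taken from the paper): standardized
    genotypes and phenotype and NO linkage disequilibrium, so each
    marginal estimate satisfies beta_hat_j ~ N(beta_j, 1 / n_samples)
    with prior beta_j ~ N(0, prior_var_j).
    """
    noise_var = 1.0 / n_samples
    # Conjugate normal-normal update: the shrinkage factor is
    # prior_var / (prior_var + noise_var), always between 0 and 1.
    return [v / (v + noise_var) * b for b, v in zip(beta_hat, prior_var)]


def risk_score(genotypes: Sequence[float], effects: Sequence[float]) -> float:
    """Polygenic risk score: weighted sum of standardized genotypes."""
    return sum(g * e for g, e in zip(genotypes, effects))


# Hypothetical toy data: three SNPs with identical marginal estimates.
# The first is treated as lying in an annotated region, so it receives
# a larger prior variance and is shrunk less.
beta_hat = [0.02, 0.02, 0.02]
prior_var = [1e-3, 1e-5, 1e-5]  # illustrative values only
n_samples = 10_000

posterior = shrink_effects(beta_hat, prior_var, n_samples)
score = risk_score([1.0, -0.5, 0.3], posterior)
```

In this toy setting the annotated SNP keeps most of its estimated effect, while the other two are pulled strongly toward zero. Accounting for linkage disequilibrium, as the abstract describes, would require modeling the SNPs jointly rather than one at a time.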

Suggested Citation

  • Yiming Hu & Qiongshi Lu & Ryan Powles & Xinwei Yao & Can Yang & Fang Fang & Xinran Xu & Hongyu Zhao, 2017. "Leveraging functional annotations in genetic risk prediction for human complex diseases," PLOS Computational Biology, Public Library of Science, vol. 13(6), pages 1-16, June.
  • Handle: RePEc:plo:pcbi00:1005589
    DOI: 10.1371/journal.pcbi.1005589

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005589
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005589&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Xiang Zhou & Peter Carbonetto & Matthew Stephens, 2013. "Polygenic Modeling with Bayesian Sparse Linear Mixed Models," PLOS Genetics, Public Library of Science, vol. 9(2), pages 1-14, February.
    2. Jessica Minnier & Ming Yuan & Jun S. Liu & Tianxi Cai, 2015. "Risk Classification With an Adaptive Naive Bayes Kernel Machine Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 393-404, March.

    Citations

    Cited by:

    1. Rita Dias Pereira & Pietro Biroli & Titus Galama & Stephanie von Hinke & Hans van Kippersluis & Cornelius A. Rietveld & Kevin Thom, 2022. "Gene-Environment Interplay in the Social Sciences," Papers 2203.02198, arXiv.org, revised Aug 2022.
    2. Jiacheng Miao & Hanmin Guo & Gefei Song & Zijie Zhao & Lin Hou & Qiongshi Lu, 2023. "Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Md. Moksedul Momin & Jisu Shin & Soohyun Lee & Buu Truong & Beben Benyamin & S. Hong Lee, 2023. "A method for an unbiased estimate of cross-ancestry genetic correlation using individual-level data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    4. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    5. Shuang Song & Wei Jiang & Lin Hou & Hongyu Zhao, 2020. "Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies," PLOS Computational Biology, Public Library of Science, vol. 16(2), pages 1-18, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dominic Holland & Oleksandr Frei & Rahul Desikan & Chun-Chieh Fan & Alexey A Shadrin & Olav B Smeland & V S Sundar & Paul Thompson & Ole A Andreassen & Anders M Dale, 2020. "Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-30, May.
    2. Rafael Blanquero & Emilio Carrizosa & Pepa Ramírez-Cobo & M. Remedios Sillero-Denamiel, 2022. "Constrained Naïve Bayes with application to unbalanced data classification," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 30(4), pages 1403-1425, December.
    3. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    4. Yanyi Song & Xiang Zhou & Min Zhang & Wei Zhao & Yongmei Liu & Sharon L. R. Kardia & Ana V. Diez Roux & Belinda L. Needham & Jennifer A. Smith & Bhramar Mukherjee, 2020. "Bayesian shrinkage estimation of high dimensional causal mediation effects in omics studies," Biometrics, The International Biometric Society, vol. 76(3), pages 700-710, September.
    5. Sujata Dash & Ajith Abraham & Ashish Kr Luhach & Jolanta Mizera-Pietraszko & Joel JPC Rodrigues, 2020. "Hybrid chaotic firefly decision making model for Parkinson’s disease diagnosis," International Journal of Distributed Sensor Networks, vol. 16(1), pages 15501477198, January.
    6. Hui Li & Rahul Mazumder & Xihong Lin, 2023. "Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    7. Islam Shofiqul & Anand Sonia & Hamid Jemila & Thabane Lehana & Beyene Joseph, 2017. "Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(3), pages 199-216, August.
    8. McMahan Christopher & Baurley James & Bridges William & Joyner Chase & Kacamarga Muhamad Fitra & Lund Robert & Pardamean Carissa & Pardamean Bens, 2017. "A Bayesian hierarchical model for identifying significant polygenic effects while controlling for confounding and repeated measures," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 407-419, December.
    9. Gao Wang & Abhishek Sarkar & Peter Carbonetto & Matthew Stephens, 2020. "A simple new approach to variable selection in regression, with application to genetic fine mapping," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1273-1300, December.
    10. Heather E Wheeler & Kaanan P Shah & Jonathon Brenner & Tzintzuni Garcia & Keston Aquino-Michaels & GTEx Consortium & Nancy J Cox & Dan L Nicolae & Hae Kyung Im, 2016. "Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues," PLOS Genetics, Public Library of Science, vol. 12(11), pages 1-23, November.
    11. Lulu Shang & Wei Zhao & Yi Zhe Wang & Zheng Li & Jerome J. Choi & Minjung Kho & Thomas H. Mosley & Sharon L. R. Kardia & Jennifer A. Smith & Xiang Zhou, 2023. "meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    12. Abrahamsen, Tavis & Hobert, James P., 2019. "Fast Monte Carlo Markov chains for Bayesian shrinkage models with random effects," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 61-80.
    13. Cristina C. Bastias & Aurélien Estarague & Denis Vile & Elza Gaignon & Cheng-Ruei Lee & Moises Exposito-Alonso & Cyrille Violle & François Vasseur, 2024. "Ecological trade-offs drive phenotypic and genetic differentiation of Arabidopsis thaliana in Europe," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    14. Niloy Biswas & Anirban Bhattacharya & Pierre E. Jacob & James E. Johndrow, 2022. "Coupling‐based convergence assessment of some Gibbs samplers for high‐dimensional Bayesian regression with shrinkage priors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 973-996, July.
    15. Saikat Banerjee & Lingyao Zeng & Heribert Schunkert & Johannes Söding, 2018. "Bayesian multiple logistic regression for case-control GWAS," PLOS Genetics, Public Library of Science, vol. 14(12), pages 1-27, December.
    16. Brieuc Lehmann & Maxine Mackintosh & Gil McVean & Chris Holmes, 2023. "Optimal strategies for learning multi-ancestry polygenic scores vary across traits," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    17. Yiming Hu & Qiongshi Lu & Wei Liu & Yuhua Zhang & Mo Li & Hongyu Zhao, 2017. "Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction," PLOS Genetics, Public Library of Science, vol. 13(6), pages 1-22, June.
