Printed from https://ideas.repec.org/a/plo/pgen00/1006493.html

Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data

Author

Listed:
  • Jianxin Shi
  • Ju-Hyun Park
  • Jubao Duan
  • Sonja T Berndt
  • Winton Moy
  • Kai Yu
  • Lei Song
  • William Wheeler
  • Xing Hua
  • Debra Silverman
  • Montserrat Garcia-Closas
  • Chao Agnes Hsiung
  • Jonine D Figueroa
  • Victoria K Cortessis
  • Núria Malats
  • Margaret R Karagas
  • Paolo Vineis
  • I-Shou Chang
  • Dongxin Lin
  • Baosen Zhou
  • Adeline Seow
  • Keitaro Matsuo
  • Yun-Chul Hong
  • Neil E Caporaso
  • Brian Wolpin
  • Eric Jacobs
  • Gloria M Petersen
  • Alison P Klein
  • Donghui Li
  • Harvey Risch
  • Alan R Sanders
  • Li Hsu
  • Robert E Schoen
  • Hermann Brenner
  • MGS (Molecular Genetics of Schizophrenia) GWAS Consortium
  • GECCO (The Genetics and Epidemiology of Colorectal Cancer Consortium)
  • The GAME-ON/TRICL (Transdisciplinary Research in Cancer of the Lung) GWAS Consortium
  • PRACTICAL (PRostate cancer AssoCiation group To Investigate Cancer Associated aLterations) Consortium
  • PanScan Consortium
  • The GAME-ON/ELLIPSE Consortium
  • Rachael Stolzenberg-Solomon
  • Pablo Gejman
  • Qing Lan
  • Nathaniel Rothman
  • Laufey T Amundadottir
  • Maria Teresa Landi
  • Douglas F Levinson
  • Stephen J Chanock
  • Nilanjan Chatterjee

Abstract

Recent heritability analyses have indicated that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases based on the polygenic risk score (PRS), a simple modelling technique that can be implemented using summary-level data from the discovery samples. We herein propose modifications to improve the performance of PRS. We introduce threshold-dependent winner’s-curse adjustments for the marginal association coefficients that are used to weight the single-nucleotide polymorphisms (SNPs) in PRS. Further, as a way to incorporate external functional/annotation knowledge that could identify subsets of SNPs highly enriched for associations, we propose variable thresholds for SNP selection. We applied our methods to GWAS summary-level data of 14 complex diseases. Across all diseases, a simple winner’s curse correction uniformly improved model performance, whereas incorporating functional SNPs was beneficial only for selected diseases. Compared to the standard PRS algorithm, the proposed methods in combination led to a notable gain in efficiency (a 25–50% increase in the prediction R²) for 5 of 14 diseases. As an example, for the GWAS of type 2 diabetes, winner’s curse correction improved the prediction R² from 2.29% based on the standard PRS to 3.10% (P = 0.0017), and incorporating functional annotation data further improved R² to 3.53% (P = 2×10⁻⁵). Our simulation studies illustrate why differential treatment of certain categories of functional SNPs, even when they are shown to be highly enriched for GWAS heritability, does not lead to a proportionate improvement in genetic risk prediction because of non-uniform linkage disequilibrium structure.

Author Summary: Large GWAS have identified tens or even hundreds of common SNPs significantly associated with individual complex diseases; however, these SNPs typically explain a small proportion of phenotypic variance.
Recently, heritability analyses based on GWAS data have suggested that common SNPs could explain a substantially larger fraction of phenotypic variance and improve genetic risk prediction. Because of the polygenic nature of these diseases, improving genetic risk prediction typically requires substantially increasing the size of the discovery sample. Thus, it is crucial to develop more efficient algorithms that use existing GWAS summary data. In this article, we extend the polygenic risk score (PRS) method by adjusting the marginal effect sizes of SNPs for winner’s curse and by incorporating external functional annotation data. Theoretical analysis and simulation studies show that the performance improvement depends on the genetic architecture of the trait, the size of the discovery sample, the degree of enrichment of association for SNPs annotated as “high-prior”, and the linkage disequilibrium patterns of these SNPs. We applied our method to the summary data of 14 GWAS. Compared to the standard PRS, our method achieved a 25–50% gain in efficiency (measured by the prediction R²) for 5 of 14 diseases.
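
The abstract names threshold-dependent winner’s-curse adjustments but does not state the estimator. One standard way to make such an adjustment, assumed here and not necessarily the authors’ exact method, is the conditional maximum-likelihood estimate: treat a SNP’s z-statistic as Z ~ N(μ, 1), condition on the SNP having been selected (|Z| > c, where c corresponds to the two-sided p-value threshold), and maximize that truncated likelihood over μ. Because the truncated normal is an exponential family in μ, the score is decreasing, so bisection between 0 and |z| finds the root. The helper names below are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def selection_cutoff(p_threshold: float) -> float:
    """|z| cutoff c for a two-sided threshold: p < p_threshold <=> |z| > c."""
    return _STD_NORMAL.inv_cdf(1.0 - p_threshold / 2.0)


def _score(mu: float, z: float, c: float) -> float:
    """d/dmu of log f(z | mu, |Z| > c) for Z ~ N(mu, 1).

    Note: NormalDist.cdf loses relative accuracy deep in the lower tail, so
    extremely stringent thresholds would need a tail-accurate CDF (assumption:
    moderate thresholds only).
    """
    num = _STD_NORMAL.pdf(mu - c) - _STD_NORMAL.pdf(-mu - c)
    den = _STD_NORMAL.cdf(mu - c) + _STD_NORMAL.cdf(-mu - c)
    return (z - mu) - num / den


def corrected_z(z: float, c: float) -> float:
    """Conditional MLE of the mean of z, given that the SNP passed |z| > c."""
    if abs(z) <= c:
        raise ValueError("z did not pass the selection threshold")
    sign = 1.0 if z > 0 else -1.0
    # The score is positive at mu = 0 and negative at mu = |z|, and decreasing
    # in between, so the root lies in (0, |z|): the estimate shrinks toward 0.
    lo, hi = 0.0, abs(z)
    for _ in range(200):  # fixed iteration count avoids float-spacing stalls
        mid = 0.5 * (lo + hi)
        if _score(mid, abs(z), c) > 0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)


def corrected_beta(beta: float, se: float, p_threshold: float) -> float:
    """Winner's-curse-adjusted effect size for a SNP selected at p_threshold."""
    c = selection_cutoff(p_threshold)
    return corrected_z(beta / se, c) * se
```

With c ≈ 1.96 (p < 0.05), a SNP observed at z = 2.5 is pulled substantially toward zero, while one at z = 10 is left essentially unchanged; changing the selection threshold changes c and therefore the amount of shrinkage, which is one way an adjustment can be threshold-dependent.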

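Variable thresholding, as described in the abstract, means applying a different p-value cutoff to SNPs that external annotation marks as “high-prior”. The sketch below assumes one boolean annotation flag per SNP, a PRS computed as a weighted sum of allele dosages, and thresholds chosen by a grid search that maximizes the prediction R² (squared Pearson correlation) in a separate tuning sample. The abstract does not specify this tuning procedure, and the names `SNP`, `select_snps`, `prs`, `r_squared`, and `tune_thresholds` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence, Tuple


@dataclass
class SNP:
    snp_id: str
    weight: float     # marginal effect (raw or winner's-curse corrected)
    p_value: float    # GWAS p-value from the discovery sample
    high_prior: bool  # hypothetical flag: annotation marks the SNP "high-prior"


def select_snps(snps: Sequence[SNP], p_high: float, p_low: float) -> List[SNP]:
    """Variable thresholding: a separate p-value cutoff per annotation class."""
    return [s for s in snps if s.p_value < (p_high if s.high_prior else p_low)]


def prs(dosages: Dict[str, float], selected: Sequence[SNP]) -> float:
    """Weighted sum of allele dosages over the selected SNPs.

    Missing genotypes are treated as dosage 0, a simplification.
    """
    return sum(s.weight * dosages.get(s.snp_id, 0.0) for s in selected)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between scores and phenotype."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def tune_thresholds(
    snps: Sequence[SNP],
    genotypes: Sequence[Dict[str, float]],
    phenotype: Sequence[float],
    grid: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (p_high, p_low, R^2) maximizing R^2 in a tuning sample."""
    best = (grid[0], grid[0], -1.0)
    for p_high, p_low in product(grid, repeat=2):
        selected = select_snps(snps, p_high, p_low)
        scores = [prs(g, selected) for g in genotypes]
        r2 = r_squared(scores, phenotype)
        if r2 > best[2]:
            best = (p_high, p_low, r2)
    return best
```

Searching over both thresholds independently lets the tuning data decide whether high-prior SNPs deserve a more lenient cutoff; consistent with the abstract, such a search need not favour the annotation for every disease.
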
Suggested Citation

  • Jianxin Shi & Ju-Hyun Park & Jubao Duan & Sonja T Berndt & Winton Moy & Kai Yu & Lei Song & William Wheeler & Xing Hua & Debra Silverman & Montserrat Garcia-Closas & Chao Agnes Hsiung & Jonine D Figue, 2016. "Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data," PLOS Genetics, Public Library of Science, vol. 12(12), pages 1-24, December.
  • Handle: RePEc:plo:pgen00:1006493
    DOI: 10.1371/journal.pgen.1006493

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006493
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1006493&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1006493?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.