
Evaluation of polygenic prediction methodology within a reference-standardized framework

Authors
  • Oliver Pain
  • Kylie P Glanville
  • Saskia P Hagenaars
  • Saskia Selzam
  • Anna E Fürtjes
  • Héléna A Gaspar
  • Jonathan R I Coleman
  • Kaili Rimfeld
  • Gerome Breen
  • Robert Plomin
  • Lasse Folkersen
  • Cathryn M Lewis

Abstract

The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value threshold and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16–18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.

Author summary

An individual’s genetic predisposition to a given outcome can be summarized using polygenic scores. Polygenic scores are widely used in research and could also be used in a clinical setting to enhance personalized medicine. A range of methods have been developed for calculating polygenic scores, but it is unclear which methods are the best. Several methods provide multiple polygenic scores for each individual, which must then be tested in an independent tuning sample to identify which polygenic score is most accurate. Other methods provide a single polygenic score and therefore do not require a tuning sample. Our study compares the prediction accuracy of eight leading polygenic scoring methods in a range of contexts. For methods that calculate multiple polygenic scores, we find that LDpred2, lassosum, and PRScs perform best on average. For methods that provide a single polygenic score, not requiring a tuning sample, we find PRScs performs best, and the faster DBSLMM and SBayesR methods also perform well. Our study has provided a comprehensive comparison of polygenic scoring methods that will guide future implementation of polygenic scores in both research and clinical settings.
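
The tuning step described above (choosing among several candidate polygenic scores by 10-fold cross-validation, then reporting a relative improvement in correlation) can be illustrated with a minimal sketch. This is not the study's pipeline: the function names (pearson, select_by_cv, relative_improvement), the parameter labels and the data are hypothetical, the data are synthetic, and selection is simplified to picking the candidate score with the highest correlation in the training folds.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def select_by_cv(scores, outcome, k=10, seed=0):
    """Choose the best candidate score in the training folds, then
    evaluate that choice in the held-out fold.

    scores: dict mapping a parameter label (hypothetical) to a list of
            per-individual polygenic scores.
    Returns (mean held-out correlation, most frequently selected label).
    """
    idx = list(range(len(outcome)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    held_out_r, chosen = [], []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        y_train = [outcome[i] for i in train]
        # Selection uses the training individuals only.
        best = max(
            scores,
            key=lambda lab: pearson([scores[lab][i] for i in train], y_train),
        )
        chosen.append(best)
        # Evaluation uses the held-out individuals only.
        held_out_r.append(
            pearson([scores[best][i] for i in test], [outcome[i] for i in test])
        )
    return sum(held_out_r) / k, Counter(chosen).most_common(1)[0][0]


def relative_improvement(r_new, r_baseline):
    """Relative gain in correlation; 0.17 means 17% better than baseline."""
    return (r_new - r_baseline) / r_baseline


# Synthetic toy data (an assumption for illustration, not study data):
# three candidate scores that track a shared signal with different noise.
rng = random.Random(1)
n = 500
signal = [rng.gauss(0, 1) for _ in range(n)]
outcome = [s + rng.gauss(0, 2) for s in signal]
scores = {
    "pT=0.001": [s + rng.gauss(0, 3) for s in signal],
    "pT=0.05": [s + rng.gauss(0, 1) for s in signal],
    "pT=1": [s + rng.gauss(0, 2) for s in signal],
}
mean_r, best_label = select_by_cv(scores, outcome)
print(best_label, round(mean_r, 3))
```

Because the parameter is chosen without reference to the held-out fold, the averaged held-out correlation is not inflated by the selection step. A pseudovalidation or infinitesimal approach would skip this loop entirely and use a single score.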

Suggested Citation

  • Oliver Pain & Kylie P Glanville & Saskia P Hagenaars & Saskia Selzam & Anna E Fürtjes & Héléna A Gaspar & Jonathan R I Coleman & Kaili Rimfeld & Gerome Breen & Robert Plomin & Lasse Folkersen & Cathry, 2021. "Evaluation of polygenic prediction methodology within a reference-standardized framework," PLOS Genetics, Public Library of Science, vol. 17(5), pages 1-22, May.
  • Handle: RePEc:plo:pgen00:1009021
    DOI: 10.1371/journal.pgen.1009021

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009021
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1009021&type=printable
    Download Restriction: no



    Citations



    Cited by:

    1. Kenichi Yamamoto & Kyuto Sonehara & Shinichi Namba & Takahiro Konuma & Hironori Masuko & Satoru Miyawaki & Yoichiro Kamatani & Nobuyuki Hizawa & Keiichi Ozono & Loic Yengo & Yukinori Okada, 2023. "Genetic footprints of assortative mating in the Japanese population," Nature Human Behaviour, Nature, vol. 7(1), pages 65-73, January.
    2. Rodrigo R. R. Duarte & Oliver Pain & Matthew L. Bendall & Miguel Mulder Rougvie & Jez L. Marston & Sashika Selvackadunco & Claire Troakes & Szi Kay Leung & Rosemary A. Bamford & Jonathan Mill & Paul F, 2024. "Integrating human endogenous retroviruses into transcriptome-wide association studies highlights novel risk factors for major psychiatric conditions," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    3. Bradley Jermy & Kristi Läll & Brooke N. Wolford & Ying Wang & Kristina Zguro & Yipeng Cheng & Masahiro Kanai & Stavroula Kanoni & Zhiyu Yang & Tuomo Hartonen & Remo Monti & Julian Wanner & Omar Yousse, 2024. "A unified framework for estimating country-specific cumulative incidence for 18 diseases stratified by polygenic risk," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    4. Remo Monti & Pia Rautenstrauch & Mahsa Ghanbari & Alva Rani James & Matthias Kirchler & Uwe Ohler & Stefan Konigorski & Christoph Lippert, 2022. "Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes," Nature Communications, Nature, vol. 13(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mitchell, Brittany L. & Hansell, Narelle K. & McAloney, Kerrie & Martin, Nicholas G. & Wright, Margaret J. & Renteria, Miguel E. & Grasby, Katrina L., 2022. "Polygenic influences associated with adolescent cognitive skills," Intelligence, Elsevier, vol. 94(C).
    2. George B. Busby & Scott Kulm & Alessandro Bolli & Jen Kintzle & Paolo Di Domenico & Giordano Bottà, 2023. "Ancestry-specific polygenic risk scores are risk enhancers for clinical cardiovascular disease assessments," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    3. Xu, Yilan & Briley, Daniel A. & Brown, Jeffrey R. & Roberts, Brent W., 2017. "Genetic and environmental influences on household financial distress," Journal of Economic Behavior & Organization, Elsevier, vol. 142(C), pages 404-424.
    4. Joey Ward & Nicholas Graham & Rona J Strawbridge & Amy Ferguson & Gregory Jenkins & Wenan Chen & Karen Hodgson & Mark Frye & Richard Weinshilboum & Rudolf Uher & Cathryn M Lewis & Joanna Biernacka & D, 2018. "Polygenic risk scores for major depressive disorder and neuroticism as predictors of antidepressant response: Meta-analysis of three treatment cohorts," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-8, September.
    5. Bingxin Zhao & Fei Zou, 2022. "On polygenic risk scores for complex traits prediction," Biometrics, The International Biometric Society, vol. 78(2), pages 499-511, June.
    6. Chabris, C. F. & Lee, J. J. & Cesarini, D. & Benjamin, D. J. & Laibson, David I., 2015. "The Fourth Law of Behavior Genetics," Scholarly Articles 30780203, Harvard University Department of Economics.
    7. John Beshears & James J. Choi & David Laibson & Brigitte C. Madrian & Katherine L. Milkman, 2015. "The Effect of Providing Peer Information on Retirement Savings Decisions," Journal of Finance, American Finance Association, vol. 70(3), pages 1161-1201, June.
    8. Trejo, Sam, 2020. "Exploring Genetic Influences on Birth Weight," SocArXiv 7j59q, Center for Open Science.
    9. Paul Hufe & Andreas Peichl, 2020. "Beyond Equal Rights: Equality of Opportunity in Political Participation," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 66(3), pages 477-511, September.
    10. Nicos Nicolaou & Phillip H. Phan & Ute Stephan, 2021. "The Biological Perspective in Entrepreneurship Research," Entrepreneurship Theory and Practice, , vol. 45(1), pages 3-17, January.
    11. Sihai Dave Zhao, 2017. "Integrative genetic risk prediction using non-parametric empirical Bayes classification," Biometrics, The International Biometric Society, vol. 73(2), pages 582-592, June.
    12. Lauren Gaydosh & Daniel W. Belsky & Benjamin W. Domingue & Jason D. Boardman & Kathleen Mullan Harris, 2018. "Father Absence and Accelerated Reproductive Development in Non-Hispanic White Women in the United States," Demography, Springer;Population Association of America (PAA), vol. 55(4), pages 1245-1267, August.
    13. Andrea G Allegrini & Ville Karhunen & Jonathan R I Coleman & Saskia Selzam & Kaili Rimfeld & Sophie von Stumm & Jean-Baptiste Pingault & Robert Plomin, 2020. "Multivariable G-E interplay in the prediction of educational achievement," PLOS Genetics, Public Library of Science, vol. 16(11), pages 1-20, November.
    14. Liang, Liang & Ma, Yanyuan & Carroll, Raymond J., 2019. "A semiparametric efficient estimator in case-control studies for gene–environment independent models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 38-50.
    15. Tianying Wang & Alex Asher, 2021. "Improved Semiparametric Analysis of Polygenic Gene–Environment Interactions in Case–Control Studies," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(3), pages 386-401, December.
    16. Yi Zeng & Huashuai Chen & Xiaomin Liu & Rui Ye & Enjun Xie & Zhihua Chen & Jiehua Lu & Jianxin Li & Yaohua Tian & Ting Ni & Lars Bolund & Kenneth C. Land & Anatoliy Yashin & Angela M. O'Rand & Liang S, 2017. "Sex differences in genetic associations with longevity in Han Chinese: sex-stratified genome-wide association study and polygenic risk score analysis," MPIDR Working Papers WP-2017-004, Max Planck Institute for Demographic Research, Rostock, Germany.
    17. Claudia Wigmann & Anke Hüls & Jean Krutmann & Tamara Schikowski, 2022. "Estimating the Relative Contribution of Environmental and Genetic Risk Factors to Different Aging Traits by Combining Correlated Variables into Weighted Risk Scores," IJERPH, MDPI, vol. 19(24), pages 1-13, December.
    18. Fasil Tekola-Ayele & Cuilin Zhang & Jing Wu & Katherine L Grantz & Mohammad L Rahman & Deepika Shrestha & Marion Ouidir & Tsegaselassie Workalemahu & Michael Y Tsai, 2020. "Trans-ethnic meta-analysis of genome-wide association studies identifies maternal ITPR1 as a novel locus influencing fetal growth during sensitive periods in pregnancy," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-20, May.
    19. Bingxin Zhao & Fei Zou & Hongtu Zhu, 2023. "Cross‐trait prediction accuracy of summary statistics in genome‐wide association studies," Biometrics, The International Biometric Society, vol. 79(2), pages 841-853, June.
    20. Wei Jiang & Ling Chen & Matthew J. Girgenti & Hongyu Zhao, 2024. "Tuning parameters for polygenic risk score methods using GWAS summary statistics from training data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
