Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-47357-7.html

An ensemble penalized regression method for multi-ancestry polygenic risk prediction

Authors
  • Jingning Zhang

    (Johns Hopkins Bloomberg School of Public Health)

  • Jianan Zhan

    (23andMe Inc.)

  • Jin Jin

    (University of Pennsylvania)

  • Cheng Ma

    (University of Michigan)

  • Ruzhang Zhao

    (Johns Hopkins Bloomberg School of Public Health)

  • Jared O’Connell

    (23andMe Inc.)

  • Yunxuan Jiang

    (23andMe Inc.)

  • Bertram L. Koelsch

    (23andMe Inc.)

  • Haoyu Zhang

    (National Cancer Institute)

  • Nilanjan Chatterjee

    (Johns Hopkins Bloomberg School of Public Health; Johns Hopkins University)

Abstract

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of $\mathscr{L}_1$ (lasso) and $\mathscr{L}_2$ (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction $R^2$ for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
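The three ingredients named in the abstract (an $\mathscr{L}_1$/$\mathscr{L}_2$ penalty combination, a parsimonious penalty grid, and an ensemble over penalty parameters) can be illustrated with a small Python sketch. This is not the PROSPER implementation. The objective (a summary-statistics least-squares loss per population, an $\mathscr{L}_1$ penalty within each population, and an $\mathscr{L}_2$ penalty on between-population differences in SNP effects), the single shared lambda standing in for the "parsimonious specification", the plain least-squares stacking standing in for the ensemble step, all function names (`fit_multi_ancestry`, `simulate`, `with_intercept`) and the simulated data are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Lasso proximal step: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def fit_multi_ancestry(R, r, lams, c, n_iter=500, tol=1e-8):
    """Coordinate descent for an ASSUMED objective (illustrative only):

        sum_k [0.5 b_k' R_k b_k - b_k' r_k + lams[k] * ||b_k||_1]
            + (c / 2) * sum_{k<l} ||b_k - b_l||_2^2

    R: K LD (SNP correlation) matrices, each p x p.
    r: K vectors of marginal SNP-trait correlations (GWAS summary statistics).
    """
    K, p = len(R), len(r[0])
    B = np.zeros((K, p))
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(K):
            for i in range(p):
                # Marginal signal minus the LD-weighted effects of the other SNPs.
                z = r[k][i] - R[k][i] @ B[k] + R[k][i, i] * B[k, i]
                # L2 fusion term: pull b_ki toward SNP i's effect in the other populations.
                z += c * (B[:, i].sum() - B[k, i])
                new = soft_threshold(z, lams[k]) / (R[k][i, i] + c * (K - 1))
                max_change = max(max_change, abs(new - B[k, i]))
                B[k, i] = new
        if max_change < tol:
            break
    return B


def simulate(n, beta, rho, rng):
    """Hypothetical toy genotypes with AR(1) 'LD' of strength rho; y is standardized."""
    idx = np.arange(len(beta))
    L = np.linalg.cholesky(rho ** np.abs(idx[:, None] - idx[None, :]))
    X = rng.standard_normal((n, len(beta))) @ L.T
    X = (X - X.mean(0)) / X.std(0)
    y = X @ beta + rng.standard_normal(n)
    return X, (y - y.mean()) / y.std()


def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)


def with_intercept(P):
    return np.column_stack([np.ones(len(P)), P])


rng = np.random.default_rng(0)
p = 40
beta0 = np.where(rng.random(p) < 0.2, rng.normal(0.0, 0.3, p), 0.0)
beta1 = beta0 * (1.0 + 0.2 * rng.standard_normal(p))  # correlated, not identical effects
X0, y0 = simulate(3000, beta0, 0.3, rng)  # large "source" population
X1, y1 = simulate(300, beta1, 0.5, rng)   # small "target" population
R = [X.T @ X / len(y) for X, y in ((X0, y0), (X1, y1))]
r = [X.T @ y / len(y) for X, y in ((X0, y0), (X1, y1))]

# Hypothetical grid: one lambda shared by both populations and one fusion weight c.
grid = [(lam, c) for lam in (0.02, 0.05) for c in (0.0, 0.5, 2.0)]
fits = [fit_multi_ancestry(R, r, (lam, lam), c) for lam, c in grid]

# Ensemble: stack the target-population PRS from every grid point on a tuning set.
X_tune, y_tune = simulate(300, beta1, 0.5, rng)
X_val, y_val = simulate(1000, beta1, 0.5, rng)
prs_tune = np.column_stack([X_tune @ B[1] for B in fits])
prs_val = np.column_stack([X_val @ B[1] for B in fits])
w, *_ = np.linalg.lstsq(with_intercept(prs_tune), y_tune, rcond=None)
print("ensemble validation R2:", round(r2(y_val, with_intercept(prs_val) @ w), 3))
```

Setting `c = 0` decouples the populations into separate lasso fits, so the grid also contains single-ancestry baselines, while larger `c` lets the small target population borrow strength from the large source population. The stacking weights are estimated on a tuning sample kept separate from both the summary statistics and the final validation sample, so the reported $R^2$ is out-of-sample.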

Suggested Citation

  • Jingning Zhang & Jianan Zhan & Jin Jin & Cheng Ma & Ruzhang Zhao & Jared O’Connell & Yunxuan Jiang & Bertram L. Koelsch & Haoyu Zhang & Nilanjan Chatterjee, 2024. "An ensemble penalized regression method for multi-ancestry polygenic risk prediction," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-47357-7
    DOI: 10.1038/s41467-024-47357-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-47357-7
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Tian Ge & Chia-Yen Chen & Yang Ni & Yen-Chen Anne Feng & Jordan W. Smoller, 2019. "Polygenic prediction via Bayesian regression and continuous shrinkage priors," Nature Communications, Nature, vol. 10(1), pages 1-10, December.
    2. Aniket Mishra & Rainer Malik & Tsuyoshi Hachiya & Tuuli Jürgenson & Shinichi Namba & Daniel C. Posner & Frederick K. Kamanu & Masaru Koido & Quentin Le Grand & Mingyang Shi & Yunye He & Marios K. Geor, 2022. "Stroke genetics informs drug discovery and risk prediction across ancestries," Nature, Nature, vol. 611(7934), pages 115-123, November.
    3. Aniket Mishra & Rainer Malik & Tsuyoshi Hachiya & Tuuli Jürgenson & Shinichi Namba & Daniel C. Posner & Frederick K. Kamanu & Masaru Koido & Quentin Le Grand & Mingyang Shi & Yunye He & Marios K. Geor, 2022. "Publisher Correction: Stroke genetics informs drug discovery and risk prediction across ancestries," Nature, Nature, vol. 612(7938), pages 7-7, December.
    4. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    5. Frank Dudbridge, 2013. "Power and Predictive Accuracy of Polygenic Risk Scores," PLOS Genetics, Public Library of Science, vol. 9(3), pages 1-17, March.
    6. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    7. Genevieve L. Wojcik & Mariaelisa Graff & Katherine K. Nishimura & Ran Tao & Jeffrey Haessler & Christopher R. Gignoux & Heather M. Highland & Yesha M. Patel & Elena P. Sorokin & Christy L. Avery & Gil, 2019. "Genetic analyses of diverse populations improves discovery for complex traits," Nature, Nature, vol. 570(7762), pages 514-518, June.
    8. Robert Tibshirani & Michael Saunders & Saharon Rosset & Ji Zhu & Keith Knight, 2005. "Sparsity and smoothness via the fused lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(1), pages 91-108, February.
    9. Alice B. Popejoy & Stephanie M. Fullerton, 2016. "Genomics is failing on diversity," Nature, Nature, vol. 538(7624), pages 161-164, October.
    10. Sarah E. Graham & Shoa L. Clarke & Kuan-Han H. Wu & Stavroula Kanoni & Greg J. M. Zajac & Shweta Ramdas & Ida Surakka & Ioanna Ntalla & Sailaja Vedantam & Thomas W. Winkler & Adam E. Locke & Eirini Ma, 2021. "The power of genetic diversity in genome-wide association studies of lipids," Nature, Nature, vol. 600(7890), pages 675-679, December.
    11. L. Duncan & H. Shen & B. Gelaye & J. Meijsen & K. Ressler & M. Feldman & R. Peterson & B. Domingue, 2019. "Analysis of polygenic risk score usage and performance in diverse human populations," Nature Communications, Nature, vol. 10(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ananyo Choudhury & Jean-Tristan Brandenburg & Tinashe Chikowore & Dhriti Sengupta & Palwende Romuald Boua & Nigel J. Crowther & Godfred Agongo & Gershim Asiki & F. Xavier Gómez-Olivé & Isaac Kisiangan, 2022. "Meta-analysis of sub-Saharan African studies provides insights into genetic architecture of lipid traits," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    2. Jiacheng Miao & Hanmin Guo & Gefei Song & Zijie Zhao & Lin Hou & Qiongshi Lu, 2023. "Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    4. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    5. Lu Tang & Ling Zhou & Peter X. K. Song, 2019. "Fusion learning algorithm to combine partially heterogeneous Cox models," Computational Statistics, Springer, vol. 34(1), pages 395-414, March.
    6. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Centofanti, Fabio & Fontana, Matteo & Lepore, Antonio & Vantini, Simone, 2022. "Smooth LASSO estimator for the Function-on-Function linear regression model," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    9. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    10. Laura Freijeiro‐González & Manuel Febrero‐Bande & Wenceslao González‐Manteiga, 2022. "A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates," International Statistical Review, International Statistical Institute, vol. 90(1), pages 118-145, April.
    11. Clara Albiñana & Zhihong Zhu & Andrew J. Schork & Andrés Ingason & Hugues Aschard & Isabell Brikell & Cynthia M. Bulik & Liselotte V. Petersen & Esben Agerbo & Jakob Grove & Merete Nordentoft & David , 2023. "Multi-PGS enhances polygenic prediction by combining 937 polygenic scores," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    12. Md Showaib Rahman Sarker & Michael Pokojovy & Sangjin Kim, 2019. "On the Performance of Variable Selection and Classification via Rank-Based Classifier," Mathematics, MDPI, vol. 7(5), pages 1-16, May.
    13. Armin Rauschenberger & Iuliana Ciocănea-Teodorescu & Marianne A. Jonker & Renée X. Menezes & Mark A. Wiel, 2020. "Sparse classification with paired covariates," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(3), pages 571-588, September.
    14. Pereira, Rita & Biroli, Pietro & von Hinke, Stephanie & Van Kippersluis, Hans & Galama, Titus & Rietveld, Niels & Thom, Kevin, 2022. "Gene-Environment Interplay in the Social Sciences," OSF Preprints d96z3, Center for Open Science.
    15. Alesha A. Hatton & Fei-Fei Cheng & Tian Lin & Ren-Juan Shen & Jie Chen & Zhili Zheng & Jia Qu & Fan Lyu & Sarah E. Harris & Simon R. Cox & Zi-Bing Jin & Nicholas G. Martin & Dongsheng Fan & Grant W. M, 2024. "Genetic control of DNA methylation is largely shared across European and East Asian populations," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    16. Benjamin G. Stokell & Rajen D. Shah & Ryan J. Tibshirani, 2021. "Modelling high‐dimensional categorical data using nonconvex fusion penalties," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 579-611, July.
    17. Brieuc Lehmann & Maxine Mackintosh & Gil McVean & Chris Holmes, 2023. "Optimal strategies for learning multi-ancestry polygenic scores vary across traits," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    18. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    19. Naimoli, Antonio, 2022. "Modelling the persistence of Covid-19 positivity rate in Italy," Socio-Economic Planning Sciences, Elsevier, vol. 82(PA).
    20. Jian Guo & Elizaveta Levina & George Michailidis & Ji Zhu, 2010. "Pairwise Variable Selection for High-Dimensional Model-Based Clustering," Biometrics, The International Biometric Society, vol. 66(3), pages 793-804, September.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.