Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-024-55636-6.html

Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages

Author

Listed:
  • Chen Wang

    (College of Medicine, Penn State University)

  • Havell Markus

    (College of Medicine, Penn State University)

  • Avantika R. Diwadkar

    (College of Medicine, Penn State University)

  • Chachrit Khunsriraksakul

    (College of Medicine, Penn State University)

  • Laura Carrel

    (College of Medicine, Penn State University)

  • Bingshan Li

    (Vanderbilt University)

  • Xue Zhong

    (Division of Genetic Medicine, Vanderbilt University Medical Center)

  • Xingyan Wang

    (College of Medicine, Penn State University)

  • Xiaowei Zhan

    (Southern Methodist University
    Southwestern Medical Center University of Texas)

  • Galen T. Foulke

    (College of Medicine, Penn State University)

  • Nancy J. Olsen

    (College of Medicine, Penn State University)

  • Dajiang J. Liu

    (College of Medicine, Penn State University)

  • Bibo Jiang

    (College of Medicine, Penn State University)

Abstract

Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can be used to identify preclinical individuals at risk for progression. Biobanks typically have small numbers of cases, which are not sufficient to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control studies to predict disease progression risk. Via penalized regression, GPS incorporates PRS weights from case-control studies as a prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies that rely on biobank or case-control samples only and those that combine biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for the progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction $R^2$, and the resulting PRS yields the strongest correlation with progression prevalence.
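
The idea of shrinking model parameters toward prior PRS weights can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a continuous outcome, a squared-error loss, and an L2 penalty on the distance to the prior, none of which the abstract specifies. The function names (`fit_toward_prior`, `r_squared`, `select_lambda`) and the simulated data are hypothetical. The penalty strength is chosen on held-out data, so the fit moves toward the prior only when doing so improves validation accuracy.

```python
import numpy as np


def fit_toward_prior(X, y, beta_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_prior||^2 in closed form.

    Assumption: an L2 penalty toward the prior; the published method may
    use a different loss and penalty.
    """
    p = X.shape[1]
    lhs = X.T @ X + lam * np.eye(p)
    rhs = X.T @ y + lam * beta_prior
    return np.linalg.solve(lhs, rhs)


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for outcomes y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def select_lambda(X_tr, y_tr, X_val, y_val, beta_prior, grid):
    """Pick the penalty strength with the best validation R^2.

    A small lam effectively ignores the prior; a large lam pulls the
    fitted weights close to beta_prior.
    """
    scores = []
    for lam in grid:
        beta = fit_toward_prior(X_tr, y_tr, beta_prior, lam)
        scores.append(r_squared(y_val, X_val @ beta))
    best = int(np.argmax(scores))
    return grid[best], scores[best]


if __name__ == "__main__":
    # Simulated toy data (hypothetical): a small "biobank" sample and a
    # noisy prior standing in for case-control PRS weights.
    rng = np.random.default_rng(0)
    n, p = 80, 20
    beta_true = rng.normal(size=p)
    beta_prior = beta_true + 0.3 * rng.normal(size=p)
    X = rng.normal(size=(2 * n, p))
    y = X @ beta_true + rng.normal(size=2 * n)
    lam, score = select_lambda(
        X[:n], y[:n], X[n:], y[n:], beta_prior, [0.01, 1.0, 10.0, 100.0]
    )
    print(f"selected lambda={lam}, validation R^2={score:.3f}")
```

In this sketch, a very large `lam` reproduces the prior weights and `lam = 0` (with more samples than predictors) reduces to ordinary least squares, so the validation step decides how much to trust the prior.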

Suggested Citation

  • Chen Wang & Havell Markus & Avantika R. Diwadkar & Chachrit Khunsriraksakul & Laura Carrel & Bingshan Li & Xue Zhong & Xingyan Wang & Xiaowei Zhan & Galen T. Foulke & Nancy J. Olsen & Dajiang J. Liu & Bibo Jiang, 2025. "Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-024-55636-6
    DOI: 10.1038/s41467-024-55636-6

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-55636-6
    File Function: Abstract
    Download Restriction: no



