Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-024-55636-6.html

Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages

Author

Listed:
  • Chen Wang

    (College of Medicine, Penn State University)

  • Havell Markus

    (College of Medicine, Penn State University)

  • Avantika R. Diwadkar

    (College of Medicine, Penn State University)

  • Chachrit Khunsriraksakul

    (College of Medicine, Penn State University)

  • Laura Carrel

    (College of Medicine, Penn State University)

  • Bingshan Li

    (Vanderbilt University)

  • Xue Zhong

    (Division of Genetic Medicine, Vanderbilt University Medical Center)

  • Xingyan Wang

    (College of Medicine, Penn State University)

  • Xiaowei Zhan

    (Southern Methodist University
    Southwestern Medical Center University of Texas)

  • Galen T. Foulke

    (College of Medicine, Penn State University)

  • Nancy J. Olsen

    (College of Medicine, Penn State University)

  • Dajiang J. Liu

    (College of Medicine, Penn State University)

  • Bibo Jiang

    (College of Medicine, Penn State University)

Abstract

Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can be used to identify preclinical individuals at risk of progression. Biobanks typically contain too few cases to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control study data to predict disease progression risk. Via penalized regression, GPS incorporates PRS weights from case-control studies as a prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies that rely on biobank or case-control samples only or that combine biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction $R^2$, and the resulting PRS yields the strongest correlation with progression prevalence.
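
The idea of penalizing a model toward prior PRS weights can be illustrated with a minimal sketch. The abstract does not specify the form of the penalty or how its strength is tuned, so everything below is an assumption made for illustration and is not the authors' GPS implementation: a continuous outcome, a ridge-type penalty lam * ||b - beta_prior||^2, and a held-out validation split used to pick lam. The names fit_prior_shrunk and select_lambda are hypothetical.

```python
import numpy as np


def fit_prior_shrunk(X, y, beta_prior, lam):
    """Least squares with coefficients shrunk toward beta_prior.

    Illustrative assumption, not the published GPS objective:
    minimize ||y - X b||^2 + lam * ||b - beta_prior||^2, with lam > 0.
    """
    p = X.shape[1]
    # Setting the gradient to zero gives (X'X + lam I) b = X'y + lam * beta_prior.
    A = X.T @ X + lam * np.eye(p)
    rhs = X.T @ y + lam * beta_prior
    return np.linalg.solve(A, rhs)


def select_lambda(X_tr, y_tr, X_val, y_val, beta_prior, lams):
    """Return (lam, validation MSE, coefficients) with the lowest validation MSE.

    A large lam keeps coefficients close to the prior; a small lam lets the
    biobank data dominate. Validation error decides between them.
    """
    best = None
    for lam in lams:
        b = fit_prior_shrunk(X_tr, y_tr, beta_prior, lam)
        mse = float(np.mean((y_val - X_val @ b) ** 2))
        if best is None or mse < best[1]:
            best = (lam, mse, b)
    return best


if __name__ == "__main__":
    # Simulated stand-ins for genotypes, a progression phenotype, and
    # case-control PRS weights; none of these are real data.
    rng = np.random.default_rng(0)
    n, p = 150, 40
    beta_true = rng.normal(scale=0.2, size=p)
    beta_prior = beta_true + rng.normal(scale=0.1, size=p)  # correlated prior
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(size=n)
    lam, mse, beta_hat = select_lambda(
        X[:100], y[:100], X[100:], y[100:], beta_prior, [0.01, 1.0, 100.0, 1e4]
    )
    print(f"chosen lambda={lam}, validation MSE={mse:.3f}")
```

In this sketch, a prior that helps prediction tends to be rewarded with a larger selected lam, which loosely mirrors the abstract's statement that parameters are kept similar to the prior only if the prior improves accuracy.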

Suggested Citation

  • Chen Wang & Havell Markus & Avantika R. Diwadkar & Chachrit Khunsriraksakul & Laura Carrel & Bingshan Li & Xue Zhong & Xingyan Wang & Xiaowei Zhan & Galen T. Foulke & Nancy J. Olsen & Dajiang J. Liu & Bibo Jiang, 2025. "Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-024-55636-6
    DOI: 10.1038/s41467-024-55636-6

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-55636-6
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-024-55636-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
