Authors
- Jiajun Qiu
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Yao Hu
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Li Li
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Abdullah Mesut Erzurumluoglu
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Ingrid Braenne
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Charles Whitehurst
(Boehringer-Ingelheim)
- Jochen Schmitz
(Boehringer-Ingelheim)
- Jatin Arora
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Boris Alexander Bartholdy
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Shrey Gandhi
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Pierre Khoueiry
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Stefanie Mueller
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Boris Noyvert
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Zhihao Ding
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Jan Nygaard Jensen
(Boehringer Ingelheim Pharma GmbH & Co. KG)
- Johann Jong
(Boehringer Ingelheim Pharma GmbH & Co. KG)
Abstract
Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn’s disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn’s disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.
Suggested Citation
Jiajun Qiu & Yao Hu & Li Li & Abdullah Mesut Erzurumluoglu & Ingrid Braenne & Charles Whitehurst & Jochen Schmitz & Jatin Arora & Boris Alexander Bartholdy & Shrey Gandhi & Pierre Khoueiry & Stefanie , 2025.
"Deep representation learning for clustering longitudinal survival data from electronic health records,"
Nature Communications, Nature, vol. 16(1), pages 1-14, December.
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56625-z
DOI: 10.1038/s41467-025-56625-z