Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-35295-1.html

A Multifaceted benchmarking of synthetic electronic health record generation models

Authors
  • Chao Yan

    (Vanderbilt University Medical Center)

  • Yao Yan

    (Sage Bionetworks)

  • Zhiyu Wan

    (Vanderbilt University Medical Center)

  • Ziqi Zhang

    (Vanderbilt University)

  • Larsson Omberg

    (Sage Bionetworks)

  • Justin Guinney

    (University of Washington
    Tempus Labs)

  • Sean D. Mooney

    (University of Washington)

  • Bradley A. Malin

    (Vanderbilt University Medical Center
    Vanderbilt University)

Abstract

Synthetic health data have the potential to mitigate privacy concerns in supporting biomedical research and healthcare applications. Modern approaches for data generation continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark methods as they emerge and determine which methods are most appropriate for which use cases. In this work, we introduce a systematic benchmarking framework to appraise key characteristics with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic health data and further indicate that no method is unequivocally the best on all criteria in each use case, which makes it evident why synthetic data generation methods need to be assessed in context.

Suggested Citation

  • Chao Yan & Yao Yan & Zhiyu Wan & Ziqi Zhang & Larsson Omberg & Justin Guinney & Sean D. Mooney & Bradley A. Malin, 2022. "A Multifaceted benchmarking of synthetic electronic health record generation models," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-35295-1
    DOI: 10.1038/s41467-022-35295-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-35295-1
    File Function: Abstract
    Download Restriction: no



    Citations

    Cited by:

    1. Qi Chang & Zhennan Yan & Mu Zhou & Hui Qu & Xiaoxiao He & Han Zhang & Lohendran Baskaran & Subhi Al’Aref & Hongsheng Li & Shaoting Zhang & Dimitris N. Metaxas, 2023. "Mining multi-center heterogeneous medical data with distributed synthetic learning," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    2. Brandon Theodorou & Cao Xiao & Jimeng Sun, 2023. "Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model," Nature Communications, Nature, vol. 14(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yunhwan Kim, 2023. "Exploring Organizational Self-(re)presentations on Visual Social Media: Computational Analysis of Startups’ Instagram Photos Based on Unsupervised Learning," SAGE Open, vol. 13(4), pages 21582440231, December.
    2. Heller, Yuval & Tubul, Itay, 2023. "Strategies in the repeated prisoner’s dilemma: A cluster analysis," MPRA Paper 117444, University Library of Munich, Germany.
    3. Zhang, Yanquan & Chang, Ruidong & Zuo, Jian & Shabunko, Veronika & Zheng, Xian, 2023. "Regional disparity of residential solar panel diffusion in Australia: The roles of socio-economic factors," Renewable Energy, Elsevier, vol. 206(C), pages 808-819.
    4. Yao, S. & Peralta-Braz, P. & Alamdari, M.M. & Ruiz, R.O. & Atroshchenko, E., 2024. "Optimal design of piezoelectric energy harvesters for bridge infrastructure: Effects of location and traffic intensity on energy production," Applied Energy, Elsevier, vol. 355(C).
    5. Khalidou Abdoulaye Barry & Youness Manzali & Mohamed Lamrini & Flouchi Rachid & Mohamed Elfar, 2024. "Heart Disease Prediction Using Weighted K-Nearest Neighbor Algorithm," SN Operations Research Forum, Springer, vol. 5(3), pages 1-16, September.
    6. Angelo Leogrande & Carlo Drago & Massimo Arnone, 2024. "Analyzing Regional Disparities in E-Commerce Adoption Among Italian SMEs: Integrating Machine Learning Clustering and Predictive Models with Econometric Analysis," Working Papers hal-04700413, HAL.
    7. Jujie Wang & Zhenzhen Zhuang, 2023. "A novel cluster based multi-index nonlinear ensemble framework for carbon price forecasting," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 25(7), pages 6225-6247, July.
    8. Xinghua Wang & Xixian Liu & Fucheng Zhong & Zilv Li & Kaiguo Xuan & Zhuoli Zhao, 2023. "A Scenario Generation Method for Typical Operations of Power Systems with PV Integration Considering Weather Factors," Sustainability, MDPI, vol. 15(20), pages 1-20, October.
    9. Xie, Hailun & Eames, Matt & Mylona, Anastasia & Davies, Hywel & Challenor, Peter, 2024. "Creating granular climate zones for future-proof building design in the UK," Applied Energy, Elsevier, vol. 357(C).
    10. Cuomo, Maria Teresa & Tortora, Debora & Colosimo, Ivan & Ricciardi Celsi, Lorenzo & Genovino, Cinzia & Festa, Giuseppe & La Rocca, Michele, 2023. "Segmenting with big data analytics and Python: A quantitative exploratory analysis of household savings," Technological Forecasting and Social Change, Elsevier, vol. 191(C).
    11. Şebnem Koltan Yılmaz & Sibel Şener, 2022. "Analysis of The Countries According to The Prosperity Level with Data Mining," Alphanumeric Journal, Bahadir Fatih Yildirim, vol. 10(2), pages 85-104, December.
    12. Yen, Barbara T.H. & Li, Jun-Sheng, 2022. "Route-based performance evaluation for airlines – A metafrontier data envelopment analysis approach," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 162(C).
    13. Yunhwan Kim, 2022. "#Nomask on Instagram: Exploring Visual Representations of the Antisocial Norm on Social Media," IJERPH, MDPI, vol. 19(11), pages 1-14, June.
    14. Yunhwan Kim & Sunmi Lee, 2022. "#ShoutYourAbortion on Instagram: Exploring the Visual Representation of Hashtag Movement and the Public’s Responses," SAGE Open, vol. 12(2), pages 21582440221, April.
    15. Massimo Arnone & Alberto Costantiello & Angelo Leogrande & Syed Kafait Hussain Naqvi & Cosimo Magazzino, 2024. "Financial Stability and Innovation: The Role of Non-Performing Loans," FinTech, MDPI, vol. 3(4), pages 1-41, October.
    16. Zhang, Ying & Robu, Valentin & Cremers, Sho & Norbu, Sonam & Couraud, Benoit & Andoni, Merlinda & Flynn, David & Poor, H. Vincent, 2024. "Modelling the formation of peer-to-peer trading coalitions and prosumer participation incentives in transactive energy communities," Applied Energy, Elsevier, vol. 355(C).
    17. Ghaemi, Zahra & Tran, Thomas T.D. & Smith, Amanda D., 2022. "Comparing classical and metaheuristic methods to optimize multi-objective operation planning of district energy systems considering uncertainties," Applied Energy, Elsevier, vol. 321(C).
    18. Chen, Hao, 2022. "Cluster-based ensemble learning for wind power modeling from meteorological wind data," Renewable and Sustainable Energy Reviews, Elsevier, vol. 167(C).
    19. Hazem Noori Abdulrazzak & Goh Chin Hock & Nurul Asyikin Mohamed Radzi & Nadia M. L. Tan & Chiew Foong Kwong, 2022. "Modeling and Analysis of New Hybrid Clustering Technique for Vehicular Ad Hoc Network," Mathematics, MDPI, vol. 10(24), pages 1-27, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.