Printed from https://ideas.repec.org/a/nat/nature/v619y2023i7969d10.1038_s41586-023-06160-y.html

Health system-scale language models are all-purpose prediction engines

Author

Listed:
  • Lavender Yao Jiang

    (NYU Langone Health
    New York University)

  • Xujin Chris Liu

    (NYU Langone Health
    Tandon School of Engineering)

  • Nima Pour Nejatian

    (NVIDIA)

  • Mustafa Nasir-Moin

    (NYU Langone Health)

  • Duo Wang

    (NYU Langone Health)

  • Anas Abidin

    (NVIDIA)

  • Kevin Eaton

    (NYU Langone Health)

  • Howard Antony Riina

    (NYU Langone Health)

  • Ilya Laufer

    (NYU Langone Health)

  • Paawan Punjabi

    (NYU Langone Health)

  • Madeline Miceli

    (NYU Langone Health)

  • Nora C. Kim

    (NYU Langone Health)

  • Cordelia Orillac

    (NYU Langone Health)

  • Zane Schnurman

    (NYU Langone Health)

  • Christopher Livia

    (NYU Langone Health)

  • Hannah Weiss

    (NYU Langone Health)

  • David Kurland

    (NYU Langone Health)

  • Sean Neifert

    (NYU Langone Health)

  • Yosef Dastagirzada

    (NYU Langone Health)

  • Douglas Kondziolka

    (NYU Langone Health)

  • Alexander T. M. Cheung

    (NYU Langone Health)

  • Grace Yang

    (NYU Langone Health
    New York University)

  • Ming Cao

    (NYU Langone Health
    New York University)

  • Mona Flores

    (NVIDIA)

  • Anthony B. Costa

    (NVIDIA)

  • Yindalon Aphinyanaphongs

    (NYU Langone Health)

  • Kyunghyun Cho

    (New York University
    Prescient Design, Genentech
    Canadian Institute for Advanced Research)

  • Eric Karl Oermann

    (NYU Langone Health
    New York University)

Abstract

Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment [1–3]. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing [4,5] to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7–94.9%, with an improvement of 5.36–14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.
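
The abstract reports performance as an area under the curve (AUC). As a rough illustration of what that metric measures, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as half. The function name `roc_auc` and the labels and scores are hypothetical and invented for illustration; they are not data or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = event occurred (e.g. readmitted), 0 = it did not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.1%}")
```

The pairwise loop is quadratic and only meant to make the definition visible; for real evaluation sets a rank-based implementation from an established library would be the usual choice.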

Suggested Citation

  • Lavender Yao Jiang & Xujin Chris Liu & Nima Pour Nejatian & Mustafa Nasir-Moin & Duo Wang & Anas Abidin & Kevin Eaton & Howard Antony Riina & Ilya Laufer & Paawan Punjabi & Madeline Miceli & Nora C. K, 2023. "Health system-scale language models are all-purpose prediction engines," Nature, Nature, vol. 619(7969), pages 357-362, July.
  • Handle: RePEc:nat:nature:v:619:y:2023:i:7969:d:10.1038_s41586-023-06160-y
    DOI: 10.1038/s41586-023-06160-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-023-06160-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations


    Cited by:

    1. Chen Gao & Xiaochong Lan & Nian Li & Yuan Yuan & Jingtao Ding & Zhilun Zhou & Fengli Xu & Yong Li, 2024. "Large language models empowered agent-based modeling and simulation: a survey and perspectives," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-24, December.
