
Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data

Authors
  • Franck Jaotombo

    (EM - EMLyon Business School)

  • Luca Adorni
  • Badih Ghattas
  • Laurent Boyer

Abstract

Objective: This study aims to develop high-performing Machine Learning and Deep Learning models for predicting hospital length of stay (LOS) while enhancing interpretability. We compare the performance and interpretability of models trained only on structured tabular data with those of models trained only on unstructured clinical text data and of models trained on mixed data.

Methods: The structured data were used to train fourteen classical Machine Learning models, including advanced ensemble trees, neural networks, and k-nearest neighbors. The unstructured data were used to fine-tune a pre-trained Bio Clinical BERT Transformer Deep Learning model. The structured and unstructured data were then merged into a tabular dataset after vectorization of the clinical text and dimensionality reduction through Latent Dirichlet Allocation. The study used the free and publicly available Medical Information Mart for Intensive Care (MIMIC) III database with the open-source AutoML library AutoGluon. Performance is evaluated against two types of random classifiers, used as baselines.

Results: The best model trained on structured data demonstrates high performance (ROC AUC = 0.944, PRC AUC = 0.655) with limited interpretability; the most important predictors of prolonged LOS are the levels of blood urea nitrogen and of platelets. The Transformer model displays good but lower performance (ROC AUC = 0.842, PRC AUC = 0.375) with richer interpretability, providing more specific in-hospital factors including procedures, conditions, and medical history. The best model trained on mixed data achieves both high performance (ROC AUC = 0.963, PRC AUC = 0.746) and a much broader scope of interpretability, covering pathologies of the intestine, the colon, and the blood; infectious diseases; respiratory problems; procedures involving sedation and intubation; and vascular surgery.

Conclusions: Our results outperform most state-of-the-art models in LOS prediction in terms of both performance and interpretability. Data fusion between structured and unstructured text data may significantly improve performance and interpretability.
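The abstract reports both ROC AUC and PRC AUC and compares them against random-classifier baselines. The sketch below illustrates why both metrics are useful when the positive class (prolonged LOS) is rare: a classifier that scores at random lands near 0.5 in ROC AUC but only near the positive-class prevalence in PRC AUC. Everything here is an assumption made for illustration: the data are synthetic, the 10% prevalence is arbitrary, a uniform-score classifier stands in for the two baselines (which the abstract does not specify), and average precision is used as one common estimator of PRC AUC. It is not the authors' evaluation code.

```python
import random


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean precision at the rank of each positive: a step-wise estimate of PRC AUC."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


rng = random.Random(0)      # fixed seed so the sketch is reproducible
n, prevalence = 2000, 0.10  # ASSUMPTION: synthetic sample size and positive rate

# Synthetic labels: 1 = prolonged stay, 0 = otherwise (hypothetical encoding).
y = [1 if rng.random() < prevalence else 0 for _ in range(n)]

# Baseline: scores drawn uniformly at random, independent of the label.
random_scores = [rng.random() for _ in y]

# Hypothetical informative model: positives tend to receive higher scores.
informative_scores = [rng.gauss(2.0 if label else 0.0, 1.0) for label in y]

results = {
    "random": (roc_auc(y, random_scores), average_precision(y, random_scores)),
    "informative": (roc_auc(y, informative_scores), average_precision(y, informative_scores)),
}

for name, (auc, ap) in results.items():
    print(f"{name:12s} ROC AUC = {auc:.3f}   PRC AUC (avg. precision) = {ap:.3f}")
```

Read this way, a PRC AUC well above the positive-class prevalence indicates real skill even when its absolute value looks modest next to the ROC AUC.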

Suggested Citation

  • Franck Jaotombo & Luca Adorni & Badih Ghattas & Laurent Boyer, 2023. "Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data," Post-Print hal-04339462, HAL.
  • Handle: RePEc:hal:journl:hal-04339462
    Note: View the original document on HAL open archive server: https://hal.science/hal-04339462

    Download full text from publisher

    File URL: https://hal.science/hal-04339462/document
    Download Restriction: no



      More about this item

      Keywords

      hospital length of stay; explainable AI; data fusion; structured and unstructured data; clinical transformers
