Author
Listed:
- Jenna Kefeli
(Columbia University)
- Jacob Berkowitz
(Cedars-Sinai Medical Center)
- Jose M. Acitores Cortina
(Cedars-Sinai Medical Center)
- Kevin K. Tsang
(Cedars-Sinai Medical Center)
- Nicholas P. Tatonetti
(Columbia University
Cedars-Sinai Medical Center)
Abstract
Cancer staging is an essential clinical attribute informing patient prognosis and clinical trial eligibility. However, it is not routinely recorded in structured electronic health records. Here, we present BB-TEN: Big Bird – TNM staging Extracted from Notes, a generalizable method for the automated classification of TNM stage directly from pathology report text. We train a BERT-based model using publicly available pathology reports across approximately 7000 patients and 23 cancer types. We explore the use of different model types, with differing input sizes, parameters, and model architectures. Our final model goes beyond term-extraction, inferring TNM stage from context when it is not included in the report text explicitly. As external validation, we test our model on almost 8000 pathology reports from Columbia University Medical Center, finding that our trained model achieved an AU-ROC of 0.815–0.942. This suggests that our model can be applied broadly to other institutions without additional institution-specific fine-tuning.
Suggested Citation
Jenna Kefeli & Jacob Berkowitz & Jose M. Acitores Cortina & Kevin K. Tsang & Nicholas P. Tatonetti, 2024.
"Generalizable and automated classification of TNM stage from pathology reports with external validation,"
Nature Communications, Nature, vol. 15(1), pages 1-7, December.
Handle:
RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-53190-9
DOI: 10.1038/s41467-024-53190-9