Printed from https://ideas.repec.org/a/plo/pmed00/1002686.html

Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists

Authors

Listed:
  • Pranav Rajpurkar
  • Jeremy Irvin
  • Robyn L Ball
  • Kaylie Zhu
  • Brandon Yang
  • Hershel Mehta
  • Tony Duan
  • Daisy Ding
  • Aarti Bagul
  • Curtis P Langlotz
  • Bhavik N Patel
  • Kristen W Yeom
  • Katie Shpanskaya
  • Francis G Blankenberg
  • Jayne Seekins
  • Timothy J Amrhein
  • David A Mong
  • Safwan S Halabi
  • Evan J Zucker
  • Andrew Y Ng
  • Matthew P Lungren

Abstract

Background: Chest radiograph interpretation is critical for detecting thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-related diagnostic error and to a lack of diagnostic expertise in parts of the world where radiologists are unavailable. Recently, deep learning approaches have achieved expert-level performance on medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study was to compare the performance of a deep learning algorithm with that of practicing radiologists in detecting pathologies in chest radiographs.

Methods and findings: We developed CheXNeXt, a convolutional neural network that concurrently detects the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules, in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set of 420 images sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as the reference standard. We compared CheXNeXt's discriminative performance on the validation set with that of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists comprised 6 board-certified radiologists (average experience 12 years, range 4–28 years) and 3 senior radiology residents from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. The radiologists achieved statistically significantly higher AUCs on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863–0.910), 0.911 (95% CI 0.866–0.947), and 0.985 (95% CI 0.974–0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790–0.870), 0.704 (95% CI 0.567–0.833), and 0.851 (95% CI 0.785–0.909), respectively. CheXNeXt outperformed the radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825–0.895), statistically significantly higher than the radiologists' AUC of 0.808 (95% CI 0.777–0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of the study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations, and that evaluation was limited to a dataset from a single institution.

Conclusions: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
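The evaluation described above compares algorithm and radiologist AUCs against a majority-vote reference standard, with 95% confidence intervals. As a minimal illustrative sketch (not the authors' actual analysis code), the AUC can be computed from the Mann-Whitney U statistic and its CI estimated with a percentile bootstrap; the function names and the majority-vote one-liner below are hypothetical:

```python
import numpy as np

def auc_mann_whitney(y_true, y_score):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Count pairwise wins over all (positive, negative) pairs; ties count half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases with replacement.
    Expects NumPy arrays so that fancy indexing works."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        yt, ys = y_true[idx], y_score[idx]
        if yt.min() == yt.max():  # skip resamples missing one class
            continue
        stats.append(auc_mann_whitney(yt, ys))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Under this sketch, a majority-vote reference standard from a panel of 3 readers could be formed as `reference = (votes.sum(axis=1) >= 2).astype(int)`, where `votes` is a hypothetical (n_images, 3) binary array; the study itself used the majority vote of 3 board-certified cardiothoracic specialist radiologists.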

Suggested Citation

  • Pranav Rajpurkar & Jeremy Irvin & Robyn L Ball & Kaylie Zhu & Brandon Yang & Hershel Mehta & Tony Duan & Daisy Ding & Aarti Bagul & Curtis P Langlotz & Bhavik N Patel & Kristen W Yeom & Katie Shpanskaya & Francis G Blankenberg & Jayne Seekins & Timothy J Amrhein & David A Mong & Safwan S Halabi & Evan J Zucker & Andrew Y Ng & Matthew P Lungren, 2018. "Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists," PLOS Medicine, Public Library of Science, vol. 15(11), pages 1-17, November.
  • Handle: RePEc:plo:pmed00:1002686
    DOI: 10.1371/journal.pmed.1002686

    Download full text from publisher

    File URL: https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002686
    Download Restriction: no

    File URL: https://journals.plos.org/plosmedicine/article/file?id=10.1371/journal.pmed.1002686&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pmed.1002686?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Oded Rotem & Tamar Schwartz & Ron Maor & Yishay Tauber & Maya Tsarfati Shapiro & Marcos Meseguer & Daniella Gilboa & Daniel S. Seidman & Assaf Zaritsky, 2024. "Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Seung Seog Han & Ik Jun Moon & Seong Hwan Kim & Jung-Im Na & Myoung Shin Kim & Gyeong Hun Park & Ilwoo Park & Keewon Kim & Woohyung Lim & Ju Hee Lee & Sung Eun Chang, 2020. "Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study," PLOS Medicine, Public Library of Science, vol. 17(11), pages 1-21, November.
    3. Weijie Fan & Yi Yang & Jing Qi & Qichuan Zhang & Cuiwei Liao & Li Wen & Shuang Wang & Guangxian Wang & Yu Xia & Qihua Wu & Xiaotao Fan & Xingcai Chen & Mi He & JingJing Xiao & Liu Yang & Yun Liu & Jia, 2024. "A deep-learning-based framework for identifying and localizing multiple abnormalities and assessing cardiomegaly in chest X-ray," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    4. Shashank Shetty & Ananthanarayana V S. & Ajit Mahale, 2022. "MS-CheXNet: An Explainable and Lightweight Multi-Scale Dilated Network with Depthwise Separable Convolution for Prediction of Pulmonary Abnormalities in Chest Radiographs," Mathematics, MDPI, vol. 10(19), pages 1-29, October.
    5. Eric Engle & Andrei Gabrielian & Alyssa Long & Darrell E Hurt & Alex Rosenthal, 2020. "Performance of Qure.ai automatic classifiers against a large annotated database of patients with diverse forms of tuberculosis," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-19, January.
    6. Mingzhu Liu & Chirag Nagpal & Artur Dubrawski, 2024. "Deep Survival Models Can Improve Long-Term Mortality Risk Estimates from Chest Radiographs," Forecasting, MDPI, vol. 6(2), pages 1-14, May.
    7. Eun Young Kim & Young Jae Kim & Won-Jun Choi & Gi Pyo Lee & Ye Ra Choi & Kwang Nam Jin & Young Jun Cho, 2021. "Performance of a deep-learning algorithm for referable thoracic abnormalities on chest radiographs: A multicenter study of a health screening cohort," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-12, February.
    8. Tianyu Han & Sven Nebelung & Federico Pedersoli & Markus Zimmermann & Maximilian Schulze-Hagen & Michael Ho & Christoph Haarburger & Fabian Kiessling & Christiane Kuhl & Volkmar Schulz & Daniel Truhn, 2021. "Advancing diagnostic performance and clinical usability of neural networks via adversarial training and dual batch normalization," Nature Communications, Nature, vol. 12(1), pages 1-11, December.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pmed00:1002686. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosmedicine (email available below). General contact details of provider: https://journals.plos.org/plosmedicine/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.