Printed from https://ideas.repec.org/a/plo/pone00/0238908.html

Inconsistency in the use of the term “validation” in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging

Authors
  • Dong Wook Kim
  • Hye Young Jang
  • Yousun Ko
  • Jung Hee Son
  • Pyeong Hwa Kim
  • Seon-Ok Kim
  • Joon Seo Lim
  • Seong Ho Park

Abstract

Background: The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies are inconsistent in their use of the term “validation”. Some use it to refer to tuning and others to testing, which hinders accurate communication and may inadvertently exaggerate the performance of DL algorithms. We investigated the extent of inconsistency in the use of the term “validation” in studies on the accuracy of DL algorithms in providing diagnosis from medical imaging.

Methods and findings: We analyzed the full texts of research papers cited in two recent systematic reviews. The papers were categorized according to whether the term “validation” was used to refer to tuning alone, both tuning and testing, or testing alone. Using multivariable logistic regression analysis with generalized estimating equations, we analyzed whether paper characteristics (i.e., journal category, field of study, year of print publication, journal impact factor [JIF], and nature of test data) were associated with terminology usage. Of 201 papers published in 125 journals, 118 (58.7%), 9 (4.5%), and 74 (36.8%) used the term to refer to tuning alone, both tuning and testing, and testing alone, respectively. A weak association was noted between higher JIF and using the term to refer to testing (i.e., testing alone or both tuning and testing) instead of tuning alone (vs. JIF 10: adjusted odds ratio 2.41, P = 0.089). Journal category, field of study, year of print publication, and nature of test data were not significantly associated with terminology usage.

Conclusions: Existing literature shows a significant degree of inconsistency in the use of the term “validation” when referring to the steps in DL algorithm development. Efforts are needed to improve the accuracy and clarity of terminology usage.
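The three-step process of training, tuning, and testing can be illustrated with a minimal Python sketch. It partitions a dataset into three disjoint subsets and names each one by its role, so the set used to tune hyperparameters cannot be confused with the held-out test set. The function name `split_train_tune_test`, the default 70/15/15 proportions, and the simple random shuffle are illustrative assumptions. They are not the procedure used in the paper or in the studies it reviewed.

```python
from __future__ import annotations

import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_train_tune_test(
    items: Sequence[T],
    tune_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> dict[str, list[T]]:
    """Partition items into three disjoint subsets named by their role.

    Hypothetical helper for illustration:
    - "train": data used to fit model parameters
    - "tune":  data used for hyperparameter/model selection
               (the set many ML tools call "validation")
    - "test":  held-out data used only to report final performance
    """
    if tune_frac < 0 or test_frac < 0 or tune_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Shuffle a copy with a seeded generator so the split is reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_test = round(n * test_frac)
    n_tune = round(n * tune_frac)

    # Slice the shuffled copy into non-overlapping segments.
    test = shuffled[:n_test]
    tune = shuffled[n_test:n_test + n_tune]
    train = shuffled[n_test + n_tune:]
    return {"train": train, "tune": tune, "test": test}


if __name__ == "__main__":
    split = split_train_tune_test(range(100))
    print({name: len(subset) for name, subset in split.items()})
    # prints {'train': 70, 'tune': 15, 'test': 15}
```

Because the keys are `"tune"` and `"test"` rather than `"validation"`, a performance figure computed on `split["test"]` is unambiguously a test-set result.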

Suggested Citation

  • Dong Wook Kim & Hye Young Jang & Yousun Ko & Jung Hee Son & Pyeong Hwa Kim & Seon-Ok Kim & Joon Seo Lim & Seong Ho Park, 2020. "Inconsistency in the use of the term “validation” in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-10, September.
  • Handle: RePEc:plo:pone00:0238908
    DOI: 10.1371/journal.pone.0238908

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0238908
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0238908&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0238908?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dimitris Bertsimas & Agni Orfanoudaki & Rory B. Weiner, 2020. "Personalized treatment for coronary artery disease patients: a machine learning approach," Health Care Management Science, Springer, vol. 23(4), pages 482-506, December.
    2. Benedict E. DeDominicis, 2021. "The Common Agricultural Policy Of The European Union And Bulgaria: Critiquing The New York Times 2019 Exposé Of Corruption In The Common Agricultural Policy," International Journal of Management and Marketing Research, The Institute for Business and Finance Research, vol. 14(1), pages 35-61.
    3. Sangjoon Park & Gwanghyun Kim & Yujin Oh & Joon Beom Seo & Sang Min Lee & Jin Hwan Kim & Sungjun Moon & Jae-Kwang Lim & Chang Min Park & Jong Chul Ye, 2022. "Self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Ahmad Yaman Abdin & Muhammad Jawad Nasim & Yannick Ney & Claus Jacob, 2021. "The Pioneering Role of Sci in Post Publication Public Peer Review (P4R)," Publications, MDPI, vol. 9(1), pages 1-12, March.
    5. Rodney Ehrlich & Stephen Barker & Jim te Water Naude & David Rees & Barry Kistnasamy & Julian Naidoo & Annalee Yassi, 2022. "Accuracy of Computer-Aided Detection of Occupational Lung Disease: Silicosis and Pulmonary Tuberculosis in Ex-Miners from the South African Gold Mines," IJERPH, MDPI, vol. 19(19), pages 1-14, September.
    6. Benedict E. DeDominicis, 2021. "American Economic Nationalism: Corporatist, Neoliberal And Neocorporatist Political Strategic Responses To Contemporary Global Systemic Crises," Review of Business and Finance Studies, The Institute for Business and Finance Research, vol. 12(1), pages 1-30.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.