IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0223832.html

Diagnostic test evaluation methodology: A systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard – An update

Author

Listed:
  • Chinyereugo M Umemneku Chikere
  • Kevin Wilson
  • Sara Graziadio
  • Luke Vale
  • A Joy Allen

Abstract

Objective: To systematically review methods developed and employed to evaluate the diagnostic accuracy of medical tests when the gold standard is missing or absent.

Study design and settings: Articles that proposed or applied any method to evaluate the diagnostic accuracy of medical test(s) in the absence of a gold standard were reviewed. The protocol for this review was registered in PROSPERO (CRD42018089349).

Results: Identified methods were classified into four main groups: methods employed when there is a missing gold standard; correction methods (which adjust for an imperfect reference standard with known diagnostic accuracy measures); methods employed to evaluate a medical test using multiple imperfect reference standards; and other methods, such as agreement studies and a mixed group of alternative study designs. The review identified fifty-one statistical methods developed to evaluate medical test(s) when the true disease status of some participants is not verified with the gold standard, seven correction methods, and four methods that evaluate medical test(s) using multiple imperfect reference standards. Flow diagrams were developed to guide the selection of appropriate methods.

Conclusion: Various methods have been proposed to evaluate medical test(s) in the absence of a gold standard for some or all participants in a diagnostic accuracy study. These methods depend on the availability of the gold standard, its application to the participants in the study, and the availability of alternative reference standard(s). The clinical application of some of these methods, especially those developed for a missing gold standard, is however limited. This may be due to the complexity of the methods and/or a disconnect between the fields of expertise of those who develop them (e.g. mathematicians) and those who employ them (e.g. clinical researchers). This review aims to help close this gap with its classification and guidance tools.
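
To make the idea of a correction method concrete, the sketch below adjusts the apparent sensitivity and specificity of an index test for an imperfect reference standard whose own sensitivity and specificity are known. It is an illustration only: it uses the classical algebraic correction usually attributed to Staquet and colleagues, which assumes that the errors of the index test and the reference standard are conditionally independent given true disease status. The abstract does not say which seven correction methods the review counts, so this should not be read as the authors' own procedure. The function name `corrected_accuracy` and the cell labels `a`, `b`, `c`, `d` are hypothetical.

```python
from typing import Tuple


def corrected_accuracy(a: float, b: float, c: float, d: float,
                       se_ref: float, sp_ref: float) -> Tuple[float, float]:
    """Correct index-test sensitivity and specificity for an imperfect reference.

    Cell counts of the 2x2 table (index test vs. imperfect reference):
        a = index+, reference+     b = index+, reference-
        c = index-, reference+     d = index-, reference-
    se_ref and sp_ref are the known sensitivity and specificity of the reference.

    Assumption (not taken from the abstract): index-test and reference errors
    are conditionally independent given the true disease status.
    """
    n = a + b + c + d
    if se_ref + sp_ref <= 1.0:
        raise ValueError("reference standard must be better than chance")

    fp_ref = 1.0 - sp_ref  # false-positive rate of the reference
    fn_ref = 1.0 - se_ref  # false-negative rate of the reference

    # Each denominator is proportional to the estimated number of truly
    # diseased (se_den) or truly non-diseased (sp_den) participants.
    se_den = (a + c) - fp_ref * n
    sp_den = (b + d) - fn_ref * n
    if se_den <= 0 or sp_den <= 0:
        raise ValueError("counts are inconsistent with the stated reference accuracy")

    # In small samples the estimates can fall outside [0, 1]; they are
    # returned unclipped so that such cases remain visible.
    se = (a - fp_ref * (a + b)) / se_den
    sp = (d - fn_ref * (c + d)) / sp_den
    return se, sp
```

As a check on the algebra, take expected counts generated with prevalence 0.3, true index-test sensitivity 0.9 and specificity 0.8, and a reference with sensitivity 0.95 and specificity 0.90: `corrected_accuracy(2705, 1395, 845, 5055, se_ref=0.95, sp_ref=0.90)` recovers 0.9 and 0.8 up to floating-point error. Treating the reference as if it were perfect would give roughly 0.76 and 0.78 instead.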

Suggested Citation

  • Chinyereugo M Umemneku Chikere & Kevin Wilson & Sara Graziadio & Luke Vale & A Joy Allen, 2019. "Diagnostic test evaluation methodology: A systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard – An update," PLOS ONE, Public Library of Science, vol. 14(10), pages 1-25, October.
  • Handle: RePEc:plo:pone00:0223832
    DOI: 10.1371/journal.pone.0223832

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0223832
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0223832&type=printable
    Download Restriction: no



    Citations



    Cited by:

    1. Clara Drew & Moses Badio & Dehkontee Dennis & Lisa Hensley & Elizabeth Higgs & Michael Sneller & Mosoka Fallah & Cavan Reilly, 2023. "Simplifying the estimation of diagnostic testing accuracy over time for high specificity tests in the absence of a gold standard," Biometrics, The International Biometric Society, vol. 79(2), pages 1546-1558, June.
    2. Dani Kiyasseh & Aaron Cohen & Chengsheng Jiang & Nicholas Altieri, 2024. "A framework for evaluating clinical artificial intelligence systems without ground-truth annotations," Nature Communications, Nature, vol. 15(1), pages 1-14, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Danping Liu & Xiao-Hua Zhou, 2013. "Covariate Adjustment in Estimating the Area Under ROC Curve with Partially Missing Gold Standard," Biometrics, The International Biometric Society, vol. 69(1), pages 91-100, March.
    2. Liu, Wei & Zhang, Bo & Zhang, Zhiwei & Chen, Baojiang & Zhou, Xiao-Hua, 2015. "A pseudo-likelihood approach for estimating diagnostic accuracy of multiple binary medical tests," Computational Statistics & Data Analysis, Elsevier, vol. 84(C), pages 85-98.
    3. Danping Liu & Xiao-Hua Zhou, 2011. "Semiparametric Estimation of the Covariate-Specific ROC Curve in Presence of Ignorable Verification Bias," Biometrics, The International Biometric Society, vol. 67(3), pages 906-916, September.
    4. Page, John H. & Rotnitzky, Andrea, 2009. "Estimation of the disease-specific diagnostic marker distribution under verification bias," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 707-717, January.
    5. Danping Liu & Xiao-Hua Zhou, 2010. "A Model for Adjusting for Nonignorable Verification Bias in Estimation of the ROC Curve and Its Area with Likelihood-Based Approach," Biometrics, The International Biometric Society, vol. 66(4), pages 1119-1128, December.
    6. Paul S. Albert, 2007. "Random Effects Modeling Approaches for Estimating ROC Curves from Repeated Ordinal Tests without a Gold Standard," Biometrics, The International Biometric Society, vol. 63(2), pages 593-602, June.
    7. Shanshan Li & Yang Ning, 2015. "Estimation of covariate‐specific time‐dependent ROC curves in the presence of missing biomarkers," Biometrics, The International Biometric Society, vol. 71(3), pages 666-676, September.
    8. Clara Drew & Moses Badio & Dehkontee Dennis & Lisa Hensley & Elizabeth Higgs & Michael Sneller & Mosoka Fallah & Cavan Reilly, 2023. "Simplifying the estimation of diagnostic testing accuracy over time for high specificity tests in the absence of a gold standard," Biometrics, The International Biometric Society, vol. 79(2), pages 1546-1558, June.
    9. Geoffrey Jones & Wesley O. Johnson & Timothy E. Hanson & Ronald Christensen, 2010. "Identifiability of Models for Multiple Diagnostic Testing in the Absence of a Gold Standard," Biometrics, The International Biometric Society, vol. 66(3), pages 855-863, September.
    10. Paul S. Albert, 2007. "Imputation Approaches for Estimating Diagnostic Accuracy for Multiple Tests from Partially Verified Designs," Biometrics, The International Biometric Society, vol. 63(3), pages 947-957, September.
    11. Bruce D. Spencer, 2012. "When Do Latent Class Models Overstate Accuracy for Diagnostic and Other Classifiers in the Absence of a Gold Standard?," Biometrics, The International Biometric Society, vol. 68(2), pages 559-566, June.
    12. Wang, Zheyu & Sebestyen, Krisztian & Monsell, Sarah E., 2017. "Model-based clustering for assessing the prognostic value of imaging biomarkers and mixed type tests," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 125-135.
    13. Bo Zhang & Zhen Chen & Paul S. Albert, 2012. "Estimating Diagnostic Accuracy of Raters Without a Gold Standard by Exploiting a Group of Experts," Biometrics, The International Biometric Society, vol. 68(4), pages 1294-1302, December.
    14. Elizabeth R. Brown, 2010. "Bayesian Estimation of the Time-Varying Sensitivity of a Diagnostic Test with Application to Mother-to-Child Transmission of HIV," Biometrics, The International Biometric Society, vol. 66(4), pages 1266-1274, December.
    15. Pankaj Patel & Sherry Thatcher & Katerina Bezrukova, 2013. "Organizationally-relevant configurations: the value of modeling local dependence," Quality & Quantity: International Journal of Methodology, Springer, vol. 47(1), pages 287-311, January.
    16. Buddhavarapu, Prasad & Bansal, Prateek & Prozzi, Jorge A., 2021. "A new spatial count data model with time-varying parameters," Transportation Research Part B: Methodological, Elsevier, vol. 150(C), pages 566-586.
    17. Mumtaz, Haroon & Theodoridis, Konstantinos, 2017. "Common and country specific economic uncertainty," Journal of International Economics, Elsevier, vol. 105(C), pages 205-216.
    18. Jesse Elliott & Zemin Bai & Shu-Ching Hsieh & Shannon E Kelly & Li Chen & Becky Skidmore & Said Yousef & Carine Zheng & David J Stewart & George A Wells, 2020. "ALK inhibitors for non-small cell lung cancer: A systematic review and network meta-analysis," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-18, February.
    19. Christina Leuker & Thorsten Pachur & Ralph Hertwig & Timothy J. Pleskac, 2019. "Do people exploit risk–reward structures to simplify information processing in risky choice?," Journal of the Economic Science Association, Springer;Economic Science Association, vol. 5(1), pages 76-94, August.
    20. Francois Olivier & Laval Guillaume, 2011. "Deviance Information Criteria for Model Selection in Approximate Bayesian Computation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-25, July.
