
Expected Value of Sample Information Calculations for Risk Prediction Model Validation

Author

Listed:
  • Mohsen Sadatsafavi

    (Respiratory Evaluation Sciences Program, Collaboration for Outcomes Research and Evaluation, Faculty of Pharmaceutical Sciences, The University of British Columbia, Vancouver, BC, Canada)

  • Andrew J. Vickers

(Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA)

  • Tae Yoon Lee

    (Respiratory Evaluation Sciences Program, Collaboration for Outcomes Research and Evaluation, Faculty of Pharmaceutical Sciences, The University of British Columbia, Vancouver, BC, Canada)

  • Paul Gustafson

    (Department of Statistics, The University of British Columbia, Vancouver, BC, Canada)

  • Laure Wynants

(Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands
    Department of Development and Regeneration, KU Leuven, Leuven, Belgium)

Abstract

Background: The purpose of external validation of a risk prediction model is to evaluate its performance before recommending it for use in a new population. Sample size calculations for such validation studies are currently based on classical inferential statistics around metrics of discrimination, calibration, and net benefit (NB). For NB as a measure of clinical utility, the relevance of inferential statistics is doubtful. Value-of-information methodology makes it possible to quantify the value of collecting validation data in terms of the expected gain in clinical utility.

Methods: We define the validation expected value of sample information (EVSI) as the expected gain in NB from procuring a validation sample of a given size. We propose 3 algorithms for EVSI computation and compare their face validity and computation time in simulation studies. In a case study, we use the non-US subset of a clinical trial to create a risk prediction model for short-term mortality after myocardial infarction and calculate validation EVSI at a range of sample sizes for the US population.

Results: The computation methods generated similar EVSI values in simulation studies, although they differed in numerical accuracy and computation time. At a 2% risk threshold, procuring 1,000 observations for external validation had an EVSI of 0.00101 in true-positive units or 0.04938 in false-positive units. Scaled by heart attack incidence in the United States, the population EVSI was 806 true positives gained, or 39,500 false positives averted, annually. Validation studies with >4,000 observations had diminishing returns, as the EVSIs approached their maximum possible value.

Conclusion: Value-of-information methodology quantifies the return on investment from conducting an external validation study and can provide a value-based perspective when designing such studies.

Highlights

  • In external validation studies of risk prediction models, the finite size of the validation sample leads to uncertain conclusions about the performance of the model.
  • This uncertainty has hitherto been approached from a classical inferential perspective (e.g., a confidence interval around the c-statistic). Correspondingly, sample size calculations for validation studies have been based on classical inferential statistics. For measures of clinical utility such as net benefit, the relevance of this approach is doubtful.
  • This article defines the expected value of sample information (EVSI) for model validation and suggests algorithms for its computation.
  • Validation EVSI quantifies the return on investment from conducting a validation study.
  • Value-based approaches rooted in decision theory can complement contemporary study design and sample size calculation methods in predictive analytics.
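
To connect the EVSI definition above to a computable quantity: with net benefit at risk threshold z defined as NB = P(treated and diseased) − [z/(1 − z)] × P(treated and not diseased), the validation EVSI for a sample of size n is the expected value, over possible validation datasets D of that size, of the best achievable expected NB after observing D, minus the best achievable expected NB under current information, EVSI(n) = E_D[max_d E(NB_d | D)] − max_d E(NB_d), where d ranges over the candidate strategies (use the model at threshold z, treat all, treat none). The sketch below is a minimal illustration of this logic and not a re-implementation of the article's 3 algorithms: it assumes, purely for illustration, that uncertainty about the target population at a fixed threshold can be summarized by a Dirichlet prior over the joint probabilities of true positives, false positives, false negatives, and true negatives, so that updating after a multinomial validation sample is conjugate. All prior values and function names are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)

    def evsi_validation(alpha, threshold, n_val, n_outer=5000):
        """Toy Monte Carlo estimate of validation EVSI (illustrative only).

        alpha     : Dirichlet prior over the joint probabilities of
                    (true positive, false positive, false negative, true negative)
                    at the chosen risk threshold in the target population.
        threshold : risk threshold that defines net benefit.
        n_val     : size of the prospective validation sample.
        """
        w = threshold / (1.0 - threshold)  # harm-to-benefit weight z/(1 - z)

        def expected_nb(p):
            # Expected NB of each strategy given class probabilities p = (TP, FP, FN, TN).
            tp, fp, fn, tn = p
            return np.array([
                tp - w * fp,                # use the model at this threshold
                (tp + fn) - w * (fp + tn),  # treat all
                0.0,                        # treat none
            ])

        # Value of the best strategy under current (prior) information.
        prior_mean = np.asarray(alpha) / np.sum(alpha)
        value_current = expected_nb(prior_mean).max()

        # Value of the best strategy after seeing a validation sample of size n_val,
        # averaged over datasets drawn from the prior predictive distribution.
        post_values = np.empty(n_outer)
        for i in range(n_outer):
            p_true = rng.dirichlet(alpha)                  # a plausible target population
            counts = rng.multinomial(n_val, p_true)        # simulated validation data
            post_mean = (np.asarray(alpha) + counts) / (np.sum(alpha) + n_val)
            post_values[i] = expected_nb(post_mean).max()  # best decision given the data
        return post_values.mean() - value_current

    # Hypothetical prior, roughly worth 100 observations and centred on 2% prevalence.
    alpha = np.array([1.0, 20.0, 1.0, 78.0])
    for n in (250, 1000, 4000):
        print(n, round(evsi_validation(alpha, threshold=0.02, n_val=n), 6))

Run at increasing values of n_val, the estimates rise toward a plateau, which is the qualitative pattern of diminishing returns described in the Results; the numbers themselves depend entirely on the assumed prior and should not be read as reproducing the article's case study.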

Suggested Citation

  • Mohsen Sadatsafavi & Andrew J. Vickers & Tae Yoon Lee & Paul Gustafson & Laure Wynants, 2025. "Expected Value of Sample Information Calculations for Risk Prediction Model Validation," Medical Decision Making, vol. 45(3), pages 232-244, April.
  • Handle: RePEc:sae:medema:v:45:y:2025:i:3:p:232-244
    DOI: 10.1177/0272989X251314010

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0272989X251314010
    Download Restriction: no

    File URL: https://libkey.io/10.1177/0272989X251314010?utm_source=ideas
    LibKey link: If access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:sae:medema:v:45:y:2025:i:3:p:232-244. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact SAGE Publications.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.