Printed from https://ideas.repec.org/p/arx/papers/2203.04065.html

Honest calibration assessment for binary outcome predictions

Author

Listed:
  • Timo Dimitriadis
  • Lutz Duembgen
  • Alexander Henzi
  • Marius Puke
  • Johanna Ziegel

Abstract

Probability predictions from binary regressions or machine learning methods ought to be calibrated: If an event is predicted to occur with probability $x$, it should materialize with approximately that frequency, which means that the so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$ for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid subject only to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well specified model. We show that our bands have a finite sample coverage guarantee, are narrower than those of existing approaches, and adapt to the local smoothness of the calibration curve $p$ and the local variance of the binary observations. In an application to model predictions of an infant having a low birth weight, the bounds give informative insights on model calibration.
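
To make the notions of a calibration curve and of isotonicity concrete, the following is a minimal sketch of an isotonic (non-decreasing) estimate of $p(\cdot)$ computed with the pool-adjacent-violators algorithm. It is an illustration only and does not reproduce the confidence bands proposed in the paper; the function name `pav_calibration_curve` and the toy data are assumptions made for this example.

```python
def pav_calibration_curve(preds, outcomes):
    """Isotonic least-squares estimate of the calibration curve.

    preds:    predicted probabilities in [0, 1]
    outcomes: binary observations (0 or 1)
    Returns the sorted distinct predictions and the fitted event
    frequencies, which are non-decreasing in the prediction.
    """
    # Pool observations that share the same predicted probability.
    pooled = {}
    for x, y in zip(preds, outcomes):
        s, n = pooled.get(x, (0.0, 0))
        pooled[x] = (s + y, n + 1)
    xs = sorted(pooled)

    # Each block holds [sum of outcomes, number of observations,
    # number of distinct prediction values it covers].
    blocks = []
    for x in xs:
        s, n = pooled[x]
        blocks.append([s, n, 1])
        # Merge adjacent blocks while the left mean exceeds the right mean
        # (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2

    # Expand block means back to one fitted value per distinct prediction.
    fitted = []
    for s, n, k in blocks:
        fitted.extend([s / n] * k)
    return xs, fitted


# Toy data (made up for illustration): the raw event frequencies
# 0.5, 0.0, 1.0 are not monotone, so the first two groups are pooled.
xs, fitted = pav_calibration_curve(
    [0.2, 0.2, 0.5, 0.5, 0.9, 0.9],
    [1, 0, 0, 0, 1, 1],
)
for x, p_hat in zip(xs, fitted):
    print(f"prediction {x:.1f}: estimated event frequency {p_hat:.2f}")
```

Perfect calibration would correspond to the fitted values lying on the diagonal, that is, an estimated frequency equal to the prediction at every point. A point estimate like this one carries no uncertainty statement, which is the gap the confidence bands described in the abstract are meant to fill.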

Suggested Citation

  • Timo Dimitriadis & Lutz Duembgen & Alexander Henzi & Marius Puke & Johanna Ziegel, 2022. "Honest calibration assessment for binary outcome predictions," Papers 2203.04065, arXiv.org, revised Nov 2022.
  • Handle: RePEc:arx:papers:2203.04065

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2203.04065
    File Function: Latest version
    Download Restriction: no

    Citations

    Cited by:

    1. Mario V. Wuthrich & Johanna Ziegel, 2023. "Isotonic Recalibration under a Low Signal-to-Noise Ratio," Papers 2301.02692, arXiv.org.
    2. Henzi, Alexander & Dümbgen, Lutz, 2023. "Some new inequalities for beta distributions," Statistics & Probability Letters, Elsevier, vol. 195(C).
