
Honest calibration assessment for binary outcome predictions

Author

Listed:
  • Timo Dimitriadis
  • Lutz Duembgen
  • Alexander Henzi
  • Marius Puke
  • Johanna Ziegel

Abstract

Probability predictions from binary regressions or machine learning methods ought to be calibrated: if an event is predicted to occur with probability $x$, it should materialize with approximately that frequency, which means that the so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$ for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid subject only to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well specified model. We show that our bands have a finite sample coverage guarantee, are narrower than existing approaches, and adapt to the local smoothness of the calibration curve $p$ and the local variance of the binary observations. In an application to model predictions of an infant having a low birth weight, the bounds give informative insights into model calibration.
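
To make the notions of a calibration curve and isotonicity concrete, the following is a minimal sketch of a point estimate of $p(\cdot)$ under the assumption that $p$ is non-decreasing. It uses the standard pool-adjacent-violators algorithm, which is swapped in here for illustration only; it is not the confidence-band construction proposed in the paper. The function name `isotonic_calibration_curve` and the toy data are hypothetical.

```python
def isotonic_calibration_curve(predictions, outcomes):
    """Estimate a non-decreasing calibration curve by pool-adjacent-violators.

    Illustrative sketch only: returns the distinct predicted probabilities
    (sorted) and the isotonic estimate of the event frequency at each one.
    """
    # Pool observations that share the same predicted probability.
    totals = {}
    for x, y in zip(predictions, outcomes):
        event_sum, count = totals.get(x, (0.0, 0))
        totals[x] = (event_sum + y, count + 1)
    xs = sorted(totals)

    # Each block holds [sum of outcomes, number of observations,
    # number of distinct prediction values it covers].
    blocks = []
    for x in xs:
        event_sum, count = totals[x]
        blocks.append([event_sum, count, 1])
        # Merge while the previous block's mean exceeds the current one's
        # (cross-multiplied to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] += k

    # Expand block means back to one fitted value per distinct prediction.
    fitted = []
    for s, c, k in blocks:
        fitted.extend([s / c] * k)
    return xs, fitted


# Toy data (hypothetical): four predictions with alternating outcomes.
xs, fitted = isotonic_calibration_curve([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
# fitted == [0.0, 0.5, 0.5, 1.0]
```

A well calibrated model would have fitted values close to the corresponding entries of `xs`. A point estimate like this one carries no uncertainty statement, which is the gap the confidence bands described in the abstract are meant to fill.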

Suggested Citation

  • Timo Dimitriadis & Lutz Duembgen & Alexander Henzi & Marius Puke & Johanna Ziegel, 2022. "Honest calibration assessment for binary outcome predictions," Papers 2203.04065, arXiv.org, revised Nov 2022.
  • Handle: RePEc:arx:papers:2203.04065

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2203.04065
    File Function: Latest version
    Download Restriction: no

    Citations

    Cited by:

    1. Mario V. Wuthrich & Johanna Ziegel, 2023. "Isotonic Recalibration under a Low Signal-to-Noise Ratio," Papers 2301.02692, arXiv.org.
    2. Henzi, Alexander & Dümbgen, Lutz, 2023. "Some new inequalities for beta distributions," Statistics & Probability Letters, Elsevier, vol. 195(C).
