Honest calibration assessment for binary outcome predictions

Author

Listed:

Timo Dimitriadis
Lutz Duembgen
Alexander Henzi
Marius Puke
Johanna Ziegel

Abstract

Probability predictions from binary regressions or machine learning methods ought to be calibrated: if an event is predicted to occur with probability $x$, it should materialize with approximately that frequency. Equivalently, the so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$ for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid subject only to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well specified model. We show that our bands have a finite-sample coverage guarantee, are narrower than those of existing approaches, and adapt to the local smoothness of the calibration curve $p$ and the local variance of the binary observations. In an application to model predictions of an infant having a low birth weight, the bounds give informative insights on model calibration.
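
To make the notion of a calibration curve under isotonicity concrete, the following is a minimal sketch of a point estimate of $p(\cdot)$ obtained by isotonic regression (pool-adjacent-violators). It is not the confidence-band construction proposed in the paper, which is not reproduced here; the function name `isotonic_calibration_curve` and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def isotonic_calibration_curve(pred, outcome):
    """Nondecreasing least-squares estimate of the calibration curve.

    Hypothetical helper for illustration only. Returns the distinct
    predicted probabilities and the fitted event frequency at each.
    """
    pred = np.asarray(pred, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    # Pool observations that share the same predicted probability.
    ux, inverse = np.unique(pred, return_inverse=True)
    sums = np.bincount(inverse, weights=outcome)
    counts = np.bincount(inverse)

    # Each block holds [sum of outcomes, number of observations,
    # number of distinct prediction values it covers].
    blocks = []
    for s, c in zip(sums, counts):
        blocks.append([s, c, 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication to avoid divisions).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s2, c2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += k2

    fitted = np.concatenate([np.full(k, s / c) for s, c, k in blocks])
    return ux, fitted


# Toy data (made up): four predictions and the observed binary outcomes.
x, p_hat = isotonic_calibration_curve([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(p_hat)                      # fitted values are 0.0, 0.5, 0.5, 1.0
print(np.max(np.abs(p_hat - x)))  # largest deviation from the identity p(x) = x
```

A perfectly calibrated model would give fitted values close to the diagonal. A point estimate like this carries no uncertainty statement, which is the gap that confidence bands for $p$ are meant to fill.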

Suggested Citation

Timo Dimitriadis & Lutz Duembgen & Alexander Henzi & Marius Puke & Johanna Ziegel, 2022. "Honest calibration assessment for binary outcome predictions," Papers 2203.04065, arXiv.org, revised Nov 2022.

Handle: RePEc:arx:papers:2203.04065

References listed on IDEAS

Giovanni Nattino & Michael L. Pennell & Stanley Lemeshow, 2020. "Rejoinder to “Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer‐Lemeshow test”," Biometrics, The International Biometric Society, vol. 76(2), pages 575-577, June.
Koenker, Roger & Yoon, Jungmo, 2009. "Parametric links for binary choice models: A Fisherian-Bayesian colloquy," Journal of Econometrics, Elsevier, vol. 152(2), pages 120-130, October.
Peter Hall & Joel L. Horowitz, 2013. "A simple bootstrap method for constructing nonparametric confidence bands for functions," CeMMAP working papers CWP29/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Giovanni Nattino & Michael L. Pennell & Stanley Lemeshow, 2020. "Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer‐Lemeshow test," Biometrics, The International Biometric Society, vol. 76(2), pages 549-560, June.

Citations

Cited by:

Mario V. Wuthrich & Johanna Ziegel, 2023. "Isotonic Recalibration under a Low Signal-to-Noise Ratio," Papers 2301.02692, arXiv.org.
Henzi, Alexander & Dümbgen, Lutz, 2023. "Some new inequalities for beta distributions," Statistics & Probability Letters, Elsevier, vol. 195(C).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Stamatiadou, Valentini & Mazaris, Antonios & Mallios, Zisis & Katsanevakis, Stelios, 2023. "Valuation and mapping of the recreational diving ecosystem service of the Aegean Sea," Ecosystem Services, Elsevier, vol. 64(C).
Cornelius K. A. Pienaah & Roger Antabe & Godwin Arku & Isaac Luginaah, 2024. "Farmer field schools, climate action plans and climate change resilience among smallholder farmers in Northern Ghana," Climatic Change, Springer, vol. 177(6), pages 1-25, June.
Rui Liu & Feng Tan & Yaxuan Wang & Bo Ma & Ming Yuan & Lianxia Wang & Xin Zhao, 2022. "Machine Learning Identification of Saline-Alkali-Tolerant Japonica Rice Varieties Based on Raman Spectroscopy and Python Visual Analysis," Agriculture, MDPI, vol. 12(7), pages 1-14, July.
M. Kelemen & J. Danesh & E. Angelantonio & M. Inouye & J. O’Sullivan & L. Pennells & T. Roychowdhury & M. J. Sweeting & A. M. Wood & S. Harrison & L. G. Kim, 2024. "Evaluating the cost-effectiveness of polygenic risk score-stratified screening for abdominal aortic aneurysm," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
Daniel Fernández & Louise McMillan & Richard Arnold & Martin Spiess & Ivy Liu, 2022. "Goodness-of-Fit and Generalized Estimating Equation Methods for Ordinal Responses Based on the Stereotype Model," Stats, MDPI, vol. 5(2), pages 1-14, June.
Narisetty, Naveen & Koenker, Roger, 2022. "Censored quantile regression survival models with a cure proportion," Journal of Econometrics, Elsevier, vol. 226(1), pages 192-203.
Victor Chernozhukov & Iván Fernández‐Val & Blaise Melly, 2013. "Inference on Counterfactual Distributions," Econometrica, Econometric Society, vol. 81(6), pages 2205-2268, November.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2008. "Inference On Counterfactual Distributions," Boston University - Department of Economics - Working Papers Series wp2008-005, Boston University - Department of Economics.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2012. "Inference on counterfactual distributions," CeMMAP working papers 05/12, Institute for Fiscal Studies.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2013. "Inference on counterfactual distributions," CeMMAP working papers 17/13, Institute for Fiscal Studies.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2013. "Inference on counterfactual distributions," CeMMAP working papers CWP17/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2009. "Inference on counterfactual distributions," CeMMAP working papers CWP09/09, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2009. "Inference on Counterfactual Distributions," Papers 0904.0951, arXiv.org, revised Sep 2013.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2009. "Inference on counterfactual distributions," CeMMAP working papers 09/09, Institute for Fiscal Studies.
- Victor Chernozhukov & Ivan Fernandez-Val & Blaise Melly, 2012. "Inference on counterfactual distributions," CeMMAP working papers CWP05/12, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
- Kajal Lahiri & Liu Yang, 2012. "Forecasting Binary Outcomes," Discussion Papers 12-09, University at Albany, SUNY, Department of Economics.
Kun Yi & Yoshihiko Nishiyama, 2022. "Smoothed bootstrapping kernel density estimation under higher order kernel," KIER Working Papers 1081, Kyoto University, Institute of Economic Research.
Rui Hua & Wenhao Gui, 2022. "Revisit to progressively Type-II censored competing risks data from Lomax distributions," Journal of Risk and Reliability, vol. 236(3), pages 377-394, June.
Kobayashi, Yoshiharu & Heinrich, Tobias & Bryant, Kristin A., 2021. "Public support for development aid during the COVID-19 pandemic," World Development, Elsevier, vol. 138(C).
Horowitz, Joel L. & Lee, Sokbae, 2017. "Nonparametric estimation and inference under shape restrictions," Journal of Econometrics, Elsevier, vol. 201(1), pages 108-126.
- Joel L. Horowitz & Sokbae (Simon) Lee, 2015. "Nonparametric estimation and inference under shape restrictions," CeMMAP working papers 67/15, Institute for Fiscal Studies.
- Joel L. Horowitz & Sokbae (Simon) Lee, 2016. "Nonparametric estimation and inference under shape restrictions," CeMMAP working papers 29/16, Institute for Fiscal Studies.
- Joel L. Horowitz & Sokbae (Simon) Lee, 2015. "Nonparametric estimation and inference under shape restrictions," CeMMAP working papers CWP67/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Joel L. Horowitz & Sokbae (Simon) Lee, 2016. "Nonparametric estimation and inference under shape restrictions," CeMMAP working papers CWP29/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Hess Wolfgang & Tutz Gerhard & Gertheiss Jan, 2016. "A Flexible Link Function for Discrete-Time Duration Models," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(4), pages 455-481, August.
Kato, Kengo & Sasaki, Yuya, 2018. "Uniform confidence bands in deconvolution with unknown error distribution," Journal of Econometrics, Elsevier, vol. 207(1), pages 129-161.
Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(7), pages 1207-1225, November.
- Sokbae (Simon) Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly robust uniform confidence band for the conditional average treatment effect function," CeMMAP working papers CWP03/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Lee, Sokbae & Okui, Ryo & Whang, Yoon-Jae, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," LSE Research Online Documents on Economics 86852, London School of Economics and Political Science, LSE Library.
- Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly Robust Uniform Confidence Band For The Conditional Average Treatment Effect Function," KIER Working Papers 931, Kyoto University, Institute of Economic Research.
- Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly Robust Uniform Confidence Band for the Conditional Average Treatment Effect Function," Papers 1601.02801, arXiv.org, revised Oct 2016.
- Sokbae (Simon) Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly robust uniform confidence band for the conditional average treatment effect function," CeMMAP working papers 03/16, Institute for Fiscal Studies.
M Ludkin & C Sherlock, 2023. "Hug and hop: a discrete-time, nonreversible Markov chain Monte Carlo algorithm," Biometrika, Biometrika Trust, vol. 110(2), pages 301-318.
Rainer Winkelmann, 2012. "Copula Bivariate Probit Models: With An Application To Medical Expenditures," Health Economics, John Wiley & Sons, Ltd., vol. 21(12), pages 1444-1455, December.
- Rainer Winkelmann, 2011. "Copula bivariate probit models: with an application to medical expenditures," ECON - Working Papers 029, Department of Economics - University of Zurich.
Vijverberg, Chu-Ping C. & Vijverberg, Wim P., 2012. "Pregibit: A Family of Discrete Choice Models," IZA Discussion Papers 6359, Institute of Labor Economics (IZA).
Kengo Kato & Yuya Sasaki & Takuya Ura, 2018. "Inference based on Kotlarski's Identity," Papers 1808.09375, arXiv.org, revised Sep 2019.
Tommaso Proietti & Alessandra Luati, 2013. "The Exponential Model for the Spectrum of a Time Series: Extensions and Applications," CEIS Research Paper 272, Tor Vergata University, CEIS, revised 19 Apr 2013.
- Proietti, Tommaso & Luati, Alessandra, 2013. "The Exponential Model for the Spectrum of a Time Series: Extensions and Applications," MPRA Paper 45280, University Library of Munich, Germany.
- Tommaso Proietti & Alessandra Luati, 2013. "The Exponential Model for the Spectrum of a Time Series: Extensions and Applications," CREATES Research Papers 2013-34, Department of Economics and Business Economics, Aarhus University.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2022-05-02 (Big Data)
NEP-ECM-2022-05-02 (Econometrics)
