
Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer‐Lemeshow test

Authors
  • Giovanni Nattino
  • Michael L. Pennell
  • Stanley Lemeshow

Abstract

Evaluating the goodness of fit of logistic regression models is crucial to ensure the accuracy of the estimated probabilities. Unfortunately, such evaluation is problematic in large samples. Because the power of traditional goodness of fit tests increases with the sample size, practically irrelevant discrepancies between estimated and true probabilities are increasingly likely to cause the rejection of the hypothesis of perfect fit in larger and larger samples. This phenomenon has been widely documented for popular goodness of fit tests, such as the Hosmer‐Lemeshow test. To address this limitation, we propose a modification of the Hosmer‐Lemeshow approach. By standardizing the noncentrality parameter that characterizes the alternative distribution of the Hosmer‐Lemeshow statistic, we introduce a parameter that measures the goodness of fit of a model but does not depend on the sample size. We provide the methodology to estimate this parameter and construct confidence intervals for it. Finally, we propose a formal statistical test to rigorously assess whether the fit of a model, albeit not perfect, is acceptable for practical purposes. The proposed method is compared in a simulation study with a competing modification of the Hosmer‐Lemeshow test, based on repeated subsampling. We provide a step‐by‐step illustration of our method using a model for postneonatal mortality developed in a large cohort of more than 300 000 observations.
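The abstract's central point is that a fixed, practically irrelevant miscalibration becomes "significant" as the sample grows. A short simulation makes this visible. The sketch below is a minimal, stdlib-only Python illustration under stated assumptions. It computes the classic Hosmer-Lemeshow statistic over g = 10 groups of risk on hypothetical simulated data. The helper names `hosmer_lemeshow`, `chi2_sf_even` and `simulate`, and the true slope of 1.1, are illustrative choices and do not come from the paper. Because the predicted probabilities are fixed rather than fitted to the same sample, the sketch uses a chi-square reference with g degrees of freedom instead of g - 2. The ratio (HL - g)/n is only a crude moment-based stand-in for a standardized noncentrality parameter. It is not the estimator proposed by Nattino, Pennell and Lemeshow.

```python
import math
import random

G = 10  # number of risk groups ("deciles of risk")


def hosmer_lemeshow(y, p_hat, g=G):
    """Hosmer-Lemeshow statistic: sort by predicted probability, split into
    g groups of roughly equal size, and sum (O - E)^2 / (n_k * pbar * (1 - pbar))."""
    n = len(y)
    order = sorted(range(n), key=lambda i: p_hat[i])
    stat = 0.0
    for k in range(g):
        idx = order[k * n // g:(k + 1) * n // g]
        n_k = len(idx)
        if n_k == 0:
            continue
        observed = sum(y[i] for i in idx)      # observed events in group k
        expected = sum(p_hat[i] for i in idx)  # expected events in group k
        p_bar = expected / n_k                 # assumes 0 < p_bar < 1
        stat += (observed - expected) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    return stat


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, slope_true, seed=0):
    """Hypothetical data: the 'model' predicts sigmoid(x), the truth is sigmoid(slope_true * x)."""
    rng = random.Random(seed)
    x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    p_hat = [sigmoid(xi) for xi in x]
    y = [1 if rng.random() < sigmoid(slope_true * xi) else 0 for xi in x]
    return y, p_hat


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        y, p_hat = simulate(n, slope_true=1.1)
        hl = hosmer_lemeshow(y, p_hat)
        # E[HL] is roughly g + lambda, so (HL - g) / n is a crude estimate of lambda / n
        print(f"n={n:>7}  HL={hl:8.1f}  p={chi2_sf_even(hl, G):.3g}  "
              f"(HL-g)/n={(hl - G) / n:.5f}")
```

With the slope set to 1.0 the predictions are perfectly calibrated, so (HL - g)/n should stay near zero at every n. With 1.1, the raw statistic and its p-value change sharply as n grows, while (HL - g)/n settles near a roughly constant value. This sample-size-free behaviour is what motivates standardizing the noncentrality parameter.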

Suggested Citation

  • Giovanni Nattino & Michael L. Pennell & Stanley Lemeshow, 2020. "Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer‐Lemeshow test," Biometrics, The International Biometric Society, vol. 76(2), pages 549-560, June.
  • Handle: RePEc:bla:biomet:v:76:y:2020:i:2:p:549-560
    DOI: 10.1111/biom.13249

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13249
    Download Restriction: no



    Citations



    Cited by:

    1. Timo Dimitriadis & Lutz Duembgen & Alexander Henzi & Marius Puke & Johanna Ziegel, 2022. "Honest calibration assessment for binary outcome predictions," Papers 2203.04065, arXiv.org, revised Nov 2022.
    2. Daniel Fernández & Louise McMillan & Richard Arnold & Martin Spiess & Ivy Liu, 2022. "Goodness-of-Fit and Generalized Estimating Equation Methods for Ordinal Responses Based on the Stereotype Model," Stats, MDPI, vol. 5(2), pages 1-14, June.
    3. Rui Liu & Feng Tan & Yaxuan Wang & Bo Ma & Ming Yuan & Lianxia Wang & Xin Zhao, 2022. "Machine Learning Identification of Saline-Alkali-Tolerant Japonica Rice Varieties Based on Raman Spectroscopy and Python Visual Analysis," Agriculture, MDPI, vol. 12(7), pages 1-14, July.
    4. Zewei Lin & Dungang Liu, 2022. "Model diagnostics of discrete data regression: a unifying framework using functional residuals," Papers 2207.04299, arXiv.org.
    5. Cornelius K. A. Pienaah & Roger Antabe & Godwin Arku & Isaac Luginaah, 2024. "Farmer field schools, climate action plans and climate change resilience among smallholder farmers in Northern Ghana," Climatic Change, Springer, vol. 177(6), pages 1-25, June.
    6. M. Kelemen & J. Danesh & E. Angelantonio & M. Inouye & J. O’Sullivan & L. Pennells & T. Roychowdhury & M. J. Sweeting & A. M. Wood & S. Harrison & L. G. Kim, 2024. "Evaluating the cost-effectiveness of polygenic risk score-stratified screening for abdominal aortic aneurysm," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    7. Stamatiadou, Valentini & Mazaris, Antonios & Mallios, Zisis & Katsanevakis, Stelios, 2023. "Valuation and mapping of the recreational diving ecosystem service of the Aegean Sea," Ecosystem Services, Elsevier, vol. 64(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nowak, Piotr Bolesław, 2016. "The MLE of the mean of the exponential distribution based on grouped data is stochastically increasing," Statistics & Probability Letters, Elsevier, vol. 111(C), pages 49-54.
    2. Dirk Tasche, 2009. "Estimating discriminatory power and PD curves when the number of defaults is small," Papers 0905.3928, arXiv.org, revised Mar 2010.
    3. Karim, Md Aktar Ul & Bhagat, Supriya Ramdas & Bhowmick, Amiya Ranjan, 2022. "Empirical detection of parameter variation in growth curve models using interval specific estimators," Chaos, Solitons & Fractals, Elsevier, vol. 157(C).
    4. Saverio M. Fratini & Alessia Naccarato, 2016. "The Gravitation of Market Prices as A Stochastic Process," Metroeconomica, Wiley Blackwell, vol. 67(4), pages 698-716, November.
    5. Zhang, Li, 2008. "Three essays on agricultural risk and insurance," ISU General Staff Papers 2008010108000016857, Iowa State University, Department of Economics.
    6. Zhang, Ruijing & Dai, Hongzhe, 2022. "A non-Gaussian stochastic model from limited observations using polynomial chaos and fractional moments," Reliability Engineering and System Safety, Elsevier, vol. 221(C).
    7. Camilo Alberto Cárdenas-Hurtado & Aaron Levi Garavito-Acosta & Jorge Hernán Toro-Córdoba, 2018. "Asymmetric Effects of Terms of Trade Shocks on Tradable and Non-tradable Investment Rates: The Colombian Case," Borradores de Economia 1043, Banco de la Republica de Colombia.
    8. Birtukan Atinkut Asmare & Bernhard Freyer & Jim Bingen, 2022. "Pesticide Use Practices among Female Headed Households in the Amhara Region, Ethiopia," Sustainability, MDPI, vol. 14(22), pages 1-26, November.
    9. Anastasiou, Andreas, 2017. "Bounds for the normal approximation of the maximum likelihood estimator from m-dependent random variables," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 171-181.
    10. Lanqing Hong & Zhi-Sheng Ye & Ran Ling, 2018. "Environmental Risk Assessment of Emerging Contaminants Using Degradation Data," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(3), pages 390-409, September.
    11. Liu, Baisen & Xu, Lin & Zheng, Shurong & Tian, Guo-Liang, 2014. "A new test for the proportionality of two large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 293-308.
    12. Evelina Di Corso & Tania Cerquitelli & Daniele Apiletti, 2018. "METATECH: METeorological Data Analysis for Thermal Energy CHaracterization by Means of Self-Learning Transparent Models," Energies, MDPI, vol. 11(6), pages 1-24, May.
    13. Silva, Ivair R., 2017. "Confidence intervals through sequential Monte Carlo," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 112-124.
    14. Albrecht, James & Anderson, Axel & Vroman, Susan, 2010. "Search by committee," Journal of Economic Theory, Elsevier, vol. 145(4), pages 1386-1407, July.
    15. José Luis Montiel Olea & Chen Qiu & Jörg Stoye, 2023. "Decision Theory for Treatment Choice Problems with Partial Identification," Papers 2312.17623, arXiv.org, revised Aug 2024.
    16. Denter, Philipp & Sisak, Dana, 2015. "Do polls create momentum in political competition?," Journal of Public Economics, Elsevier, vol. 130(C), pages 1-14.
    17. Salgado Alfredo, 2018. "Incomplete Information and Costly Signaling in College Admissions," Working Papers 2018-23, Banco de México.
    18. Stegeman, Alwin, 2016. "A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 189-203.
    19. Montebruno, Piero & Bennett, Robert J. & van Lieshout, Carry & Smith, Harry, 2019. "A tale of two tails: Do Power Law and Lognormal models fit firm-size distributions in the mid-Victorian era?," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 858-875.
    20. Mauricio Romero & Álvaro Riascos & Diego Jara, 2015. "On the Optimality of Answer-Copying Indices," Journal of Educational and Behavioral Statistics, vol. 40(5), pages 435-453, October.
