Model diagnostics of discrete data regression: a unifying framework using functional residuals

Authors

  • Zewei Lin
  • Dungang Liu

Abstract

Model diagnostics is an indispensable component of regression analysis, yet it is not well addressed in standard textbooks on generalized linear models. This lack of exposition is attributable to the fact that, when outcome data are discrete, classical methods (e.g., Pearson/deviance residual analysis and goodness-of-fit tests) have limited utility in model diagnostics and treatment. This paper establishes a novel framework for model diagnostics of discrete data regression. Unlike the existing literature, which defines the residual as a single-valued quantity, we propose to use a function as a vehicle to retain the residual information. In the presence of discreteness, we show that such a functional residual is appropriate for summarizing the residual randomness that cannot be captured by the structural part of the model. We establish its theoretical properties, which lead to new diagnostic tools, including the functional-residual-vs-covariate plot and the Function-to-Function (Fn-Fn) plot. Our numerical studies demonstrate that these tools can reveal a variety of model misspecifications, such as the omission or improper inclusion of a higher-order term, an explanatory variable, an interaction effect, a dispersion parameter, or a zero-inflation component. The functional residual yields, as a byproduct, the Liu-Zhang surrogate residual, developed mainly for cumulative link models for ordinal data (Liu and Zhang, 2018, JASA). As a general notion, it considerably broadens the diagnostic scope, since it applies to virtually all parametric models for binary, ordinal, and count data within a unified diagnostic scheme.
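
The abstract does not give the formula for the functional residual, so the following Python sketch rests on an assumption rather than on the authors' stated definition: for an observed count y with working-model CDF F, the residual is taken to be the CDF of a uniform distribution on the interval (F(y-1), F(y)), and an Fn-Fn-style check compares the average of these functions with the identity line. The Poisson model, the simulated covariate, and every function name below are hypothetical choices made for illustration only.

```python
import math
import random

def poisson_cdf(k, mu):
    """Return P(Y <= k) for Y ~ Poisson(mu); 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)      # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j        # P(Y = j) obtained from P(Y = j - 1)
        total += term
    return min(total, 1.0)

def functional_residual(t, y, mu):
    """ASSUMED form: CDF at t of Uniform(F(y-1), F(y)) under the working model."""
    lo = poisson_cdf(y - 1, mu)
    hi = poisson_cdf(y, mu)
    if hi <= lo:              # degenerate interval (numerical underflow)
        return 1.0 if t >= hi else 0.0
    return min(max((t - lo) / (hi - lo), 0.0), 1.0)

def max_gap_from_identity(ys, mus, grid):
    """Largest |average functional residual at t - t| over the grid."""
    n = len(ys)
    gap = 0.0
    for t in grid:
        avg = sum(functional_residual(t, y, mu) for y, mu in zip(ys, mus)) / n
        gap = max(gap, abs(avg - t))
    return gap

def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

# Hypothetical simulation: the true log-mean contains a quadratic term in x.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
true_mus = [math.exp(0.5 + x + x * x) for x in xs]
ys = [sample_poisson(mu, rng) for mu in true_mus]
wrong_mus = [math.exp(0.5 + x) for x in xs]   # working model omits x^2
grid = [i / 20 for i in range(1, 20)]

gap_ok = max_gap_from_identity(ys, true_mus, grid)
gap_wrong = max_gap_from_identity(ys, wrong_mus, grid)
print(f"max gap, correct mean:      {gap_ok:.3f}")
print(f"max gap, quadratic omitted: {gap_wrong:.3f}")
```

Under the assumed definition, the averaged curve should stay close to the identity line when the working means are correct and drift away from it when the quadratic term is dropped. The sketch plugs in known means instead of fitting a regression, so it illustrates only the idea of the comparison, not the paper's procedure.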

Suggested Citation

  • Zewei Lin & Dungang Liu, 2022. "Model diagnostics of discrete data regression: a unifying framework using functional residuals," Papers 2207.04299, arXiv.org.
  • Handle: RePEc:arx:papers:2207.04299

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2207.04299
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Kellie J. Archer & Stanley Lemeshow & David W. Hosmer, 2007. "Goodness-of-fit tests for logistic regression models when data are collected using a complex sampling design," Computational Statistics & Data Analysis, Elsevier, vol. 51(9), pages 4450-4464, May.
    2. Dungang Liu & Shaobo Li & Yan Yu & Irini Moustaki, 2021. "Assessing Partial Association Between Ordinal Variables: Quantification, Visualization, and Hypothesis Testing," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(534), pages 955-968, April.
    3. Dungang Liu & Heping Zhang, 2018. "Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 845-854, April.
    4. Andreas Blöchlinger & Markus Leippold, 2011. "A New Goodness-of-Fit Test for Event Forecasting and Its Application to Credit Defaults," Management Science, INFORMS, vol. 57(3), pages 487-505, March.
    5. Piet de Jong & Gillian Z. Heller, 2008. "Generalized Linear Models for Insurance Data," Cambridge Books, Cambridge University Press, number 9780521879149, September.
    6. Philip Hans Franses & Richard Paap, 2010. "Quantitative Models in Marketing Research," Cambridge Books, Cambridge University Press, number 9780521143653, September.
    7. Howard D. Bondell, 2007. "Testing goodness-of-fit in logistic case-control studies," Biometrika, Biometrika Trust, vol. 94(2), pages 487-495.
    8. Chun Li & Bryan E. Shepherd, 2010. "Test of Association Between Two Ordinal Variables While Adjusting for Covariates," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 612-620.
    9. Giovanni Nattino & Michael L. Pennell & Stanley Lemeshow, 2020. "Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer-Lemeshow test," Biometrics, The International Biometric Society, vol. 76(2), pages 549-560, June.
    10. Weiren Wang & Felix Famoye, 1997. "Modeling household fertility decisions with generalized Poisson regression," Journal of Population Economics, Springer; European Society for Population Economics, vol. 10(3), pages 273-283.
    11. Kellie J. Archer & Stanley Lemeshow, 2006. "Goodness-of-fit test for a logistic regression model fitted using survey sample data," Stata Journal, StataCorp LP, vol. 6(1), pages 97-105, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Liu, Dungang & Li, Shaobo & Yu, Yan & Moustaki, Irini, 2020. "Assessing partial association between ordinal variables: quantification, visualization, and hypothesis testing," LSE Research Online Documents on Economics 105558, London School of Economics and Political Science, LSE Library.
    2. Ngokkuen, Chuthaporn & Grote, Ulrike, 2012. "Geographical Indication for Jasmine Rice: Applying a Logit Model to Predict Adoption Behavior of Thai Farm Households," Quarterly Journal of International Agriculture, Humboldt-Universität zu Berlin, vol. 51(2), pages 1-29, May.
    3. Daniel Fernández & Louise McMillan & Richard Arnold & Martin Spiess & Ivy Liu, 2022. "Goodness-of-Fit and Generalized Estimating Equation Methods for Ordinal Responses Based on the Stereotype Model," Stats, MDPI, vol. 5(2), pages 1-14, June.
    4. Risselada, Hans & Verhoef, Peter C. & Bijmolt, Tammo H.A., 2010. "Staying Power of Churn Prediction Models," Journal of Interactive Marketing, Elsevier, vol. 24(3), pages 198-208.
    5. Michis Antonis A, 2009. "Regression Analysis of Marketing Time Series: A Wavelet Approach with Some Frequency Domain Insights," Review of Marketing Science, De Gruyter, vol. 7(1), pages 1-43, July.
    6. Yang Lu, 2019. "Flexible (panel) regression models for bivariate count–continuous data with an insurance application," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1503-1521, October.
    7. Polo, Yolanda & Sese, F. Javier & Verhoef, Peter C., 2011. "The Effect of Pricing and Advertising on Customer Retention in a Liberalizing Market," Journal of Interactive Marketing, Elsevier, vol. 25(4), pages 201-214.
    8. Avanzi, Benjamin & Taylor, Greg & Wong, Bernard & Yang, Xinda, 2021. "On the modelling of multivariate counts with Cox processes and dependent shot noise intensities," Insurance: Mathematics and Economics, Elsevier, vol. 99(C), pages 9-24.
    9. Chenglong Ye & Lin Zhang & Mingxuan Han & Yanjia Yu & Bingxin Zhao & Yuhong Yang, 2022. "Combining Predictions of Auto Insurance Claims," Econometrics, MDPI, vol. 10(2), pages 1-15, April.
    10. Domenico Piccolo & Rosaria Simone, 2019. "The class of cub models: statistical foundations, inferential issues and empirical evidence," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(3), pages 389-435, September.
    11. Greene, William, 2007. "Functional Form and Heterogeneity in Models for Count Data," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(2), pages 113-218, August.
    12. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    13. Aivars Spilbergs & Andris Fomins & Māris Krastiņš, 2022. "Multivariate Modelling of Motor Third Party Liability Insurance Claims," European Journal of Business Science and Technology, Mendel University in Brno, Faculty of Business and Economics, vol. 8(1), pages 5-18.
    14. Deprez, Laurens & Antonio, Katrien & Boute, Robert, 2021. "Pricing service maintenance contracts using predictive analytics," European Journal of Operational Research, Elsevier, vol. 290(2), pages 530-545.
    15. Edwin Van Gameren & Michiel Ras & Evelien Eggink & Ingrid Ooms, 2005. "The demand for housing services in the Netherlands," ERSA conference papers ersa05p327, European Regional Science Association.
    16. Martin Branda, 2014. "Optimization Approaches to Multiplicative Tariff of Rates Estimation in Non-Life Insurance," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 31(05), pages 1-17.
    17. Jeonghwan Kim & Woojoo Lee, 2019. "On testing the hidden heterogeneity in negative binomial regression models," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 82(4), pages 457-470, May.
    18. Janvier Gasana & Boubakari Ibrahimou & Ahmed N. Albatineh & Mustafa Al-Zoughool & Dina Zein, 2021. "Exposures in the Indoor Environment and Prevalence of Allergic Conditions in the United States of America," IJERPH, MDPI, vol. 18(9), pages 1-13, May.
    19. Franses, Philip Hans, 2006. "Forecasting in Marketing," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 1, chapter 18, pages 983-1012, Elsevier.
    20. Keith Davis & Timothy Bell & Jacqueline Miller & Derek Misurski & Bela Bapat, 2011. "Hospital costs, length of stay and mortality associated with childhood, adolescent and young Adult meningococcal disease in the US," Applied Health Economics and Health Policy, Springer, vol. 9(3), pages 197-207, May.
