
Visualizing Count Data Regressions Using Rootograms

Author

Listed:
  • Christian Kleiber
  • Achim Zeileis

Abstract

The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.
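
To make the idea concrete, here is a minimal sketch of the quantities behind a hanging rootogram for a count regression. It is not the countreg implementation. It assumes a Poisson model whose fitted means are already available, takes the expected frequency of each count to be the (weighted) sum of the fitted probabilities over observations, and hangs a bar of height sqrt(observed) from sqrt(expected). The names `poisson_pmf` and `rootogram_table` and the toy data are hypothetical.

```python
import math
from collections import Counter


def poisson_pmf(k, mu):
    """Poisson probability P(Y = k) for a mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def rootogram_table(y, mu, weights=None, max_count=None):
    """Return rows (count, sqrt_observed, sqrt_expected, bar_bottom).

    y       : observed counts, one per observation
    mu      : fitted Poisson means, one per observation (assumed given)
    weights : optional nonnegative observation weights (default: all 1)
    """
    if weights is None:
        weights = [1.0] * len(y)
    if max_count is None:
        max_count = max(y)

    # Observed (weighted) frequency of each count value.
    observed = Counter()
    for yi, wi in zip(y, weights):
        observed[yi] += wi

    rows = []
    for k in range(max_count + 1):
        # Expected frequency: weighted sum of fitted probabilities of count k.
        expected = sum(wi * poisson_pmf(k, mi) for mi, wi in zip(mu, weights))
        sqrt_obs = math.sqrt(observed.get(k, 0.0))
        sqrt_exp = math.sqrt(expected)
        # Hanging style: the bar spans from sqrt_exp - sqrt_obs up to sqrt_exp.
        rows.append((k, sqrt_obs, sqrt_exp, sqrt_exp - sqrt_obs))
    return rows


if __name__ == "__main__":
    y = [0, 0, 0, 1, 2, 5]            # made-up counts with many zeros
    mu = [sum(y) / len(y)] * len(y)   # intercept-only Poisson fit: sample mean
    for k, so, se, bottom in rootogram_table(y, mu):
        print(f"{k}: sqrt(obs)={so:.2f} sqrt(exp)={se:.2f} bottom={bottom:+.2f}")
```

In this sketch, a bar whose bottom falls below zero marks a count that the model underpredicts, and a bottom above zero marks one it overpredicts. A bar dropping below zero at count 0 would therefore point to excess zeros. Passing `weights` gives a weighted variant in the spirit of the one described in the abstract.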

Suggested Citation

  • Christian Kleiber & Achim Zeileis, 2016. "Visualizing Count Data Regressions Using Rootograms," The American Statistician, Taylor & Francis Journals, vol. 70(3), pages 296-303, July.
  • Handle: RePEc:taf:amstat:v:70:y:2016:i:3:p:296-303
    DOI: 10.1080/00031305.2016.1173590

    References listed on IDEAS

    1. Cameron, A. Colin & Trivedi, Pravin K., 2013. "Regression Analysis of Count Data," Cambridge Books, Cambridge University Press, number 9781107667273.
    2. Fox, John, 2003. "Effect Displays in R for Generalised Linear Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 8(i15).
    3. Leisch, Friedrich, 2004. "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i08).
    4. Zeileis, Achim & Kleiber, Christian & Jackman, Simon, 2008. "Regression Models for Count Data in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(i08).
    5. Mullahy, John, 1997. "Heterogeneity, Excess Zeros, and the Structure of Count Data Models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 12(3), pages 337-350, May-June.
    6. Mullahy, John, 1986. "Specification and testing of some modified count data models," Journal of Econometrics, Elsevier, vol. 33(3), pages 341-365, December.
    7. Grün, Bettina & Leisch, Friedrich, 2008. "FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i04).
    8. Deb, Partha & Trivedi, Pravin K., 1997. "Demand for Medical Care by the Elderly: A Finite Mixture Approach," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 12(3), pages 313-336, May-June.
    9. Cameron, A. Colin & Trivedi, Pravin K., 1990. "Regression-based tests for overdispersion in the Poisson model," Journal of Econometrics, Elsevier, vol. 46(3), pages 347-364, December.
    10. Fox, John & Hong, Jangman, 2009. "Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 32(i01).
    11. R. A. Rigby & D. M. Stasinopoulos, 2005. "Generalized additive models for location, scale and shape," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 54(3), pages 507-554, June.
    12. Stasinopoulos, D. Mikis & Rigby, Robert A., 2007. "Generalized Additive Models for Location Scale and Shape (GAMLSS) in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 23(i07).

    Citations


    Cited by:

    1. Yaşar Tonta & Müge Akbulut, 2020. "Does monetary support increase citation impact of scholarly papers?," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1617-1641, November.
    2. Gozde Ozonder & Eric J. Miller, 2021. "Longitudinal analysis of activity generation in the Greater Toronto and Hamilton Area," Transportation, Springer, vol. 48(3), pages 1149-1183, June.
    3. Marcelo Bourguignon & Rodrigo M. R. Medeiros, 2022. "A simple and useful regression model for fitting count data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(3), pages 790-827, September.
    4. Bilal Barakat, 2017. "Generalised count distributions for modelling parity," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 36(26), pages 745-758.
    5. Lagona, Francesco & Padovano, Fabio, 2021. "How does legislative behavior change when the country becomes democratic? The case of South Korea," European Journal of Political Economy, Elsevier, vol. 69(C).
    6. Cornelia Fuetterer & Thomas Augustin & Christiane Fuchs, 2020. "Adapted single-cell consensus clustering (adaSC3)," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(4), pages 885-896, December.
    7. Brutti, Zelda & Montolio, Daniel, 2021. "Preventing criminal minds: Early education access and adult offending behavior," Journal of Economic Behavior & Organization, Elsevier, vol. 191(C), pages 97-126.
    8. Evangelos Papadias & Vassilis Detsis & Antonis Hadjikyriacou & Apostolos G. Papadopoulos & Christoforos Vradis & Christos Chalkias, 2023. "Long-Term Dynamics of Viticultural Landscape in Cyprus—Four Centuries of Expansion, Contraction and Spatial Displacement," Land, MDPI, vol. 12(6), pages 1-23, May.
    9. Thorsten Simon & Georg J. Mayr & Nikolaus Umlauf & Achim Zeileis, 2018. "Lightning Prediction Using Model Output Statistics," Working Papers 2018-14, Faculty of Economics and Statistics, Universität Innsbruck.
    10. Candelon, Bertrand & Joëts, Marc & Mignon, Valérie, 2024. "What makes econometric ideas popular: The role of connectivity," Research Policy, Elsevier, vol. 53(7).
    11. Chiara Bocci & Laura Grassini & Emilia Rocco, 2021. "A multiple inflated negative binomial hurdle regression model: analysis of the Italians’ tourism behaviour during the Great Recession," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1109-1133, October.
    12. Cornelius Fritz & Göran Kauermann, 2022. "On the interplay of regional mobility, social connectedness and the spread of COVID‐19 in Germany," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 400-424, January.
    13. Balakrishnan, Srijith & Lim, Taehoon & Zhang, Zhanmin, 2022. "A methodology for evaluating the economic risks of hurricane-related disruptions to port operations," Transportation Research Part A: Policy and Practice, Elsevier, vol. 162(C), pages 58-79.
    14. Virgili, Auriane & Racine, Mélanie & Authier, Matthieu & Monestiez, Pascal & Ridoux, Vincent, 2017. "Comparison of habitat models for scarcely detected species," Ecological Modelling, Elsevier, vol. 346(C), pages 88-98.
    15. Adrian Richter & Julia Truthmann & Jean-François Chenot & Carsten Oliver Schmidt, 2021. "Predicting Physician Consultations for Low Back Pain Using Claims Data and Population-Based Cohort Data—An Interpretable Machine Learning Approach," IJERPH, MDPI, vol. 18(22), pages 1-14, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zeileis, Achim & Kleiber, Christian & Jackman, Simon, 2008. "Regression Models for Count Data in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 27(i08).
    2. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    3. Ana María Martínez-Rodríguez & Antonio Conde-Sánchez & María José Olmo-Jiménez, 2019. "A new approach to truncated regression for count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 103(4), pages 503-526, December.
    4. Moritz Berger & Gerhard Tutz, 2021. "Transition models for count data: a flexible alternative to fixed distribution models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1259-1283, October.
    5. John Haslett & Andrew C. Parnell & John Hinde & Rafael de Andrade Moral, 2022. "Modelling Excess Zeros in Count Data: A New Perspective on Modelling Approaches," International Statistical Review, International Statistical Institute, vol. 90(2), pages 216-236, August.
    6. Stefano Mainardi, 2003. "Testing convergence in life expectancies: count regression models on panel data," Prague Economic Papers, Prague University of Economics and Business, vol. 2003(4), pages 350-370.
    7. Livio Finos & Fortunato Pesarin, 2020. "On zero-inflated permutation testing and some related problems," Statistical Papers, Springer, vol. 61(5), pages 2157-2174, October.
    8. Bach, Philipp & Farbmacher, Helmut & Spindler, Martin, 2018. "Semiparametric count data modeling with an application to health service demand," Econometrics and Statistics, Elsevier, vol. 8(C), pages 125-140.
    9. Sisira Sarma & Wayne Simpson, 2006. "A microeconometric analysis of Canadian health care utilization," Health Economics, John Wiley & Sons, Ltd., vol. 15(3), pages 219-239, March.
    10. Candelon, Bertrand & Joëts, Marc & Mignon, Valérie, 2024. "What makes econometric ideas popular: The role of connectivity," Research Policy, Elsevier, vol. 53(7).
    11. Margarita E. Romero Rodríguez & Enrique Los Arcos & Victor Cano Fernández & Miguel Sánchez Padrón, 2001. "Modelo para datos de recuentro de corte transversal con exceso de ceros. Aplicación a citas patentes," Documentos de trabajo conjunto ULL-ULPGC 2001-05, Facultad de Ciencias Económicas de la ULPGC.
    12. Nan-Ting Liu & Feng-Chang Lin & Yu-Shan Shih, 2020. "Count regression trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 5-27, March.
    13. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    14. Yixuan Zou & Jan Hannig & Derek S. Young, 2021. "Generalized fiducial inference on the mean of zero-inflated Poisson and Poisson hurdle models," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-15, December.
    15. Sergi Jiménez‐Martín & José M. Labeaga & Maite Martínez‐Granado, 2002. "Latent class versus two‐part models in the demand for physician services across the European Union," Health Economics, John Wiley & Sons, Ltd., vol. 11(4), pages 301-321, June.
    16. Gozde Ozonder & Eric J. Miller, 2021. "Longitudinal analysis of activity generation in the Greater Toronto and Hamilton Area," Transportation, Springer, vol. 48(3), pages 1149-1183, June.
    17. Olivier Finance & Clémentine Cottineau, 2019. "Are the absent always wrong? Dealing with zero values in urban scaling," Environment and Planning B, vol. 46(9), pages 1663-1677, November.
    18. Francesco Zuniga & Tomasz J. Kozubowski & Anna K. Panorska, 2021. "A new trivariate model for stochastic episodes," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-21, December.
    19. Yixuan Wang & Jianzhu Li & Ping Feng & Rong Hu, 2015. "A Time-Dependent Drought Index for Non-Stationary Precipitation Series," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(15), pages 5631-5647, December.
    20. Luiz Paulo Fávero & Joseph F. Hair & Rafael de Freitas Souza & Matheus Albergaria & Talles V. Brugni, 2021. "Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships," Mathematics, MDPI, vol. 9(10), pages 1-28, May.

    More about this item

    JEL classification:

    • C25 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C87 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Econometric Software
