
Multinomial Logistic Mixed Models for Clustered Categorical Data in a Complex Survey Sampling Setup

Author

Listed:
  • Brajendra C. Sutradhar (Carleton University; Memorial University)

Abstract

In a finite (survey) population setup where categorical (multinomial) responses are collected from individuals belonging to a cluster, Skinner (International Statistical Review, 87, S64-S78, 2019) recently modeled the means of the clustered categorical responses as a function of regression parameters and suggested a ‘working’ correlations based GEE (generalized estimating equations) approach for estimating those parameters. However, a mean model involving only regression parameters is not justified for clustered multinomial responses: the responses share a common cluster effect, which forces the clustered correlation parameter to enter the mean function along with the regression parameters. Consequently, the so-called GEE approach, which requires the means to be free of correlations, is not applicable to regression analysis in the clustered multinomial setup. As a remedy, in this paper we consider a multinomial mixed model that accommodates the clustered correlation parameter in the mean functions. Because the GQL (generalized quasi-likelihood) approach is known to produce consistent and more efficient estimates than the MM (method of moments) approach in an infinite population setup, in the present finite population setup we estimate the regression parameters of primary interest by the first order response based survey weighted GQL (WGQL) approach. The random effects variance (also known as clustered correlation) parameter is of secondary interest, so we estimate it by the second order response based survey weighted MM (WMM) approach, which is simpler than the corresponding WGQL estimation approach. The estimation steps are presented clearly for the benefit of practitioners.
Because survey practitioners such as statistical agencies frequently deal with large health or socio-economic data sets at national or state levels, we also make sure, for their benefit, that the proposed WGQL and WMM estimators are consistent. To this end, asymptotic properties such as asymptotic unbiasedness and consistency are studied in detail for the estimators of both the regression and clustered correlation parameters. The asymptotic normality property, which is needed to construct confidence intervals for the main regression parameters, is also studied.
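
The claim that the cluster effect pushes the correlation parameter into the mean function can be checked numerically. The sketch below is not the paper's estimation procedure. It assumes a simple multinomial logit form in which one standard normal cluster effect, scaled by a hypothetical parameter `sigma`, is added to the linear predictor of every non-reference category. The names `conditional_probs`, `marginal_probs`, `eta`, and `sigma` are illustrative. Averaging the conditional probabilities over the cluster effect by Gauss-Hermite quadrature gives marginal means that change with `sigma` even though the regression part `eta` is held fixed.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def conditional_probs(eta, sigma, gamma):
    """Category probabilities given the cluster effect gamma.

    eta holds the linear predictors of the non-reference categories;
    the last entry of the result is the reference category.
    (Assumed model form, not taken from the paper.)
    """
    expo = np.exp(eta + sigma * gamma)
    denom = 1.0 + expo.sum()
    return np.append(expo, 1.0) / denom


def marginal_probs(eta, sigma, n_nodes=40):
    """Marginal means: conditional probabilities averaged over gamma ~ N(0, 1)."""
    nodes, weights = hermegauss(n_nodes)      # weight function exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)  # normalize to the N(0, 1) density
    probs = np.array([conditional_probs(eta, sigma, g) for g in nodes])
    return weights @ probs


if __name__ == "__main__":
    eta = np.array([0.5, -1.0])  # fixed regression part, three categories
    for sigma in (0.0, 1.0, 2.0):
        print(sigma, np.round(marginal_probs(eta, sigma), 4))
```

With `sigma = 0` the marginal means reduce to the usual multinomial logit probabilities. For larger `sigma` they move away from those values, which is why, under this assumed form, a mean model written in terms of the regression parameters alone would be misspecified.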

Suggested Citation

  • Brajendra C. Sutradhar, 2022. "Multinomial Logistic Mixed Models for Clustered Categorical Data in a Complex Survey Sampling Setup," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 743-789, August.
  • Handle: RePEc:spr:sankha:v:84:y:2022:i:2:d:10.1007_s13171-020-00215-2
    DOI: 10.1007/s13171-020-00215-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13171-020-00215-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Brajendra C. Sutradhar, 2018. "A Parameter Dimension-Split Based Asymptotic Regression Estimation Theory for a Multinomial Panel Data Model," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(2), pages 301-329, August.
    2. Molina C., E. A. & Skinner, C. J., 1992. "Pseudo-likelihood and quasi-likelihood estimation for complex sampling schemes," Computational Statistics & Data Analysis, Elsevier, vol. 13(4), pages 395-405, May.
    3. Skinner, Chris J. & de Toledo Vieira, Marcel, 2007. "Variance estimation in the analysis of clustered longitudinal survey data," LSE Research Online Documents on Economics 39106, London School of Economics and Political Science, LSE Library.
    4. E.A. Molina & T.M.F. Smith & R.A. Sugden, 2001. "Modelling Overdispersion for Complex Survey Data," International Statistical Review, International Statistical Institute, vol. 69(3), pages 373-384, December.
    5. Chris Skinner, 2019. "Analysis of Categorical Data for Complex Surveys," International Statistical Review, International Statistical Institute, vol. 87(S1), pages 64-78, May.
    6. Thomas R. Ten Have & Alfredo Morabia, 1999. "Mixed Effects Models with Bivariate and Univariate Association Parameters for Longitudinal Bivariate Binary Response Data," Biometrics, The International Biometric Society, vol. 55(1), pages 85-93, March.
    7. Brajendra C. Sutradhar & Nan Zheng, 2018. "Inferences in Binary Dynamic Fixed Models in a Semi-parametric Setup," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(2), pages 263-291, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Brajendra C. Sutradhar, 2023. "Cluster Correlations and Complexity in Binary Regression Analysis Using Two-stage Cluster Samples," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 829-884, February.
    2. Ana Isabel Polo Peña & Dolores María Frías Jamilena & Miguel Ángel Rodríguez Molina, 2017. "The effects of perceived value on loyalty: the moderating effect of market orientation adoption," Service Business, Springer;Pan-Pacific Business Association, vol. 11(1), pages 93-116, March.
    3. Brajendra C. Sutradhar, 2022. "Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(1), pages 259-302, May.
    4. Brajendra C. Sutradhar & R. Prabhakar Rao, 2023. "Asymptotic Inferences in a Multinomial Logit Mixed Model for Spatial Categorical Data," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 885-930, February.
    5. David Gunawan & William Griffiths & Anastasios Panagiotelis & Duangkamon Chotikapanich, 2017. "Bayesian Weighted Inference from Surveys," Department of Economics - Working Papers Series 2030, The University of Melbourne.
    6. D. Todem & Y. Zhang & A. Ismail & W. Sohn, 2010. "Random effects regression models for count data with excess zeros in caries research," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(10), pages 1661-1679.
    7. Skinner, Chris J., 2018. "Analysis of categorical data for complex surveys," LSE Research Online Documents on Economics 89707, London School of Economics and Political Science, LSE Library.
    8. Brajendra C. Sutradhar, 2023. "Regression analysis for exponential family data in a finite population setup using two-stage cluster sample," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(3), pages 425-462, June.
    9. Brajendra C. Sutradhar, 2023. "Prediction Theory for Multinomial Proportions Using Two-stage Cluster Samples," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(2), pages 1452-1488, August.
    10. Hyunju Dan & Jiyoung Kim & Oksoo Kim, 2020. "Effects of Gender and Age on Dietary Intake and Body Mass Index in Hypertensive Patients: Analysis of the Korea National Health and Nutrition Examination," IJERPH, MDPI, vol. 17(12), pages 1-9, June.
    11. Elena Castilla & Nirian Martín & Leandro Pardo, 2018. "Minimum phi-divergence estimators for multinomial logistic regression with complex sample design," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(3), pages 381-411, July.
    12. Bartolucci, Francesco & Farcomeni, Alessio, 2009. "A Multivariate Extension of the Dynamic Logit Model for Longitudinal Data Based on a Latent Markov Heterogeneity Structure," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 816-831.
    13. R. Prabhakar Rao & Brajendra C. Sutradhar, 2020. "Multiple Categorical Covariates-Based Multinomial Dynamic Response Model," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(1), pages 186-219, February.
    14. Chaubert, F. & Mortier, F. & Saint André, L., 2008. "Multivariate dynamic model for ordinal outcomes," Journal of Multivariate Analysis, Elsevier, vol. 99(8), pages 1717-1732, September.
    15. Daniel Nevo & Deborah Blacker & Eric B. Larson & Sebastien Haneuse, 2022. "Modeling semi‐competing risks data as a longitudinal bivariate process," Biometrics, The International Biometric Society, vol. 78(3), pages 922-936, September.
    16. Eziyi Ibem & Dolapo Amole, 2014. "Satisfaction with Life in Public Housing in Ogun State, Nigeria: A Research Note," Journal of Happiness Studies, Springer, vol. 15(3), pages 495-501, June.
    17. Sutradhar, Brajendra C., 2021. "Block-band behavior of spatial correlations: An analytical asymptotic study in a spatial exponential family data setup," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    18. Brajendra C. Sutradhar, 2021. "Two Stage Cluster Sampling Based Asymptotic Inferences in Survey Population Models for Longitudinal Count and Categorical Data," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 26-69, February.
    19. Oǧuz-Alper, Melike & Berger, Yves G., 2020. "Modelling multilevel data under complex sampling designs: An empirical likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
    20. Alinne Veiga & Peter W. F. Smith & James J. Brown, 2014. "The use of sample weights in multivariate multilevel models with an application to income data collected by using a rotating panel survey," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 63(1), pages 65-84, January.
