Printed from https://ideas.repec.org/a/eee/csdana/v159y2021ics0167947321000517.html

Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States

Author

Listed:
  • Zhang, Tonglin
  • Lin, Ge

Abstract

Generalized k-means can be combined with any similarity or dissimilarity measure for clustering. Using the well-known likelihood ratio or F-statistic as the dissimilarity measure, a generalized k-means method is proposed to group generalized linear models (GLMs) for exponential family distributions. Given the number of clusters k, the proposed method is built on the uniformly most powerful unbiased (UMPU) test statistic for comparing GLMs. If k is unknown, the method can be combined with the generalized information criterion (GIC) to select the best k automatically. Both AIC and BIC are investigated as special cases of GIC. Theoretical and simulation results show that the number of clusters can be correctly identified by BIC but not by AIC. The proposed method is applied to state-level daily COVID-19 data in the United States, where it identifies 6 clusters. A further study shows that the models in different clusters differ significantly from one another, which supports the 6-cluster result.
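
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the simplest GLM, a Gaussian model with identity link and a common variance, in which assigning a unit to the cluster whose pooled model gives it the smallest residual sum of squares is equivalent to minimizing a likelihood-ratio-type dissimilarity (the unit's own-fit term is the same for every cluster). The function names, the farthest-first initialisation, the simulated data, and the BIC form n*log(RSS/n) + k*p*log(n) are all assumptions made for illustration only.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of one Gaussian linear model (identity-link GLM)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rss(X, y, beta):
    """Residual sum of squares of (X, y) under coefficients beta."""
    r = y - X @ beta
    return float(r @ r)

def init_centres(units, k):
    """Farthest-first start: an illustrative choice, not taken from the paper."""
    betas = [fit_ols(*units[0])]
    while len(betas) < k:
        d = [min(rss(X, y, b) for b in betas) for X, y in units]
        betas.append(fit_ols(*units[int(np.argmax(d))]))
    return betas

def cluster_models(units, k, n_iter=50):
    """Group the (X_i, y_i) pairs in `units` into k clusters of regression models."""
    betas = init_centres(units, k)
    labels = np.full(len(units), -1)
    for _ in range(n_iter):
        # Assignment: each unit joins the cluster model that fits it best.
        cost = np.array([[rss(X, y, b) for b in betas] for X, y in units])
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update: refit one pooled model per cluster (empty clusters keep their centre).
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size:
                Xc = np.vstack([units[i][0] for i in idx])
                yc = np.concatenate([units[i][1] for i in idx])
                betas[c] = fit_ols(Xc, yc)
    total = sum(rss(X, y, betas[c]) for (X, y), c in zip(units, labels))
    return labels, total

def bic(units, k):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + (k*p)*log(n)."""
    labels, total = cluster_models(units, k)
    n = sum(len(y) for _, y in units)
    p = units[0][0].shape[1]
    return n * np.log(total / n) + k * p * np.log(n), labels

# Simulated data (hypothetical): 9 units, 3 true groups of 3, 50 observations each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(t), t])
true_betas = [np.array([0.0, 1.0]), np.array([5.0, -1.0]), np.array([-5.0, 3.0])]
units = [(X, X @ true_betas[i // 3] + rng.normal(0.0, 0.3, t.size)) for i in range(9)]

# Score each candidate k by BIC and keep the smallest.
results = {k: bic(units, k) for k in range(1, 6)}
best_k = min(results, key=lambda k: results[k][0])
print("BIC by k:", {k: round(float(v[0]), 1) for k, v in results.items()})
print("selected k:", best_k, "labels:", results[best_k][1])
```

For non-Gaussian families such as Poisson counts, the residual sum of squares would be replaced by the model deviance and the least-squares fit by an iteratively reweighted fit. The sketch leaves that out to stay short.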

Suggested Citation

  • Zhang, Tonglin & Lin, Ge, 2021. "Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
  • Handle: RePEc:eee:csdana:v:159:y:2021:i:c:s0167947321000517
    DOI: 10.1016/j.csda.2021.107217

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321000517
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Li-Xuan Qin & Steven G. Self, 2006. "The Clustering of Regression Models Method with Applications in Gene Expression Data," Biometrics, The International Biometric Society, vol. 62(2), pages 526-533, June.
    2. Jianqing Fan & Shaojun Guo & Ning Hao, 2012. "Variance estimation using refitted cross‐validation in ultrahigh dimensional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(1), pages 37-65, January.
    3. Robert Tibshirani & Guenther Walther & Trevor Hastie, 2001. "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(2), pages 411-423.
    4. Zhang, Yiyun & Li, Runze & Tsai, Chih-Ling, 2010. "Regularization Parameter Selections via Generalized Information Criterion," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 312-323.
    5. J. A. Hartigan & M. A. Wong, 1979. "A K‐Means Clustering Algorithm," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 28(1), pages 100-108, March.
    6. Junhui Wang, 2010. "Consistent selection of the number of clusters via crossvalidation," Biometrika, Biometrika Trust, vol. 97(4), pages 893-904.

    Citations

    Cited by:

    1. Cerqueti, Roy & Ficcadenti, Valerio, 2022. "Combining rank-size and k-means for clustering countries over the COVID-19 new deaths per million," Chaos, Solitons & Fractals, Elsevier, vol. 158(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jelle R Dalenberg & Luca Nanetti & Remco J Renken & René A de Wijk & Gert J ter Horst, 2014. "Dealing with Consumer Differences in Liking during Repeated Exposure to Food; Typical Dynamics in Rating Behavior," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    2. Dario Cottafava & Giulia Sonetti & Paolo Gambino & Andrea Tartaglino, 2018. "Explorative Multidimensional Analysis for Energy Efficiency: DataViz versus Clustering Algorithms," Energies, MDPI, vol. 11(5), pages 1-18, May.
    3. Jonas M. B. Haslbeck & Dirk U. Wulff, 2020. "Estimating the number of clusters via a corrected clustering instability," Computational Statistics, Springer, vol. 35(4), pages 1879-1894, December.
    4. Peter Radchenko & Gourab Mukherjee, 2017. "Convex clustering via l 1 fusion penalization," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1527-1546, November.
    5. Julian Rossbroich & Jeffrey Durieux & Tom F. Wilderjans, 2022. "Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 264-301, July.
    6. Stéphane Aris-Brosou, 2007. "Dating Phylogenies with Hybrid Local Molecular Clocks," PLOS ONE, Public Library of Science, vol. 2(9), pages 1-8, September.
    7. Mohiuddin, Hossain & Fitch-Polse, Dillon T. & Handy, Susan L., 2024. "Examining market segmentation to increase bike-share use and enhance equity: The case of the greater Sacramento region," Transport Policy, Elsevier, vol. 145(C), pages 279-290.
    8. Fang, Yixin & Wang, Junhui, 2012. "Selection of the number of clusters via the bootstrap method," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 468-477.
    9. Paul, Biplab & De, Shyamal K. & Ghosh, Anil K., 2022. "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    10. Lim, Alejandro & Chiang, Chin-Tsang & Teng, Jen-Chieh, 2021. "Estimating robot strengths with application to selection of alliance members in FIRST robotics competitions," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    11. Thiemo Fetzer & Samuel Marden, 2017. "Take What You Can: Property Rights, Contestability and Conflict," Economic Journal, Royal Economic Society, vol. 0(601), pages 757-783, May.
    12. Daniel Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2022. "Valuing the Time of the Self-Employed," CESifo Working Paper Series 9567, CESifo.
    13. Batool, Fatima & Hennig, Christian, 2021. "Clustering with the Average Silhouette Width," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    14. Ren, Weijie & Li, Baisong & Han, Min, 2020. "A novel Granger causality method based on HSIC-Lasso for revealing nonlinear relationship between multivariate time series," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 541(C).
    15. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    16. Orietta Nicolis & Jean Paul Maidana & Fabian Contreras & Danilo Leal, 2024. "Analyzing the Impact of COVID-19 on Economic Sustainability: A Clustering Approach," Sustainability, MDPI, vol. 16(4), pages 1-30, February.
    17. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    18. Xu, Jing & Wang, Xiaoying & Gu, Yujiong & Ma, Suxia, 2023. "A data-based day-ahead scheduling optimization approach for regional integrated energy systems with varying operating conditions," Energy, Elsevier, vol. 283(C).
    19. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    20. Forzani, Liliana & Gieco, Antonella & Tolmasky, Carlos, 2017. "Likelihood ratio test for partial sphericity in high and ultra-high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 159(C), pages 18-38.

