
Evolutionary clustering for categorical data using parametric links among multinomial mixture models

Authors
  • Hasnat, Md. Abul
  • Velcin, Julien
  • Bonnevay, Stephane
  • Jacques, Julien

Abstract

A novel evolutionary clustering method for temporal categorical data, based on parametric links among multinomial mixture models, is proposed. Besides clustering, the main goal is to interpret the evolution of clusters over time. To this end, a generalized model that establishes parametric links between two multinomial mixture models is first formulated. Different parametric sub-models are then defined in order to model the typical evolution of the clustering structure. Model selection criteria make it possible to select the best sub-model and thus to infer how the clustering has evolved. In the experiments, the proposed method is first evaluated on synthetic temporal data. Next, it is applied to the analysis of annotated social media data. Results show that the proposed method outperforms state-of-the-art methods on common evaluation metrics. Additionally, it can provide an interpretation of the temporal evolution of the clusters.
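
As background for the abstract above, the following is a minimal sketch of the basic building block it names: a single multinomial mixture model fitted by EM, scored with BIC as one example of a model selection criterion. It is not the authors' method. The parametric links between two mixtures and the sub-models are not specified on this page and are not reproduced here. The function name `fit_multinomial_mixture`, its parameters, and the smoothing constants are assumptions made for illustration.

```python
import numpy as np

def fit_multinomial_mixture(X, K, n_iter=100, seed=0):
    """Fit a K-component multinomial mixture to a count matrix X (n x d) by EM.

    Illustrative sketch only (hypothetical helper, not from the paper).
    Returns mixing proportions pi (K,), category probabilities theta (K x d),
    responsibilities r (n x K), and a BIC value.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(d), size=K)  # random start, rows sum to 1

    def log_joint(pi, theta):
        # log pi_k + sum_j x_ij log theta_kj; the multinomial coefficient is
        # omitted because it does not depend on the parameters.
        return np.log(pi) + X @ np.log(theta).T

    for _ in range(n_iter):
        # E-step: responsibilities via a numerically stable log-sum-exp.
        lj = log_joint(pi, theta)
        m = lj.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))
        r = np.exp(lj - log_norm)

        # M-step: update proportions and category probabilities.
        pi = np.maximum(r.mean(axis=0), 1e-12)  # guard against empty clusters
        pi /= pi.sum()
        theta = r.T @ X + 1e-9                  # small smoothing avoids log(0)
        theta /= theta.sum(axis=1, keepdims=True)

    # Final responsibilities and log-likelihood under the updated parameters.
    lj = log_joint(pi, theta)
    m = lj.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))
    r = np.exp(lj - log_norm)
    log_lik = log_norm.sum()

    n_params = (K - 1) + K * (d - 1)            # free parameters
    bic = -2.0 * log_lik + n_params * np.log(n)
    return pi, theta, r, bic
```

With this convention a lower BIC is preferred. Because the omitted multinomial coefficient is the same for every candidate model on a given data set, BIC values computed this way remain comparable across different values of K.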

Suggested Citation

  • Hasnat, Md. Abul & Velcin, Julien & Bonnevay, Stephane & Jacques, Julien, 2017. "Evolutionary clustering for categorical data using parametric links among multinomial mixture models," Econometrics and Statistics, Elsevier, vol. 3(C), pages 141-159.
  • Handle: RePEc:eee:ecosta:v:3:y:2017:i:c:p:141-159
    DOI: 10.1016/j.ecosta.2017.03.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S2452306217300175
    Download Restriction: Full text for ScienceDirect subscribers only. Contains open access articles

    References listed on IDEAS

    1. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    2. Christophe Biernacki & Farid Beninel & Vincent Bretagnolle, 2002. "A Generalized Discriminant Rule When Training Population and Test Population Differ on Their Descriptive Parameters," Biometrics, The International Biometric Society, vol. 58(2), pages 387-397, June.
    3. J. Jacques & C. Biernacki, 2010. "Extension of model-based classification for binary data when training and test populations differ," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(5), pages 749-766.
    4. Biernacki, Christophe & Celeux, Gilles & Govaert, Gerard & Langrognet, Florent, 2006. "Model-based cluster and discriminant analysis with the MIXMOD software," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 587-600, November.
    5. Biernacki, Christophe & Celeux, Gilles & Govaert, Gerard, 2003. "Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 561-575, January.
    6. Jean-Charles Lamirel, 2012. "A new approach for automatizing the analysis of research topics dynamics: application to optoelectronics research," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(1), pages 151-166, October.

    Citations

    Cited by:

    1. Dirick, Lore & Claeskens, Gerda & Vasnev, Andrey & Baesens, Bart, 2022. "A hierarchical mixture cure model with unobserved heterogeneity for credit risk," Econometrics and Statistics, Elsevier, vol. 22(C), pages 39-55.
    2. Schatz, Michael & Wheatley, Spencer & Sornette, Didier, 2022. "The ARMA Point Process and its Estimation," Econometrics and Statistics, Elsevier, vol. 24(C), pages 164-182.
    3. Jacques, Julien & Biernacki, Christophe, 2018. "Model-based co-clustering for ordinal data," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 101-115.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luca Scrucca & Adrian Raftery, 2015. "Improved initialisation of model-based clustering using Gaussian hierarchical partitions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 447-460, December.
    2. Galimberti, Giuliano & Soffritti, Gabriele, 2014. "A multivariate linear regression analysis using finite mixtures of t distributions," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 138-150.
    3. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    4. Lebret, Rémi & Iovleff, Serge & Langrognet, Florent & Biernacki, Christophe & Celeux, Gilles & Govaert, Gérard, 2015. "Rmixmod: The R Package of the Model-Based Unsupervised, Supervised, and Semi-Supervised Classification Mixmod Library," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i06).
    5. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    6. Semhar Michael & Volodymyr Melnykov, 2016. "An effective strategy for initializing the EM algorithm in finite mixture models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 563-583, December.
    7. Hung Tong & Cristina Tortora, 2022. "Model-based clustering and outlier detection with missing data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 5-30, March.
    8. Utkarsh J. Dang & Michael P.B. Gallaugher & Ryan P. Browne & Paul D. McNicholas, 2023. "Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 145-167, April.
    9. Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
    10. Tomarchio, Salvatore D. & Punzo, Antonio & Bagnato, Luca, 2020. "Two new matrix-variate distributions with application in model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    11. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    12. Salvatore D. Tomarchio & Luca Bagnato & Antonio Punzo, 2022. "Model-based clustering via new parsimonious mixtures of heavy-tailed distributions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(2), pages 315-347, June.
    13. Gabriele Perrone & Gabriele Soffritti, 2023. "Seemingly unrelated clusterwise linear regression for contaminated data," Statistical Papers, Springer, vol. 64(3), pages 883-921, June.
    14. Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2022. "Gaussian mixture model with an extended ultrametric covariance structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 399-427, June.
    15. Alex Sharp & Glen Chalatov & Ryan P. Browne, 2023. "A dual subspace parsimonious mixture of matrix normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 801-822, September.
    16. Lin, Tsung-I & McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Extending mixtures of factor models using the restricted multivariate skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 398-413.
    17. Gallopin Mélina & Celeux Gilles & Jaffrézic Florence & Rau Andrea, 2015. "A model selection criterion for model-based clustering of annotated gene expression data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(5), pages 413-428, November.
    18. Riccardo Rastelli & Michael Fop, 2020. "A stochastic block model for interaction lengths," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 485-512, June.
    19. Roberto Mari & Salvatore Ingrassia & Antonio Punzo, 2023. "Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 233-266, July.
    20. Melnykov, Volodymyr & Melnykov, Igor, 2012. "Initializing the EM algorithm in Gaussian mixture models with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1381-1395.
