Printed from https://ideas.repec.org/a/spr/advdac/v18y2024i2d10.1007_s11634-023-00539-5.html

Composite likelihood methods for parsimonious model-based clustering of mixed-type data

Author

Listed:
  • Monia Ranalli

    (Sapienza University of Rome)

  • Roberto Rocci

    (Sapienza University of Rome)

Abstract

In this paper, we propose twelve parsimonious models for clustering mixed-type (ordinal and continuous) data. The dependence among the different types of variables is modeled by assuming that ordinal and continuous data follow a multivariate finite mixture of Gaussians, where the ordinal variables are a discretization of some continuous variates of the mixture. The general class of parsimonious models is based on a factor decomposition of the component-specific covariance matrices. Parameter estimation is carried out using an EM-type algorithm based on composite likelihood. The proposal is evaluated through a simulation study and an application to real data.
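
To make the data-generating mechanism described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes each component covariance takes the usual factor-analytic form Sigma_g = Lambda_g Lambda_g' + Psi_g with diagonal Psi_g, and that ordinal variables arise by cutting latent Gaussian variates at fixed thresholds. The function name `simulate_mixed_data`, the dimensions, and all parameter values are hypothetical and chosen only for illustration; the estimation step (the EM-type algorithm based on composite likelihood) is not shown.

```python
import numpy as np


def simulate_mixed_data(n, weights, means, loadings, uniquenesses, thresholds, rng):
    """Draw n observations from a Gaussian mixture and discretize the first
    len(thresholds) columns into ordinal codes (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    # Latent component membership for each observation.
    labels = rng.choice(len(weights), size=n, p=weights)
    p = means[0].shape[0]
    latent = np.empty((n, p))
    for g in range(len(weights)):
        idx = np.flatnonzero(labels == g)
        # Assumed factor decomposition: Sigma_g = Lambda_g Lambda_g' + Psi_g.
        sigma = loadings[g] @ loadings[g].T + np.diag(uniquenesses[g])
        latent[idx] = rng.multivariate_normal(means[g], sigma, size=idx.size)
    n_ord = len(thresholds)
    # Ordinal variables are a discretization of latent continuous variates.
    ordinal = np.column_stack(
        [np.digitize(latent[:, j], thresholds[j]) for j in range(n_ord)]
    )
    continuous = latent[:, n_ord:]
    return ordinal, continuous, labels


# Illustrative values only: two components, one factor, two ordinal
# and two continuous variables.
rng = np.random.default_rng(0)
means = [np.array([-1.0, -1.0, 0.0, 0.0]), np.array([1.0, 1.0, 2.0, 2.0])]
loadings = [
    np.array([[0.8], [0.6], [0.5], [0.4]]),
    np.array([[0.3], [0.7], [0.6], [0.9]]),
]
uniquenesses = [np.full(4, 0.5), np.full(4, 0.5)]
thresholds = [np.array([-1.0, 0.0, 1.0]), np.array([0.0])]

ordinal, continuous, labels = simulate_mixed_data(
    500, [0.4, 0.6], means, loadings, uniquenesses, thresholds, rng
)
```

In this sketch the first ordinal variable has four categories (three thresholds) and the second is binary (one threshold); only the discretized codes and the continuous columns would be observed, while `labels` and the latent variates behind the ordinal columns stay hidden.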

Suggested Citation

  • Monia Ranalli & Roberto Rocci, 2024. "Composite likelihood methods for parsimonious model-based clustering of mixed-type data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(2), pages 381-407, June.
  • Handle: RePEc:spr:advdac:v:18:y:2024:i:2:d:10.1007_s11634-023-00539-5
    DOI: 10.1007/s11634-023-00539-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-023-00539-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Kanti V. Mardia & John T. Kent & Gareth Hughes & Charles C. Taylor, 2009. "Maximum likelihood estimation using composite likelihoods for closed exponential families," Biometrika, Biometrika Trust, vol. 96(4), pages 975-982.
    2. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "Erratum to: The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 327-355, July.
    3. Rayees Farooq, 2022. "Heywood cases: possible causes and solutions," International Journal of Data Analysis Techniques and Strategies, Inderscience Enterprises Ltd, vol. 14(1), pages 79-88.
    4. J. Carroll & Jih-Jie Chang, 1970. "Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition," Psychometrika, Springer;The Psychometric Society, vol. 35(3), pages 283-319, September.
    5. Keefe Murphy & Thomas Brendan Murphy, 2020. "Gaussian parsimonious clustering models with covariates and a noise component," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 293-325, June.
    6. Raymond Cattell, 1944. "“Parallel proportional profiles” and other principles for determining the choice of factors by rotation," Psychometrika, Springer;The Psychometric Society, vol. 9(4), pages 267-283, December.
    7. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    8. McLachlan, G. J. & Peel, D. & Bean, R. W., 2003. "Modelling high-dimensional data by mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 379-388, January.
    9. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 85-113, April.
    10. Walter Ledermann, 1937. "On the rank of the reduced correlational matrix in multiple-factor analysis," Psychometrika, Springer;The Psychometric Society, vol. 2(2), pages 85-93, June.
    11. Everitt, B. S., 1988. "A finite mixture model for the clustering of mixed-mode data," Statistics & Probability Letters, Elsevier, vol. 6(5), pages 305-309, April.
    12. Lee, Sik-Yum & Poon, Wai-Yin & Bentler, P. M., 1990. "Full maximum likelihood analysis of structural equation models with polytomous variables," Statistics & Probability Letters, Elsevier, vol. 9(1), pages 91-97, January.
    13. Gao, Xin & Song, Peter X.-K., 2010. "Composite Likelihood Bayesian Information Criteria for Model Selection in High-Dimensional Data," Journal of the American Statistical Association, American Statistical Association, vol. 105(492), pages 1531-1540.
    14. Paolo Giordani & Roberto Rocci & Giuseppe Bove, 2020. "Factor Uniqueness of the Structural Parafac Model," Psychometrika, Springer;The Psychometric Society, vol. 85(3), pages 555-574, September.
    15. Khalili, Abbas & Chen, Jiahua, 2007. "Variable Selection in Finite Mixture of Regression Models," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1025-1038, September.
    16. Ranalli, Monia & Rocci, Roberto, 2017. "Mixture models for mixed-type data through a composite likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 87-102.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Monia Ranalli & Roberto Rocci, 2017. "A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 82(4), pages 1007-1034, December.
    2. Ranalli, Monia & Rocci, Roberto, 2017. "Mixture models for mixed-type data through a composite likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 87-102.
    3. Kemmawadee Preedalikit & Daniel Fernández & Ivy Liu & Louise McMillan & Marta Nai Ruscone & Roy Costilla, 2024. "Row mixture-based clustering with covariates for ordinal responses," Computational Statistics, Springer, vol. 39(5), pages 2511-2555, July.
    4. Chao Huang & Martin Styner & Hongtu Zhu, 2015. "Clustering High-Dimensional Landmark-Based Two-Dimensional Shape Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 946-961, September.
    5. Sangkon Oh & Byungtae Seo, 2023. "Merging Components in Linear Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 25-51, April.
    6. Diani, Cecilia & Galimberti, Giuliano & Soffritti, Gabriele, 2022. "Multivariate cluster-weighted models based on seemingly unrelated linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    7. Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2021. "Matrix Normal Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 556-575, October.
    8. Počuča, Nikola & Jevtić, Petar & McNicholas, Paul D. & Miljkovic, Tatjana, 2020. "Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models," Insurance: Mathematics and Economics, Elsevier, vol. 94(C), pages 79-93.
    9. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
    10. Michael P. B. Gallaugher & Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2022. "Multivariate cluster weighted models using skewed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 93-124, March.
    11. Roberto Mari & Salvatore Ingrassia & Antonio Punzo, 2023. "Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 233-266, July.
    12. Michael P. B. Gallaugher & Paul D. McNicholas, 2019. "On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 232-265, July.
    13. Utkarsh J. Dang & Antonio Punzo & Paul D. McNicholas & Salvatore Ingrassia & Ryan P. Browne, 2017. "Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 34(1), pages 4-34, April.
    14. Wan-Lun Wang & Yu-Chen Yang & Tsung-I Lin, 2024. "Extending finite mixtures of nonlinear mixed-effects models with covariate-dependent mixing weights," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(2), pages 271-307, June.
    15. Galimberti, Giuliano & Montanari, Angela & Viroli, Cinzia, 2009. "Penalized factor mixture analysis for variable selection in clustered data," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4301-4310, October.
    16. Keefe Murphy & Thomas Brendan Murphy, 2020. "Gaussian parsimonious clustering models with covariates and a noise component," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 293-325, June.
    17. Naderi, Mehrdad & Mirfarah, Elham & Wang, Wan-Lun & Lin, Tsung-I, 2023. "Robust mixture regression modeling based on the normal mean-variance mixture distributions," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    18. Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
    19. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    20. Benjamin Auder & Elisabeth Gassiat & Mor Absa Loum, 2021. "Least squares moment identification of binary regression mixture models," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(4), pages 561-593, May.
