Gaussian mixture model with an extended ultrametric covariance structure

Author

Listed:
  • Carlo Cavicchia

    (Erasmus University Rotterdam)

  • Maurizio Vichi

    (University of Rome La Sapienza)

  • Giorgia Zaccaria

    (University of Rome La Sapienza)

Abstract

Gaussian Mixture Models (GMMs) are among the most widely used methods for model-based clustering. They assume a multivariate Gaussian distribution for each component of the mixture, centered at the component's mean vector, with volume, shape and orientation determined by its covariance matrix. To reduce the large number of parameters that the covariance matrices introduce, parsimonious parameterizations have been proposed in the literature, e.g., the eigen-decomposition and the parsimonious GMMs based on mixtures of probabilistic principal component analyzers and mixtures of factor analyzers. We introduce a new parameterization of the covariance matrix by defining an extended ultrametric covariance matrix, and we embed it in a GMM. This structure can describe multidimensional phenomena characterized by nested latent concepts at different levels of abstraction, from the most specific to the most general. The proposal identifies a hierarchical structure on the variables for each component of the GMM, and thus a different characterization of the multidimensional phenomenon for each component (cluster, subpopulation) of the mixture. At the same time, it defines a new parsimonious GMM, since the ultrametric covariance structure reconstructs the relationships among variables with a limited number of parameters. The proposal is applied to synthetic and real data. On the former, it shows good classification performance compared with the existing parameterizations; on the latter, it also provides insight into the hierarchical relationships among the variables for each cluster.
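
To make the parsimony argument concrete, the following is a minimal sketch in Python (NumPy). It is not the authors' extended ultrametric parameterization, whose details are not given in the abstract. It assumes the classical definition of an ultrametric matrix (nonnegative, symmetric, with s_ij >= min(s_ik, s_kj) for all i, j, k) and a hypothetical toy covariance matrix, `S_toy`, with two nested groups of variables. The helper names `is_ultrametric` and `full_cov_params` are illustrative only.

```python
import numpy as np


def full_cov_params(p):
    """Free parameters of one unconstrained p x p covariance matrix."""
    return p * (p + 1) // 2


def is_ultrametric(S, tol=1e-12):
    """Check the classical ultrametric conditions on a square matrix.

    Assumed definition (not taken from the paper): S is nonnegative,
    symmetric, and S[i, j] >= min(S[i, k], S[k, j]) for all i, j, k.
    Taking i == j, the inequality also forces each diagonal entry to be
    at least as large as every entry in its row.
    """
    S = np.asarray(S, dtype=float)
    if S.ndim != 2 or S.shape[0] != S.shape[1]:
        return False
    if np.any(S < -tol) or not np.allclose(S, S.T, atol=tol):
        return False
    # mins[i, j, k] = min(S[i, k], S[k, j]), built by broadcasting
    mins = np.minimum(S[:, None, :], S.T[None, :, :])
    # The inequality must hold for every k, so compare with the max over k
    return bool(np.all(S + tol >= mins.max(axis=2)))


# Hypothetical toy example: variables {0, 1} and {2, 3} form two groups.
# Within-group covariances are 0.6 and 0.5, the between-group covariance
# is 0.2, and all variances equal 1.0.
S_toy = np.array([
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.5],
    [0.2, 0.2, 0.5, 1.0],
])

# Breaking the nesting: variable 0 is now closer to variable 2 than
# variable 1 is, which violates the ultrametric inequality.
S_broken = S_toy.copy()
S_broken[0, 2] = S_broken[2, 0] = 0.9

# Number of distinct values needed to fill the toy matrix
n_distinct = np.unique(S_toy).size
```

In this toy case the 4 x 4 matrix is filled by four distinct values, whereas an unconstrained covariance matrix of the same size has `full_cov_params(4)` = 10 free parameters per mixture component. The gap widens as the number of variables grows, which is the kind of saving a hierarchical covariance structure aims for.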

Suggested Citation

  • Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2022. "Gaussian mixture model with an extended ultrametric covariance structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 399-427, June.
  • Handle: RePEc:spr:advdac:v:16:y:2022:i:2:d:10.1007_s11634-021-00488-x
    DOI: 10.1007/s11634-021-00488-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00488-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-021-00488-x?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bouveyron, Charles & Brunet-Saumard, Camille, 2014. "Model-based clustering of high-dimensional data: A review," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 52-78.
    2. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    3. Alex Sharp & Glen Chalatov & Ryan P. Browne, 2023. "A dual subspace parsimonious mixture of matrix normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 801-822, September.
    4. Andrews, Jeffrey L. & McNicholas, Paul D. & Subedi, Sanjeena, 2011. "Model-based classification via mixtures of multivariate t-distributions," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 520-529, January.
    5. Cristina Tortora & Paul D. McNicholas & Ryan P. Browne, 2016. "A mixture of generalized hyperbolic factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 423-440, December.
    6. Carlo Cavicchia & Maurizio Vichi, 2022. "Second-Order Disjoint Factor Analysis," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 289-309, March.
    7. Mayra Z Rodriguez & Cesar H Comin & Dalcimar Casanova & Odemir M Bruno & Diego R Amancio & Luciano da F Costa & Francisco A Rodrigues, 2019. "Clustering algorithms: A comparative approach," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-34, January.
    8. Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2020. "The ultrametric correlation matrix for modelling hierarchical latent concepts," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(4), pages 837-853, December.
    9. Kim, Nam-Hwui & Browne, Ryan P., 2021. "In the pursuit of sparseness: A new rank-preserving penalty for a finite mixture of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    10. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    11. Carlo Cavicchia & Maurizio Vichi, 2021. "Statistical Model-Based Composite Indicators for Tracking Coherent Policy Conclusions," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 156(2), pages 449-479, August.
    12. M. P. B. Gallaugher & C. Biernacki & P. D. McNicholas, 2023. "Parameter-wise co-clustering for high-dimensional data," Computational Statistics, Springer, vol. 38(3), pages 1597-1619, September.
    13. Vaghefi, A. & Farzan, Farbod & Jafari, Mohsen A., 2015. "Modeling industrial loads in non-residential buildings," Applied Energy, Elsevier, vol. 158(C), pages 378-389.
    14. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    15. Galimberti, Giuliano & Montanari, Angela & Viroli, Cinzia, 2009. "Penalized factor mixture analysis for variable selection in clustered data," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4301-4310, October.
    16. Laura Anderlucci & Francesca Fortunato & Angela Montanari, 2022. "High-Dimensional Clustering via Random Projections," Journal of Classification, Springer;The Classification Society, vol. 39(1), pages 191-216, March.
    17. Cathy Maugis & Gilles Celeux & Marie-Laure Martin-Magniette, 2009. "Variable Selection for Clustering with Gaussian Mixture Models," Biometrics, The International Biometric Society, vol. 65(3), pages 701-709, September.
    18. Paula M. Murray & Ryan P. Browne & Paul D. McNicholas, 2020. "Mixtures of Hidden Truncation Hyperbolic Factor Analyzers," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 366-379, July.
    19. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    20. Luca Scrucca & Adrian Raftery, 2015. "Improved initialisation of model-based clustering using Gaussian hierarchical partitions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 447-460, December.