
Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models

Author

Listed:
  • McNicholas, P.D.
  • Murphy, T.B.
  • McDaid, A.F.
  • Frost, D.

Abstract

Model-based clustering using a family of Gaussian mixture models, with parsimonious factor-analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectation-conditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues around the implementation of this family of models, namely model selection and convergence criteria, are discussed. These central issues also have implications for other model-based clustering techniques and for the implementation of techniques like the EM algorithm, in general. The Bayesian information criterion (BIC) is used for model selection and Aitken's acceleration, which is shown to outperform the lack-of-progress criterion, is used to determine convergence. A brief introduction to parallel computing is then given before the implementation of this algorithm in parallel is facilitated within the master-slave paradigm. A simulation study is then carried out to confirm the effectiveness of this parallelization. The resulting software is applied to two datasets to demonstrate its effectiveness when compared to existing software.
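
The abstract names two criteria without stating their formulas, so the following is only a minimal sketch of how an Aitken-based stopping rule and the BIC are commonly written, not the authors' implementation. The function names, the tolerance, and the choice of comparing the asymptotic estimate against the latest log-likelihood are all assumptions made for illustration; the BIC is written in the "larger is better" convention, 2 log L minus (number of parameters) times log n, and other sources use the opposite sign.

```python
import math


def aitken_asymptote(l_prev, l_curr, l_next):
    """Aitken-accelerated estimate of the limiting log-likelihood.

    Takes three consecutive log-likelihood values from an EM-type run.
    Hypothetical helper written for this sketch.
    """
    denom = l_curr - l_prev
    if denom == 0.0:
        # No change over the previous step: the sequence has stalled.
        return l_next
    a = (l_next - l_curr) / denom  # Aitken acceleration factor
    if a >= 1.0:
        # Increments are not shrinking, so no finite limit can be estimated.
        return math.inf
    return l_curr + (l_next - l_curr) / (1.0 - a)


def aitken_converged(l_prev, l_curr, l_next, tol=1e-6):
    """Stop when the estimated limit is within tol of the latest value."""
    gap = aitken_asymptote(l_prev, l_curr, l_next) - l_next
    return 0.0 <= gap < tol


def lack_of_progress_converged(l_curr, l_next, tol=1e-6):
    """Stop when successive log-likelihoods differ by less than tol."""
    return abs(l_next - l_curr) < tol


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention (an assumption here)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


if __name__ == "__main__":
    # Synthetic, slowly converging sequence l_k = limit - c * r**k.
    limit, c, r, tol = -100.0, 10.0, 0.99, 1e-3
    ll = [limit - c * r ** k for k in range(2000)]

    # First iteration at which each rule would stop.
    k_lop = next(k for k in range(1, len(ll))
                 if lack_of_progress_converged(ll[k - 1], ll[k], tol))
    k_ait = next(k for k in range(2, len(ll))
                 if aitken_converged(ll[k - 2], ll[k - 1], ll[k], tol))

    print("lack of progress stops at", k_lop, "remaining gap", limit - ll[k_lop])
    print("Aitken stops at", k_ait, "remaining gap", limit - ll[k_ait])
```

On a slowly converging sequence like the synthetic one above, the lack-of-progress rule stops as soon as a single step is small, even though the log-likelihood may still be far from its limit, whereas the Aitken-based rule waits until the estimated remaining distance is small. Model selection with the BIC would then compare `bic(...)` values across fitted candidate models and, under this sign convention, prefer the largest.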

Suggested Citation

  • McNicholas, P.D. & Murphy, T.B. & McDaid, A.F. & Frost, D., 2010. "Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 711-723, March.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:3:p:711-723

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00063-2
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Gatu, Cristian & Yanev, Petko I. & Kontoghiorghes, Erricos J., 2007. "A graph approach to generate all possible regression submodels," Computational Statistics & Data Analysis, Elsevier, vol. 52(2), pages 799-815, October.
    2. Chris Fraley & Adrian E. Raftery, 2003. "Enhanced Model-Based Clustering, Density Estimation, and Discriminant Analysis Software: MCLUST," Journal of Classification, Springer;The Classification Society, vol. 20(2), pages 263-286, September.
    3. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    4. McLachlan, G. J. & Peel, D. & Bean, R. W., 2003. "Modelling high-dimensional data by mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 379-388, January.
    5. Racine, Jeff, 2002. "Parallel distributed kernel estimation," Computational Statistics & Data Analysis, Elsevier, vol. 40(2), pages 293-302, August.
    6. Dankmar Böhning & Ekkehart Dietz & Rainer Schaub & Peter Schlattmann & Bruce Lindsay, 1994. "The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(2), pages 373-388, June.
    7. Raftery, Adrian E. & Dean, Nema, 2006. "Variable Selection for Model-Based Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 168-178, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrews, Jeffrey L. & McNicholas, Paul D. & Subedi, Sanjeena, 2011. "Model-based classification via mixtures of multivariate t-distributions," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 520-529, January.
    2. Galimberti, Giuliano & Montanari, Angela & Viroli, Cinzia, 2009. "Penalized factor mixture analysis for variable selection in clustered data," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4301-4310, October.
    3. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    4. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    5. Wang, Wan-Lun, 2015. "Mixtures of common t-factor analyzers for modeling high-dimensional data with missing values," Computational Statistics & Data Analysis, Elsevier, vol. 83(C), pages 223-235.
    6. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    7. repec:jss:jstsof:18:i06 is not listed on IDEAS
    8. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    9. Morris, Katherine & McNicholas, Paul D., 2016. "Clustering, classification, discriminant analysis, and dimension reduction via generalized hyperbolic mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 97(C), pages 133-150.
    10. Alex Sharp & Glen Chalatov & Ryan P. Browne, 2023. "A dual subspace parsimonious mixture of matrix normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 801-822, September.
    11. Cristina Tortora & Paul D. McNicholas & Ryan P. Browne, 2016. "A mixture of generalized hyperbolic factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 423-440, December.
    12. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2011. "Variable selection in model-based discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(10), pages 1374-1387, November.
    13. Hung Tong & Cristina Tortora, 2022. "Model-based clustering and outlier detection with missing data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 5-30, March.
    14. Montanari, Angela & Viroli, Cinzia, 2011. "Maximum likelihood estimation of mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2712-2723, September.
    15. Tyler Roick & Dimitris Karlis & Paul D. McNicholas, 2021. "Clustering discrete-valued time series," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(1), pages 209-229, March.
    16. Tang, Yang & Browne, Ryan P. & McNicholas, Paul D., 2015. "Model based clustering of high-dimensional binary data," Computational Statistics & Data Analysis, Elsevier, vol. 87(C), pages 84-101.
    17. Charles Bouveyron & Camille Brunet-Saumard, 2014. "Discriminative variable selection for clustering with the sparse Fisher-EM algorithm," Computational Statistics, Springer, vol. 29(3), pages 489-513, June.
    18. Ryan P. Browne & Luca Bagnato & Antonio Punzo, 2024. "Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 597-625, September.
    19. Sanjeena Subedi & Antonio Punzo & Salvatore Ingrassia & Paul McNicholas, 2013. "Clustering and classification via cluster-weighted factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 5-40, March.
    20. Nam-Hwui Kim & Ryan Browne, 2019. "Subspace clustering for the finite mixture of generalized hyperbolic distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 641-661, September.
    21. Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
