
A Survey of Popular R Packages for Cluster Analysis

Authors

Listed:
  • Abby Flynt (Bucknell University)
  • Nema Dean (University of Glasgow)

Abstract

Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data, or a collection of the two. The contrasting methods in the different packages are briefly introduced, and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.R
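
As a rough illustration of the basic usage the abstract refers to, the sketch below calls the two stats functions it names, kmeans and hclust, on continuous data. It is not taken from the article's worked examples: the built-in iris data set, the choice of three clusters, the standardisation step, and the complete-linkage setting are all assumptions made here for illustration.

```r
# Minimal sketch using only base R's 'stats' package (no extra installs).
# ASSUMPTION: iris is used for illustration; it is not the article's example data.
set.seed(1)                                  # make the random starts reproducible
x <- scale(iris[, 1:4])                      # standardise the four continuous variables

# k-means with 3 clusters (assumed), keeping the best of 25 random starts
km <- kmeans(x, centers = 3, nstart = 25)

# Agglomerative hierarchical clustering on Euclidean distances,
# with the dendrogram cut into 3 groups (assumed)
hc <- hclust(dist(x), method = "complete")
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate the two partitions to see where they agree
agreement <- table(kmeans = km$cluster, hclust = hc_groups)
print(agreement)
```

The cross-tabulation is one simple way to compare two partitions of the same observations; cluster labels are arbitrary, so agreement shows up as large counts concentrated in one cell per row rather than on the diagonal.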

Suggested Citation

  • Abby Flynt & Nema Dean, 2016. "A Survey of Popular R Packages for Cluster Analysis," Journal of Educational and Behavioral Statistics, vol. 41(2), pages 205-225, April.
  • Handle: RePEc:sae:jedbes:v:41:y:2016:i:2:p:205-225
    DOI: 10.3102/1076998616631743

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.3102/1076998616631743
    Download Restriction: no


    References listed on IDEAS

    1. Azzalini, Adelchi & Menardi, Giovanna, 2014. "Clustering via Nonparametric Density Estimation: The R Package pdfCluster," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 57(i11).
    2. Linzer, Drew A. & Lewis, Jeffrey B., 2011. "poLCA: An R Package for Polytomous Variable Latent Class Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 42(i10).
    3. Nema Dean & Rebecca Nugent, 2013. "Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 339-357, September.
    4. Christian Hennig & Tim F. Liao, 2013. "How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 62(3), pages 309-369, May.
    5. Grün, Bettina & Leisch, Friedrich, 2008. "FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i04).
    6. Fraley C. & Raftery A.E., 2002. "Model-Based Clustering, Discriminant Analysis, and Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 611-631, June.

    Citations


    Cited by:

    1. Hasan Ture & Seyyide Dogan & Deniz Kocak, 2019. "Assessing Euro 2020 Strategy Using Multi-criteria Decision Making Methods: VIKOR and TOPSIS," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 142(2), pages 645-665, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hui Ye & Anthony Bellotti, 2019. "Modelling Recovery Rates for Non-Performing Loans," Risks, MDPI, vol. 7(1), pages 1-17, February.
    2. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "Erratum to: The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 327-355, July.
    3. Galimberti, Giuliano & Soffritti, Gabriele, 2014. "A multivariate linear regression analysis using finite mixtures of t distributions," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 138-150.
    4. Ewa Genge, 2014. "A latent class analysis of the public attitude towards the euro adoption in Poland," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(4), pages 427-442, December.
    5. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 85-113, April.
    6. Cristina Bernini & Maria Francesca Cracolici & Cinzia Viroli, 2017. "Does Tourism Consumption Behaviour Mirror Differences in Living Standards?," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 134(3), pages 1157-1171, December.
    7. Bettina Grün & Sara Dolnicar, 2016. "Response style corrected market segmentation for ordinal data," Marketing Letters, Springer, vol. 27(4), pages 729-741, December.
    8. Adelchi Azzalini & Giovanna Menardi, 2016. "Density-based clustering with non-continuous data," Computational Statistics, Springer, vol. 31(2), pages 771-798, June.
    9. Utkarsh J. Dang & Antonio Punzo & Paul D. McNicholas & Salvatore Ingrassia & Ryan P. Browne, 2017. "Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 34(1), pages 4-34, April.
    10. repec:jss:jstsof:42:i10 is not listed on IDEAS
    11. Adrian O’Hagan & Arthur White, 2019. "Improved model-based clustering performance using Bayesian initialization averaging," Computational Statistics, Springer, vol. 34(1), pages 201-231, March.
    12. Pourahmadi, Mohsen & Daniels, Michael J. & Park, Trevor, 2007. "Simultaneous modelling of the Cholesky decomposition of several covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 98(3), pages 568-587, March.
    13. Christian Kleiber & Achim Zeileis, 2016. "Visualizing Count Data Regressions Using Rootograms," The American Statistician, Taylor & Francis Journals, vol. 70(3), pages 296-303, July.
    14. Lebret, Rémi & Iovleff, Serge & Langrognet, Florent & Biernacki, Christophe & Celeux, Gilles & Govaert, Gérard, 2015. "Rmixmod: The R Package of the Model-Based Unsupervised, Supervised, and Semi-Supervised Classification Mixmod Library," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i06).
    15. Lisa Blaydes, 2023. "Assessing the Labor Conditions of Migrant Domestic Workers in the Arab Gulf States," ILR Review, Cornell University, ILR School, vol. 76(4), pages 724-747, August.
    16. Grün, Bettina & Kosmidis, Ioannis & Zeileis, Achim, 2012. "Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 48(i11).
    17. Jindřich Špička & Zdeňka Náglová, 2022. "Consumer segmentation in the meat market - The case study of Czech Republic," Agricultural Economics, Czech Academy of Agricultural Sciences, vol. 68(2), pages 68-77.
    18. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    19. Nicholas T. Davis & Kirby Goidel & Yikai Zhao, 2021. "The Meanings of Democracy among Mass Publics," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 153(3), pages 849-921, February.
    20. Elvira Pelle & Roberta Pappadà, 2021. "A clustering procedure for mixed-type data to explore ego network typologies: an application to elderly people living alone in Italy," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(5), pages 1507-1533, December.
    21. Marc A. Scott & Kaushik Mohan & Jacques‐Antoine Gauthier, 2020. "Model‐based clustering and analysis of life history data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 1231-1251, June.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.