Abstract
Clustering methods are designed to find groups in data, i.e., to group similar objects (variables or observations) into the same cluster and dissimilar objects into separate clusters. Although the main idea is rather simple, carrying out a cluster analysis remains a challenging task. The number of clustering methods is huge, and clustering involves many choices: the decision between basic approaches (e.g., hierarchical and partitioning methods), the choice of a dissimilarity or similarity measure, the selection of a particular linkage method when performing a hierarchical agglomerative cluster analysis, the choice of an initial partition when carrying out a partitioning cluster analysis, and the determination of the appropriate number of clusters. Each of these decisions can affect the classification results. Apart from two commands for determining the number of clusters (cluster stop, cluster dendrogram), Stata has no built-in tools for examining clustering results. We therefore developed some simple tools that provide further evaluation criteria:
* programs that assist in determining the number of clusters (Mojena's stopping rules for hierarchical clustering techniques; the PRE coefficient, F-Max statistic, and Beale's F values for a partitioning cluster analysis),
* a program for testing the stability of classifications produced by different cluster analyses (Rand index), and
* a program that computes ETA2 to assess how well the clustering variables separate the clusters.
The presentation will compare these programs with other cluster-analysis tools (agglomeration schedule, scree diagram).
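To make two of the criteria named above concrete, the following is a minimal Python sketch of the Rand index and of ETA2 as they are commonly defined. It is not the authors' Stata code: the function names `rand_index` and `eta_squared` are hypothetical, and the sketch assumes the standard textbook definitions (share of object pairs on which two partitions agree; between-cluster sum of squares divided by total sum of squares for one clustering variable).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree.

    A pair counts as an agreement if both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    (Hypothetical helper; assumes the standard Rand index definition.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def eta_squared(values, labels):
    """Between-cluster sum of squares over total sum of squares.

    Computed for a single clustering variable; values near 1 mean the
    variable separates the clusters well.
    (Hypothetical helper; assumes the usual ANOVA definition of eta squared.)
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("the variable has no variance")
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` compares two partitions that differ only in their cluster labels, and `eta_squared([1, 1, 5, 5], [0, 0, 1, 1])` evaluates a variable that is constant within each cluster.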
Handle: RePEc:boc:dsug06:08