Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints

Authors
  • Nathanaël Randriamihamison

    (INRAE, UR875 Mathématiques et Informatique Appliquées Toulouse
    Université de Toulouse, CNRS UPS)

  • Nathalie Vialaneix

    (INRAE, UR875 Mathématiques et Informatique Appliquées Toulouse)

  • Pierre Neuvial

    (Université de Toulouse, CNRS UPS)

Abstract

Hierarchical agglomerative clustering (HAC) with Ward’s linkage has been widely used since its introduction by Ward (Journal of the American Statistical Association, 58(301), 236–244, 1963). This article reviews extensions of HAC to various input data and contiguity-constrained HAC, and provides applicability conditions. Different versions of the graphical representation of the results as a dendrogram are also presented and their properties are clarified. We clarify and complete the results already available in a heterogeneous literature within a uniform framework. In particular, this study reveals an important distinction between a consistency property of the dendrogram and the absence of crossover within it. Finally, a simulation study shows that the constrained version of HAC can sometimes provide more relevant results than its unconstrained version, even though the constraint restricts the optimization of the objective criterion to a reduced set of solutions at each step. Overall, this article provides comprehensive recommendations, both for the use of HAC and constrained HAC depending on the input data, and for the representation of the results.
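The contrast between unconstrained and contiguity-constrained Ward HAC can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes Euclidean input points, a Ward merge cost of Δ(A, B) = n_A n_B / (n_A + n_B) · ||c_A − c_B||² (other conventions rescale this or take a square root), and an order-contiguity constraint under which only clusters adjacent in the input order may merge. The names `ward_hac`, `ward_cost` and the `contiguous` flag are hypothetical, and the naive cubic-time loop is chosen for readability, not speed.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def ward_cost(n_a: int, c_a: List[float], n_b: int, c_b: List[float]) -> float:
    """Increase in within-cluster sum of squares when merging clusters A and B."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * sq_dist


def ward_hac(points: List[Point], contiguous: bool = False) -> List[Tuple[int, int, float]]:
    """Naive Ward HAC (hypothetical helper, for illustration only).

    If contiguous is True, only clusters adjacent in the input order may merge.
    Returns one (i, j, height) triple per merge, where i < j are positions in
    the current list of clusters and height is the Ward merge cost.
    """
    # Each cluster is stored as (size, centroid); the list keeps input order.
    clusters = [(1, [float(v) for v in p]) for p in points]
    merges = []
    while len(clusters) > 1:
        if contiguous:
            # Contiguity constraint: only neighbouring clusters are candidates.
            candidates = [(i, i + 1) for i in range(len(clusters) - 1)]
        else:
            # Unconstrained: every pair of clusters is a candidate.
            candidates = [(i, j) for i in range(len(clusters))
                          for j in range(i + 1, len(clusters))]
        i, j = min(candidates,
                   key=lambda ij: ward_cost(*clusters[ij[0]], *clusters[ij[1]]))
        n_a, c_a = clusters[i]
        n_b, c_b = clusters[j]
        height = ward_cost(n_a, c_a, n_b, c_b)
        n = n_a + n_b
        centroid = [(n_a * x + n_b * y) / n for x, y in zip(c_a, c_b)]
        # Store the merged cluster at position i and drop j (j > i), so the
        # left-to-right order of clusters is preserved.
        clusters[i] = (n, centroid)
        del clusters[j]
        merges.append((i, j, height))
    return merges


pts = [(0.0,), (10.0,), (1.0,)]
print([round(h, 4) for _, _, h in ward_hac(pts)])
print([round(h, 4) for _, _, h in ward_hac(pts, contiguous=True)])
```

On these three points, the unconstrained run merges 0 and 1 first (height 0.5) and then 10 (height about 60.1667), so its heights increase. The constrained run must merge 10 and 1 first (height 40.5) and then attaches 0 at a lower height (about 20.1667). This non-monotone sequence of merge heights, sometimes called a reversal, is the kind of behaviour that complicates drawing constrained HAC results as a dendrogram.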

Suggested Citation

  • Nathanaël Randriamihamison & Nathalie Vialaneix & Pierre Neuvial, 2021. "Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 363-389, July.
  • Handle: RePEc:spr:jclass:v:38:y:2021:i:2:d:10.1007_s00357-020-09377-y
    DOI: 10.1007/s00357-020-09377-y

    Download full text from publisher: http://link.springer.com/10.1007/s00357-020-09377-y (access to the full text of the articles in this series is restricted)

    References listed on IDEAS

    1. Gordon, A. D., 1996. "A survey of constrained classification," Computational Statistics & Data Analysis, Elsevier, vol. 21(1), pages 17-29, January.
    2. Marie Chavent & Vanessa Kuentz-Simonet & Amaury Labenne & Jérôme Saracco, 2018. "ClustGeo: an R package for hierarchical clustering with spatial constraints," Computational Statistics, Springer, vol. 33(4), pages 1799-1822, December.
    3. Anuška Ferligoj & Vladimir Batagelj, 1982. "Clustering with relational constraint," Psychometrika, Springer;The Psychometric Society, vol. 47(4), pages 413-426, December.
    4. Gale Young & A. Householder, 1938. "Discussion of a set of points in terms of their mutual distances," Psychometrika, Springer;The Psychometric Society, vol. 3(1), pages 19-22, March.
    5. Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
    6. Jesse R. Dixon & Siddarth Selvaraj & Feng Yue & Audrey Kim & Yan Li & Yin Shen & Ming Hu & Jun S. Liu & Bing Ren, 2012. "Topological domains in mammalian genomes identified by analysis of chromatin interactions," Nature, Nature, vol. 485(7398), pages 376-380, May.
    7. Vladimir Batagelj, 1981. "Note on ultrametric hierarchical clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 46(3), pages 351-352, September.
    8. J. Kruskal, 1964. "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis," Psychometrika, Springer;The Psychometric Society, vol. 29(1), pages 1-27, March.
    9. Stephen Johnson, 1967. "Hierarchical clustering schemes," Psychometrika, Springer;The Psychometric Society, vol. 32(3), pages 241-254, September.
    10. Gabor J. Szekely & Maria L. Rizzo, 2005. "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, Springer;The Classification Society, vol. 22(2), pages 151-183, September.
    11. Nathan Krislock & Henry Wolkowicz, 2012. "Euclidean Distance Matrices and Applications," International Series in Operations Research & Management Science, in: Miguel F. Anjos & Jean B. Lasserre (ed.), Handbook on Semidefinite, Conic and Polynomial Optimization, chapter 0, pages 879-914, Springer.
    12. Douglas Steinley & Lawrence Hubert, 2008. "Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal," Psychometrika, Springer;The Psychometric Society, vol. 73(4), pages 647-664, December.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Aleksandra Badora & Krzysztof Kud & Marian Woźniak, 2022. "Consumer Attitudes as Part of Lifestyle in the COVID-19 Emergency," Sustainability, MDPI, vol. 14(15), pages 1-20, August.
    2. Mikhail Krivko & Luboš Smutka, 2021. "Agricultural and Foodstuff Trade between EU28 and Russia: (Non)Uniformity of the Russian Import Ban Impact Distribution," Agriculture, MDPI, vol. 11(12), pages 1-15, December.
    3. Sylwia Pangsy-Kania & Anna Golejewska & Katarzyna Wierzbicka & Magdalena Mosionek-Schweda, 2023. "Searching for Dependencies between Business Strategies and Innovation Outputs in Manufacturing: An Analysis Based on CIS," Sustainability, MDPI, vol. 15(9), pages 1-13, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan Carlos Duque & Raúl Ramos & Jordi Suriñach, 2007. "Supervised Regionalization Methods: A Survey," International Regional Science Review, , vol. 30(3), pages 195-220, July.
    2. Michael Rennings & Philipp Baaden & Carolin Block & Marcus John & Stefanie Bröring, 2024. "Assessing emerging sustainability-oriented technologies: the case of precision agriculture," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 2969-2998, June.
    3. Rui Fragoso & Conceição Rego & Vladimir Bushenkov, 2016. "Clustering of Territorial Areas: A Multi-Criteria Districting Problem," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 14(2), pages 179-198, December.
    4. Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the L p Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
    5. Brault, Vincent & Ouadah, Sarah & Sansonnet, Laure & Lévy-Leduc, Céline, 2018. "Nonparametric multiple change-point estimation for analyzing large Hi-C data matrices," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 143-165.
    6. Mantas Svazas & Valentinas Navickas & Yuriy Bilan & Joanna Nakonieczny & Jana Spankova, 2021. "Biomass Clusterization from a Regional Perspective: The Case of Lithuania," Energies, MDPI, vol. 14(21), pages 1-15, October.
    7. William Day & Herbert Edelsbrunner, 1985. "Investigation of proportional link linkage clustering methods," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 239-254, December.
    8. Wayne DeSarbo & Vijay Mahajan, 1984. "Constrained classification: The use of a priori information in cluster analysis," Psychometrika, Springer;The Psychometric Society, vol. 49(2), pages 187-215, June.
    9. Shmuel Sattath & Amos Tversky, 1977. "Additive similarity trees," Psychometrika, Springer;The Psychometric Society, vol. 42(3), pages 319-345, September.
    10. Marcin Bukowski & Janusz Majewski & Agnieszka Sobolewska, 2023. "The Environmental Impact of Changes in the Structure of Electricity Sources in Europe," Energies, MDPI, vol. 16(1), pages 1-22, January.
    11. Hyunseok Park & Janghyeok Yoon & Kwangsoo Kim, 2012. "Identifying patent infringement using SAO based semantic technological similarities," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 515-529, February.
    12. John Daws, 1996. "The analysis of free-sorting data: Beyond pairwise cooccurrences," Journal of Classification, Springer;The Classification Society, vol. 13(1), pages 57-80, March.
    13. Giuseppe Bove & Akinori Okada, 2018. "Methods for the analysis of asymmetric pairwise relationships," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(1), pages 5-31, March.
    14. Tsai, Cary Chi-Liang & Cheng, Echo Sihan, 2021. "Incorporating statistical clustering methods into mortality models to improve forecasting performances," Insurance: Mathematics and Economics, Elsevier, vol. 99(C), pages 42-62.
    15. Juan Carlos Duque & Raúl Ramos, 2004. "Design of homogenous territorial units: a methodological proposal," ERSA conference papers ersa04p6, European Regional Science Association.
    16. Ritu Arora, 2023. "Intellectual Structure of Parenting Style Research: A Bibliometric Analysis," SAGE Open, , vol. 13(2), pages 21582440231, April.
    17. Weinand, J.M. & McKenna, R. & Fichtner, W., 2019. "Developing a municipality typology for modelling decentralised energy systems," Utilities Policy, Elsevier, vol. 57(C), pages 75-96.
    18. Si-Tong Lu & Miao Zhang & Qing-Na Li, 2020. "Feasibility and a fast algorithm for Euclidean distance matrix optimization with ordinal constraints," Computational Optimization and Applications, Springer, vol. 76(2), pages 535-569, June.
    19. Richard C. Roistacher, 1974. "A Review of Mathematical Methods in Sociometry," Sociological Methods & Research, , vol. 3(2), pages 123-171, November.
    20. Simon Blanchard & Wayne DeSarbo & A. Atalay & Nukhet Harmancioglu, 2012. "Identifying consumer heterogeneity in unobserved categories," Marketing Letters, Springer, vol. 23(1), pages 177-194, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.