Printed from https://ideas.repec.org/a/plo/pone00/0168288.html

Generalising Ward’s Method for Use with Manhattan Distances

Author

Listed:
  • Trudie Strauss
  • Michael Johan von Maltitz

Abstract

The claim that Ward’s linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward’s clustering algorithm is generalised for use with l1-norm (Manhattan) distances. We argue that this generalisation is theoretically sound and provide an example in which it outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied in biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward’s method of hierarchical clustering to classify the languages, using both the Euclidean distance and the Manhattan distance. Results obtained with the two distance metrics are compared to show that the characteristic property of Ward’s algorithm, minimising intra-cluster variation and maximising inter-cluster variation, is not violated when using the Manhattan metric.
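
As a rough illustration of the pipeline the abstract describes (bi-gram signatures, a pairwise distance matrix, then Ward’s linkage), the following Python sketch may help. It is not the authors’ implementation: the sample sentences are toy stand-ins for a real corpus, the function names are hypothetical, and the merge step assumes the squared-distance form of the Lance-Williams update for Ward’s criterion, applied directly to whatever dissimilarities are supplied. Whether the paper uses exactly this variant is not stated in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bigram_signature(text):
    """Relative frequencies of adjacent letter pairs, counted within words."""
    counts = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def manhattan(p, q):
    """l1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def euclidean(p, q):
    """l2 distance between two sparse frequency vectors."""
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in set(p) | set(q)))


def ward_linkage(labels, dist):
    """Agglomerate using the Lance-Williams update for Ward's criterion.

    `dist(a, b)` returns the dissimilarity between two labels.
    Returns a list of merges as (cluster_a, cluster_b, height).
    Assumption: squared-distance form of the update, applied as-is
    to non-Euclidean input.
    """
    clusters = {(lab,): 1 for lab in labels}  # cluster -> size
    d = {frozenset({(a,), (b,)}): dist(a, b) for a, b in combinations(labels, 2)}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        i, j = sorted(pair)
        dij = d.pop(pair)
        ni, nj = clusters.pop(i), clusters.pop(j)
        new = tuple(sorted(i + j))
        for k, nk in clusters.items():
            dki = d.pop(frozenset({k, i}))
            dkj = d.pop(frozenset({k, j}))
            sq = ((ni + nk) * dki**2 + (nj + nk) * dkj**2 - nk * dij**2) / (ni + nj + nk)
            d[frozenset({k, new})] = sqrt(max(sq, 0.0))
        clusters[new] = ni + nj
        merges.append((i, j, dij))
    return merges


# Toy sentences only; a real study would use large text samples per language.
samples = {
    "English": "the quick brown fox jumps over the lazy dog",
    "Dutch": "de snelle bruine vos springt over de luie hond",
    "German": "der schnelle braune fuchs springt über den faulen hund",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso",
}
signatures = {lang: bigram_signature(text) for lang, text in samples.items()}

for name, metric in (("Manhattan", manhattan), ("Euclidean", euclidean)):
    merges = ward_linkage(
        sorted(signatures),
        lambda a, b, m=metric: m(signatures[a], signatures[b]),
    )
    print(name)
    for left, right, height in merges:
        print(f"  {left} + {right} at {height:.4f}")
```

Passing the metric as a callable keeps the linkage step independent of how the distance matrix is produced, so the Euclidean and Manhattan runs can be compared on identical signatures. With sentences this short the resulting merge orders say nothing about the languages themselves.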

Suggested Citation

  • Trudie Strauss & Michael Johan von Maltitz, 2017. "Generalising Ward’s Method for Use with Manhattan Distances," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-21, January.
  • Handle: RePEc:plo:pone00:0168288
    DOI: 10.1371/journal.pone.0168288

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0168288
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0168288&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0168288?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Brock, Guy & Pihur, Vasyl & Datta, Susmita & Datta, Somnath, 2008. "clValid: An R Package for Cluster Validation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 25(i04).
    2. Zhenmin Chen & John Ness, 1996. "Space-conserving agglomerative algorithms," Journal of Classification, Springer;The Classification Society, vol. 13(1), pages 157-168, March.
    3. Nancy C. M. Ross & Dietmar Wolfram, 2000. "End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(10), pages 949-958.
    4. Glenn Milligan, 1979. "Ultrametric hierarchical clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 44(3), pages 343-346, September.
    5. Gabor J. Szekely & Maria L. Rizzo, 2005. "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, Springer;The Classification Society, vol. 22(2), pages 151-183, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Schnettler, Berta & Grunert, Klaus G. & Lobos, Germán & Miranda-Zapata, Edgardo & Denegri, Marianela & Lapo, María & Hueche, Clementina & Rojas, Juan, 2019. "Maternal well-being, food involvement and quality of diet: Profiles of single mother-adolescent dyads," Children and Youth Services Review, Elsevier, vol. 96(C), pages 336-345.
    2. Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
    3. Abang Zainoren Abang Abdurahman & Syerina Azlin Md Nasir & Wan Fairos Wan Yaacob & Serah Jaya & Suhaili Mokhtar, 2021. "Spatio-Temporal Clustering of Sarawak Malaysia Total Protected Area Visitors," Sustainability, MDPI, vol. 13(21), pages 1-19, October.
    4. Iwona Bąk & Anna Barwińska-Małajowicz & Grażyna Wolska & Paweł Walawender & Paweł Hydzik, 2021. "Is the European Union Making Progress on Energy Decarbonisation While Moving towards Sustainable Development?," Energies, MDPI, vol. 14(13), pages 1-18, June.
    5. Laurin Arnold & Jan Jöhnk & Florian Vogt & Nils Urbach, 2022. "IIoT platforms’ architectural features – a taxonomy and five prevalent archetypes," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(2), pages 927-944, June.
    6. Trotta, Gianluca, 2020. "An empirical analysis of domestic electricity load profiles: Who consumes how much and when?," Applied Energy, Elsevier, vol. 275(C).
    7. Marie Chavent & Vanessa Kuentz-Simonet & Amaury Labenne & Jérôme Saracco, 2018. "ClustGeo: an R package for hierarchical clustering with spatial constraints," Computational Statistics, Springer, vol. 33(4), pages 1799-1822, December.
    8. Zdeněk Šulc & Hana Řezanková, 2019. "Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 58-72, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alan Lee & Bobby Willcox, 2014. "Minkowski Generalizations of Ward’s Method in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 194-218, July.
    2. Pavel I. Blus & Rustam V. Plotnikov, 2022. "Spatial clustering for reducing intraregional unevenness," Journal of New Economy, Ural State University of Economics, vol. 23(1), pages 88-108, April.
    3. Patrick Zschech & Kai Heinrich & Raphael Bink & Janis S. Neufeld, 2019. "Prognostic Model Development with Missing Labels," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 61(3), pages 327-343, June.
    4. Jia Zhu & Xingcheng Wu & Xueqin Lin & Changqin Huang & Gabriel Pui Cheong Fung & Yong Tang, 2018. "A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 781-794, March.
    5. Gainbi Park & Zengwang Xu, 2022. "The constituent components and local indicator variables of social vulnerability index," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 95-120, January.
    6. Linde, Jona & Sonnemans, Joep & Tuinstra, Jan, 2014. "Strategies and evolution in the minority game: A multi-round strategy experiment," Games and Economic Behavior, Elsevier, vol. 86(C), pages 77-95.
    7. Gautier Marti & Frank Nielsen & Philippe Donnat & Sébastien Andler, 2016. "On clustering financial time series: a need for distances between dependent random variables," Papers 1603.07822, arXiv.org.
    8. Zdeňka Náglová & Tereza Horáková, 2017. "Position of the Bakery Enterprises in the Czech Republic According to Detailed Specification of the Businesses," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 65(5), pages 1719-1727.
    9. Borke, Lukas & Härdle, Wolfgang Karl, 2016. "Q3-D3-Lsa," SFB 649 Discussion Papers 2016-049, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    10. Ana Alina Tudoran, 2022. "A machine learning approach to identifying decision-making styles for managing customer relationships," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(1), pages 351-374, March.
    11. Wu, Han-Ming, 2011. "On biological validity indices for soft clustering algorithms for gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1969-1979, May.
    12. Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the L p Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
    13. Quessy, Jean-François, 2021. "A Szekely–Rizzo inequality for testing general copula homogeneity hypotheses," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    14. Carmen C. Rodríguez-Martínez & Mitzi Cubilla-Montilla & Purificación Vicente-Galindo & Purificación Galindo-Villardón, 2023. "X-STATIS: A Multivariate Approach to Characterize the Evolution of E-Participation, from a Global Perspective," Mathematics, MDPI, vol. 11(6), pages 1-15, March.
    15. Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
    16. Drago, Carlo & Fortuna, Fabio, 2023. "Investigating the Corporate Governance and Sustainability Relationship: A Bibliometric Analysis Using Keyword-Ensemble Community Detection," FEEM Working Papers 336985, Fondazione Eni Enrico Mattei (FEEM).
    17. Judit Bar-Ilan, 2001. "Data collection methods on the Web for infometric purposes — A review and analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 50(1), pages 7-32, January.
    18. Wu, Tong & Rocha, Juan C. & Berry, Kevin & Chaigneau, Tomas & Hamann, Maike & Lindkvist, Emilie & Qiu, Jiangxiao & Schill, Caroline & Shepon, Alon & Crépin, Anne-Sophie & Folke, Carl, 2024. "Triple Bottom Line or Trilemma? Global Tradeoffs Between Prosperity, Inequality, and the Environment," World Development, Elsevier, vol. 178(C).
    19. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    20. Zdeněk Hlávka & Marie Hušková & Simos G. Meintanis, 2020. "Change-point methods for multivariate time-series: paired vectorial observations," Statistical Papers, Springer, vol. 61(4), pages 1351-1383, August.

