
Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering

Authors

  • Zdeněk Šulc

    (University of Economics, Prague)

  • Hana Řezanková

    (University of Economics, Prague)

Abstract

This paper deals with similarity measures for categorical data in hierarchical clustering that can handle variables with more than two categories and that aspire to replace the simple matching approach commonly used in this area. These similarity measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The paper has two main aims: first, to compare and evaluate the selected similarity measures with respect to the quality of the clusters they produce in hierarchical clustering; second, to propose new similarity measures for nominal variables. All the examined similarity measures are compared on cluster quality using the mean ranked scores of two internal evaluation coefficients. The analysis is performed on generated datasets, which makes it possible to determine the particular situations in which a given similarity measure is recommended for use.
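
To make the contrast between simple matching and a frequency-aware measure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy dataset invented for illustration, uses an inverse-occurrence-frequency (IOF) style formula as one example of a measure that uses category frequencies (the abstract does not say which measures the paper examines), and runs a naive average-linkage procedure. All function names are hypothetical.

```python
from collections import Counter
from math import log


def simple_matching(x, y):
    """Share of variables on which two objects have the same category."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def category_frequencies(data):
    """One Counter per variable, holding absolute category frequencies."""
    return [Counter(column) for column in zip(*data)]


def iof_similarity(x, y, freqs):
    """Frequency-aware similarity (IOF-style, an assumption of this sketch).

    A match scores 1. A mismatch scores 1 / (1 + ln f(a) * ln f(b)), so a
    mismatch between two frequent categories is penalised more heavily.
    """
    total = 0.0
    for a, b, f in zip(x, y, freqs):
        total += 1.0 if a == b else 1.0 / (1.0 + log(f[a]) * log(f[b]))
    return total / len(x)


def average_linkage(data, similarity):
    """Naive agglomerative clustering; returns the merge history."""
    clusters = [[i] for i in range(len(data))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                # Dissimilarity is taken as 1 - similarity, averaged over pairs.
                dist = sum(1.0 - similarity(data[p], data[q])
                           for p, q in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges


# Toy nominal data (invented for illustration): colour, size, shape.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
    ("green", "large", "round"),
]
freqs = category_frequencies(data)

sm_merges = average_linkage(data, simple_matching)
iof_merges = average_linkage(data, lambda x, y: iof_similarity(x, y, freqs))

for name, merges in (("simple matching", sm_merges), ("IOF-style", iof_merges)):
    print(name)
    for left, right, dist in merges:
        print(f"  merge {left} + {right} at dissimilarity {dist:.3f}")
```

Because the frequency-aware measure assigns partial similarity to mismatches, the two merge histories can differ on the same data, which is the kind of effect the paper's comparison is concerned with.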

Suggested Citation

  • Zdeněk Šulc & Hana Řezanková, 2019. "Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 58-72, April.
  • Handle: RePEc:spr:jclass:v:36:y:2019:i:1:d:10.1007_s00357-019-09317-5
    DOI: 10.1007/s00357-019-09317-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-019-09317-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Luboš Smutka & Josef Abrhám, 2022. "The impact of the Russian import ban on EU agrarian exports," Agricultural Economics, Czech Academy of Agricultural Sciences, vol. 68(2), pages 39-49.
    2. Dario Krpan & Jonathan E. Booth & Andreea Damien, 2023. "The positive–negative–competence (PNC) model of psychological responses to representations of robots," Nature Human Behaviour, Nature, vol. 7(11), pages 1933-1954, November.
    3. Huynh Evertsen, Phuc & Rasmussen, Einar & Nenadic, Oleg, 2022. "Commercializing circular economy innovations: A taxonomy of academic spin-offs," Technological Forecasting and Social Change, Elsevier, vol. 185(C).
    4. Zdenek Sulc & Jana Cibulkova & Hana Rezankova, 2022. "Nomclust 2.0: an R package for hierarchical clustering of objects characterized by nominal variables," Computational Statistics, Springer, vol. 37(5), pages 2161-2184, November.
    5. Ruben Tessmann & Ralf Elbert, 2022. "Multi-sided platforms in competitive B2B networks with varying governmental influence – a taxonomy of Port and Cargo Community System business models," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(2), pages 829-872, June.
    6. Malkamäki, Arttu & Korhonen, Jaana E. & Berghäll, Sami & Berg Rustas, Carolina & Bernö, Hanna & Carreira, Ariane & D'Amato, Dalia & Dobrovolsky, Alexander & Giertliová, Blanka & Holmgren, Sara & Mark-, 2022. "Public perceptions of using forests to fuel the European bioeconomy: Findings from eight university cities," Forest Policy and Economics, Elsevier, vol. 140(C).
    7. Ana Perišić & Marko Pahor, 2023. "Clustering mixed-type player behavior data for churn prediction in mobile games," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 31(1), pages 165-190, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Abang Zainoren Abang Abdurahman & Syerina Azlin Md Nasir & Wan Fairos Wan Yaacob & Serah Jaya & Suhaili Mokhtar, 2021. "Spatio-Temporal Clustering of Sarawak Malaysia Total Protected Area Visitors," Sustainability, MDPI, vol. 13(21), pages 1-19, October.
    2. Iwona Bąk & Anna Barwińska-Małajowicz & Grażyna Wolska & Paweł Walawender & Paweł Hydzik, 2021. "Is the European Union Making Progress on Energy Decarbonisation While Moving towards Sustainable Development?," Energies, MDPI, vol. 14(13), pages 1-18, June.
    3. Laurin Arnold & Jan Jöhnk & Florian Vogt & Nils Urbach, 2022. "IIoT platforms’ architectural features – a taxonomy and five prevalent archetypes," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(2), pages 927-944, June.
    4. Zdenek Sulc & Jana Cibulkova & Hana Rezankova, 2022. "Nomclust 2.0: an R package for hierarchical clustering of objects characterized by nominal variables," Computational Statistics, Springer, vol. 37(5), pages 2161-2184, November.
    5. Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
    6. Barbara E. Marschallek & Thomas Jacobsen, 2022. "Smooth and Hard or Beautiful and Elegant? Experts’ Conceptual Structure of the Aesthetics of Materials," SAGE Open, vol. 12(2), pages 21582440221, May.
    7. Schnettler, Berta & Grunert, Klaus G. & Lobos, Germán & Miranda-Zapata, Edgardo & Denegri, Marianela & Lapo, María & Hueche, Clementina & Rojas, Juan, 2019. "Maternal well-being, food involvement and quality of diet: Profiles of single mother-adolescent dyads," Children and Youth Services Review, Elsevier, vol. 96(C), pages 336-345.
    8. Boris Mirkin & Trevor I. Fenner, 2019. "Distance and Consensus for Preference Relations Corresponding to Ordered Partitions," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 350-367, July.
    9. Trotta, Gianluca, 2020. "An empirical analysis of domestic electricity load profiles: Who consumes how much and when?," Applied Energy, Elsevier, vol. 275(C).
    10. Marie Chavent & Vanessa Kuentz-Simonet & Amaury Labenne & Jérôme Saracco, 2018. "ClustGeo: an R package for hierarchical clustering with spatial constraints," Computational Statistics, Springer, vol. 33(4), pages 1799-1822, December.
