IDEAS home Printed from https://ideas.repec.org/a/spr/jclass/v32y2015i1p46-62.html

Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm

Author

Listed:
  • Renato Amorim

Abstract

In this paper we introduce a new hierarchical clustering algorithm called Ward_p. Unlike the original Ward method, Ward_p generates feature weights, which can be seen as feature rescaling factors thanks to the use of the L_p norm. The feature weights are cluster dependent, allowing a feature to have different degrees of relevance in different clusters. We validate our method with experiments on a total of 75 real-world and synthetic datasets, with and without added features made of uniformly random noise. Our experiments show that (i) our feature weighting method produces results superior to those of the original Ward method on datasets containing noise features, and (ii) it is indeed possible to estimate a good exponent p in a totally unsupervised framework. The clusterings produced by Ward_p depend on p, so estimating a good value for this exponent is a requirement for this algorithm, and indeed for any other algorithm based on the L_p norm.

Copyright Classification Society of North America 2015
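
The rescaling interpretation mentioned in the abstract can be checked numerically. The sketch below is not the Ward_p algorithm itself, since the abstract does not give its formulas. It only illustrates, assuming non-negative weights raised to the same exponent p as the distance, that a weighted p-th-power Minkowski dissimilarity equals the unweighted one computed on features multiplied by their weights. The function names, weights, and data points are hypothetical and chosen for illustration only.

```python
def minkowski_p(x, y, p):
    """p-th power of the L_p (Minkowski) distance between x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))


def weighted_minkowski_p(x, y, w, p):
    """Same dissimilarity, with each feature term scaled by w_v ** p.

    Assumes every weight w_v is non-negative (an assumption of this sketch).
    """
    return sum((wv ** p) * abs(a - b) ** p for a, b, wv in zip(x, y, w))


def rescale(x, w):
    """Multiply each feature value by its weight."""
    return [wv * xv for xv, wv in zip(x, w)]


# Hypothetical data: two points with three features, and one weight per feature.
x = [1.0, 4.0, 0.5]
y = [2.0, 1.0, 0.9]
w = [0.6, 0.3, 0.1]
p = 3.0

weighted = weighted_minkowski_p(x, y, w, p)
rescaled = minkowski_p(rescale(x, w), rescale(y, w), p)

# The two values agree up to floating-point rounding.
print(abs(weighted - rescaled) < 1e-12)
```

Because w_v^p |x_v - y_v|^p = |w_v x_v - w_v y_v|^p for non-negative w_v, weighting a feature inside an L_p dissimilarity amounts to rescaling that feature, which is the sense in which weights act as rescaling factors here.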

Suggested Citation

  • Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
  • Handle: RePEc:spr:jclass:v:32:y:2015:i:1:p:46-62
    DOI: 10.1007/s00357-015-9167-1

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00357-015-9167-1
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Wayne DeSarbo & J. Carroll & Linda Clark & Paul Green, 1984. "Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables," Psychometrika, Springer;The Psychometric Society, vol. 49(1), pages 57-78, March.
    2. Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
    3. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    4. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    5. Geert Soete, 1986. "Optimal variable weighting for ultrametric and additive tree clustering," Quality & Quantity: International Journal of Methodology, Springer, vol. 20(2), pages 169-180, June.
    6. Geert Soete, 1988. "OVWTRE: A program for optimal variable weighting for ultrametric and additive tree fitting," Journal of Classification, Springer;The Classification Society, vol. 5(1), pages 101-104, March.
    7. Gabor J. Szekely & Maria L. Rizzo, 2005. "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, Springer;The Classification Society, vol. 22(2), pages 151-183, September.

    Citations

    Cited by:

    1. Douglas L. Steinley, 2016. "Editorial," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 167-170, July.
    2. Torija, Antonio J. & Self, Rod H., 2018. "Aircraft classification for efficient modelling of environmental noise impact of aviation," Journal of Air Transport Management, Elsevier, vol. 67(C), pages 157-168.
    3. Pavel S. Stashevsky & Irina N. Yakovina & Tania M. Alarcon Falconi & Elena N. Naumova, 2019. "Agglomerative Clustering of Enteric Infections and Weather Parameters to Identify Seasonal Outbreaks in Cold Climates," IJERPH, MDPI, vol. 16(12), pages 1-19, June.
    4. Stanisław Gruszczyński & Wojciech Gruszczyński, 2022. "Assessing the Information Potential of MIR Spectral Signatures for Prediction of Multiple Soil Properties Based on Data from the AfSIS Phase I Project," IJERPH, MDPI, vol. 19(22), pages 1-22, November.
    5. Kukulska-Kozieł, Anita & Szylar, Marta & Cegielska, Katarzyna & Noszczyk, Tomasz & Hernik, Józef & Gawroński, Krzysztof & Dixon-Gough, Robert & Jombach, Sándor & Valánszki, István & Filepné Kovács, Kr, 2019. "Towards three decades of spatial development transformation in two contrasting post-Soviet cities—Kraków and Budapest," Land Use Policy, Elsevier, vol. 85(C), pages 328-339.
    6. Maarten M. Kampert & Jacqueline J. Meulman & Jerome H. Friedman, 2017. "rCOSA: A Software Package for Clustering Objects on Subsets of Attributes," Journal of Classification, Springer;The Classification Society, vol. 34(3), pages 514-547, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Renato Cordeiro Amorim, 2016. "A Survey on Feature Weighting Based K-Means Algorithms," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 210-242, July.
    2. Tsai, Chieh-Yuan & Chiu, Chuang-Cheng, 2008. "Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4658-4672, June.
    3. Michael Brusco & J. Cradit, 2001. "A variable-selection heuristic for K-means clustering," Psychometrika, Springer;The Psychometric Society, vol. 66(2), pages 249-270, June.
    4. Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
    5. Susan Brudvig & Michael J. Brusco & J. Dennis Cradit, 2019. "Joint selection of variables and clusters: recovering the underlying structure of marketing data," Journal of Marketing Analytics, Palgrave Macmillan, vol. 7(1), pages 1-12, March.
    6. Maurizio Vichi & Carlo Cavicchia & Patrick J. F. Groenen, 2022. "Hierarchical Means Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 553-577, November.
    7. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    8. J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
    9. Mantas Svazas & Valentinas Navickas & Yuriy Bilan & Joanna Nakonieczny & Jana Spankova, 2021. "Biomass Clusterization from a Regional Perspective: The Case of Lithuania," Energies, MDPI, vol. 14(21), pages 1-15, October.
    10. Yaling Deng & Shuliang Zou & Daming You, 2018. "Theoretical Guidance on Evacuation Decisions after a Big Nuclear Accident under the Assumption That Evacuation Is Desirable," Sustainability, MDPI, vol. 10(9), pages 1-14, August.
    11. Nathanaël Randriamihamison & Nathalie Vialaneix & Pierre Neuvial, 2021. "Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 363-389, July.
    12. Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
    13. Aleša Lotrič Dolinar & Jože Sambt & Simona Korenjak-Černe, 2019. "Clustering EU Countries by Causes of Death," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 38(2), pages 157-172, April.
    14. R. Gnanadesikan & J. Kettenring & S. Tsao, 1995. "Weighting and selection of variables for cluster analysis," Journal of Classification, Springer;The Classification Society, vol. 12(1), pages 113-136, March.
    15. Weinand, J.M. & McKenna, R. & Fichtner, W., 2019. "Developing a municipality typology for modelling decentralised energy systems," Utilities Policy, Elsevier, vol. 57(C), pages 75-96.
    16. Alan Lee & Bobby Willcox, 2014. "Minkowski Generalizations of Ward’s Method in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 194-218, July.
    17. von Borries, George & Wang, Haiyan, 2009. "Partition clustering of high dimensional low sample size data based on p-values," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 3987-3998, October.
    18. Pavel I. Blus & Rustam V. Plotnikov, 2022. "Spatial clustering for reducing intraregional unevenness," Journal of New Economy, Ural State University of Economics, vol. 23(1), pages 88-108, April.
    19. Nazila Zarghi, 2021. "Evidence-Based Social Sciences: A New Emerging Field," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, January -.
    20. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
