
Revisiting agglomerative clustering

Author

Listed:
  • Tokuda, Eric K.
  • Comin, Cesar H.
  • Costa, Luciano da F.

Abstract

Hierarchical agglomerative methods stand out as particularly effective and popular approaches for clustering data. Yet, these methods have not been systematically compared regarding the important issue of false positives while searching for clusters. A model of clusters involving a higher-density nucleus surrounded by a transition, followed by outliers, is adopted as a means to quantify the relevance of the obtained clusters and address the problem of false positives. Six traditional methodologies, namely the single, average, median, complete, centroid and Ward’s linkage criteria, are compared with respect to the adopted model. Unimodal and bimodal datasets obeying uniform, Gaussian, exponential and power-law distributions are considered for this comparison. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters not corresponding directly to the nucleus.
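The contrast between linkage criteria described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the function name `agglomerate`, the naive pairwise search, and the nine-point one-dimensional sample (a dense centre with progressively sparser tails, standing in for unimodal data) are all assumptions made for illustration. Only the single and complete criteria are shown.

```python
from itertools import combinations


def agglomerate(points, linkage, n_clusters=2):
    """Naive agglomerative clustering of 1-D points (illustrative only).

    `linkage` reduces all pairwise distances between two clusters to a
    single number: `min` gives single linkage, `max` gives complete linkage.
    """
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        # i < j is guaranteed by combinations, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy sample: gaps grow away from the centre (no tied distances).
sample = [0, 40, 69, 89, 99, 115, 140, 173, 220]

single_sizes = sorted(len(c) for c in agglomerate(sample, min))
complete_sizes = sorted(len(c) for c in agglomerate(sample, max))

print("single linkage cluster sizes:  ", single_sizes)    # [1, 8]
print("complete linkage cluster sizes:", complete_sizes)  # [3, 6]
```

On this toy sample, single linkage leaves one large cluster and isolates a single outlying point, whereas complete linkage splits the same points into two sizeable groups. This is the kind of two-cluster reading of unimodal data that the abstract counts as a false positive. A single small example cannot confirm the paper's general findings.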

Suggested Citation

  • Tokuda, Eric K. & Comin, Cesar H. & Costa, Luciano da F., 2022. "Revisiting agglomerative clustering," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 585(C).
  • Handle: RePEc:eee:phsmap:v:585:y:2022:i:c:s0378437121007068
    DOI: 10.1016/j.physa.2021.126433

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437121007068
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.



    Citations



    Cited by:

    1. Max Olinto Moreira & Betania Mafra Kaizer & Takaaki Ohishi & Benedito Donizeti Bonatto & Antonio Carlos Zambroni de Souza & Pedro Paulo Balestrassi, 2022. "Multivariate Strategy Using Artificial Neural Networks for Seasonal Photovoltaic Generation Forecasting," Energies, MDPI, vol. 16(1), pages 1-30, December.
    2. Ifaei, Pouya & Nazari-Heris, Morteza & Tayerani Charmchi, Amir Saman & Asadi, Somayeh & Yoo, ChangKyoo, 2023. "Sustainable energies and machine learning: An organized review of recent applications and challenges," Energy, Elsevier, vol. 266(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Beibei Zhang & Rong Chen, 2018. "Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic," Journal of Classification, Springer;The Classification Society, vol. 35(3), pages 394-421, October.
    2. Wang, Yuanrong & Aste, Tomaso, 2023. "Dynamic portfolio optimization with inverse covariance clustering," LSE Research Online Documents on Economics 117701, London School of Economics and Political Science, LSE Library.
    3. Sung-Soo Kim & W. Krzanowski, 2007. "Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization," Computational Statistics, Springer, vol. 22(1), pages 109-119, April.
    4. Kirschstein, Thomas & Liebscher, Steffen & Becker, Claudia, 2013. "Robust estimation of location and scatter by pruning the minimum spanning tree," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 173-184.
    5. Jean-Pierre Barthélemy & Bruno Leclerc & Bernard Monjardet, 1986. "On the use of ordered sets in problems of comparison and consensus of classifications," Journal of Classification, Springer;The Classification Society, vol. 3(2), pages 187-224, September.
    6. Sergio Scippacercola, 2011. "The Factorial Minimum Spanning Tree as a Reference for a Synthetic Index of Complex Phenomena," Journal of Classification, Springer;The Classification Society, vol. 28(1), pages 21-37, April.
    7. Lawrence Hubert, 1974. "Some applications of graph theory to clustering," Psychometrika, Springer;The Psychometric Society, vol. 39(3), pages 283-309, September.
    8. Bruno Leclerc, 1995. "Minimum spanning trees for tree metrics: abridgements and adjustments," Journal of Classification, Springer;The Classification Society, vol. 12(2), pages 207-241, September.
    9. Zhimei Lei & Kuo-Jui Wu & Li Cui & Ming K Lim, 2018. "A Hybrid Approach to Explore the Risk Dependency Structure among Agribusiness Firms," Sustainability, MDPI, vol. 10(2), pages 1-17, February.
    10. Raymond, Ben & Hosie, Graham, 2009. "Network-based exploration and visualisation of ecological data," Ecological Modelling, Elsevier, vol. 220(5), pages 673-683.
    11. Zhang, Yanyun & Xue, Peng & Zhao, Yifan & Zhang, Qianqian & Bai, Gongxun & Peng, Jinqing & Li, Bojia, 2024. "Spectra measurement and clustering analysis of global horizontal irradiance for solar energy application," Renewable Energy, Elsevier, vol. 222(C).
    12. Unknown, 1996. "Proceedings of a workshop held at Northern Territory University, 6-7 June 1996: Trochus: Status, Hatchery Practice and Nutrition," ACIAR Proceedings Series 135188, Australian Centre for International Agricultural Research.
    13. Eden, Colin, 2004. "Analyzing cognitive maps to help structure issues or problems," European Journal of Operational Research, Elsevier, vol. 159(3), pages 673-686, December.
    14. Yuanrong Wang & Tomaso Aste, 2021. "Dynamic Portfolio Optimization with Inverse Covariance Clustering," Papers 2112.15499, arXiv.org, revised Jan 2022.
    15. Karimi-Majd, Amir-Mohsen & Fathian, Mohammad & Makrehchi, Masoud, 2018. "Consensus-based methodology for detection communities in multilayered networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 494(C), pages 547-558.
    16. Massimiliano Fessina & Giambattista Albora & Andrea Tacchella & Andrea Zaccaria, 2022. "Which products activate a product? An explainable machine learning approach," Papers 2212.03094, arXiv.org.
    17. Andrea Di Iura, 2022. "Comparison of empirical and shrinkage correlation algorithm for clustering methods in the futures market," SN Business & Economics, Springer, vol. 2(8), pages 1-17, August.
    18. Scott Emmons & Stephen Kobourov & Mike Gallant & Katy Börner, 2016. "Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-18, July.
    19. Ahuja, Ravindra K., 1956-, 1992. "Applications of network optimization," Working papers 3458-92., Massachusetts Institute of Technology (MIT), Sloan School of Management.
    20. Modarres, Reza, 2014. "On the interpoint distances of Bernoulli vectors," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 215-222.
