
TOBAE: A Density-based Agglomerative Clustering Algorithm

Authors

  • Shehzad Khalid
  • Shahid Razzaq

Abstract

This paper presents TOBAE, a novel density-based agglomerative clustering algorithm that is parameter-less and filters noise automatically. It finds the appropriate number of clusters while maintaining a competitive running time. TOBAE works by tracking the cumulative density distribution of the data points on a grid and requires only the original data set as input. The clustering problem is solved by automatically finding the optimal density threshold for the clusters. The algorithm is applicable to any N-dimensional data set, which makes it highly relevant for real-world scenarios. It outperforms state-of-the-art clustering algorithms through its additional feature of automatic noise filtration around clusters. The concept behind the algorithm is explained using the analogy of puddles ('tobae'), by which the algorithm is inspired. The paper provides a detailed description of TOBAE along with a complexity analysis for both time and space. Experimental results on known data sets show how TOBAE competes with the best algorithms in the field while providing its own set of advantages. Copyright Classification Society of North America 2015
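
The abstract gives only a high-level picture of the method (densities tracked on a grid, a density threshold separating clusters from noise). As a rough illustration of that general idea, and not the authors' algorithm, the following minimal Python sketch bins points into grid cells, keeps cells whose point count meets a threshold, and merges adjacent dense cells into clusters. The function name `grid_density_clusters` and the parameters `cell_size` and `density_threshold` are assumptions made for this example; TOBAE itself is described as parameter-less and finds its threshold automatically, which this sketch does not attempt.

```python
from collections import Counter, deque
from itertools import product


def grid_density_clusters(points, cell_size, density_threshold):
    """Label each point with a cluster id, or -1 for noise.

    Illustrative grid-density clustering only; NOT the TOBAE algorithm.
    cell_size and density_threshold are hypothetical, user-supplied values.
    """
    # Map every point to the integer coordinates of its grid cell.
    cells = [tuple(int(c // cell_size) for c in p) for p in points]
    counts = Counter(cells)

    # A cell is "dense" if it holds at least density_threshold points.
    dense = {cell for cell, n in counts.items() if n >= density_threshold}

    # Offsets to all adjacent cells (including diagonals) in N dimensions.
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Merge adjacent dense cells with a breadth-first flood fill.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for off in offsets:
                neighbor = tuple(a + b for a, b in zip(current, off))
                if neighbor in dense and neighbor not in cell_label:
                    cell_label[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1

    # Points in sparse cells are treated as noise (-1).
    return [cell_label.get(cell, -1) for cell in cells]


points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),  # first group
    (5.1, 5.2), (5.3, 5.1), (5.2, 5.4),              # second group
    (9.5, 0.5),                                      # isolated point
]
labels = grid_density_clusters(points, cell_size=1.0, density_threshold=3)
print(labels)
```

In this toy input the two groups fall into separate dense cells and the isolated point lands in a sparse cell, so it is labelled as noise. Choosing `cell_size` and `density_threshold` by hand is exactly the step that the paper claims to automate.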

Suggested Citation

  • Shehzad Khalid & Shahid Razzaq, 2015. "TOBAE: A Density-based Agglomerative Clustering Algorithm," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 241-267, July.
  • Handle: RePEc:spr:jclass:v:32:y:2015:i:2:p:241-267
    DOI: 10.1007/s00357-015-9166-2

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00357-015-9166-2
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. C. Abraham & P. A. Cornillon & E. Matzner‐Løber & N. Molinari, 2003. "Unsupervised Curve Clustering using B‐Splines," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 30(3), pages 581-595, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    2. C. Abraham & G. Biau & B. Cadre, 2006. "On the Kernel Rule for Function Classification," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 58(3), pages 619-633, September.
    3. repec:cte:wsrepe:ws140101 is not listed on IDEAS
    4. Philip A. White & Alan E. Gelfand, 2021. "Multivariate functional data modeling with time-varying clustering," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 586-602, September.
    5. Shuichi Tokushige & Hiroshi Yadohisa & Koichi Inada, 2007. "Crisp and fuzzy k-means clustering algorithms for multivariate functional data," Computational Statistics, Springer, vol. 22(1), pages 1-16, April.
    6. Andrés Alonso & David Casado & Sara López-Pintado & Juan Romo, 2014. "Robust Functional Supervised Classification for Time Series," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 325-350, October.
    7. Mitsunori Kayano & Koji Dozono & Sadanori Konishi, 2010. "Functional Cluster Analysis via Orthonormalized Gaussian Basis Expansions and Its Application," Journal of Classification, Springer;The Classification Society, vol. 27(2), pages 211-230, September.
    8. Fang, Kuangnan & Chen, Yuanxing & Ma, Shuangge & Zhang, Qingzhao, 2022. "Biclustering analysis of functionals via penalized fusion," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    9. Michio Yamamoto, 2012. "Clustering of functional data in a low-dimensional subspace," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(3), pages 219-247, October.
    10. Ferraty, F. & Vieu, P., 2003. "Curves discrimination: a nonparametric functional approach," Computational Statistics & Data Analysis, Elsevier, vol. 44(1-2), pages 161-173, October.
    11. Ja‐Yoon Jang & Hee‐Seok Oh & Yaeji Lim & Ying Kuen Cheung, 2021. "Ensemble clustering for step data via binning," Biometrics, The International Biometric Society, vol. 77(1), pages 293-304, March.
    12. Tin Lok James Ng & Thomas Brendan Murphy, 2021. "Model-based Clustering of Count Processes," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 188-211, July.
    13. C. Denis & E. Lebarbier & C. Lévy‐Leduc & O. Martin & L. Sansonnet, 2020. "A novel regularized approach for functional data clustering: an application to milking kinetics in dairy goats," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(3), pages 623-640, June.
    14. Vogt, Michael & Linton, Oliver, 2020. "Multiscale clustering of nonparametric regression curves," Journal of Econometrics, Elsevier, vol. 216(1), pages 305-325.
    15. Alonso, Andrés M. & Casado, David & Romo, Juan, 2012. "Supervised classification for functional data: A weighted distance approach," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2334-2346.
    16. López Pintado, Sara, 2005. "Depth-based classification for functional data," DES - Working Papers. Statistics and Econometrics. WS ws055611, Universidad Carlos III de Madrid. Departamento de Estadística.
    17. Julien Jacques & Cristian Preda, 2014. "Functional data clustering: a survey," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(3), pages 231-255, September.
    18. Prieto, Francisco J. & Rendón, Carolina, 2014. "Independent components techniques based on kurtosis for functional data analysis," DES - Working Papers. Statistics and Econometrics. WS ws141006, Universidad Carlos III de Madrid. Departamento de Estadística.
    19. Christophe Genolini & Bruno Falissard, 2010. "KmL: k-means for longitudinal data," Computational Statistics, Springer, vol. 25(2), pages 317-328, June.
    20. Maria Ruiz-Medina & Rosa Espejo & Elvira Romano, 2014. "Spatial functional normal mixed effect approach for curve classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(3), pages 257-285, September.
    21. Adriano Zanin Zambom & Julian A. A. Collazos & Ronaldo Dias, 2019. "Functional data clustering via hypothesis testing k-means," Computational Statistics, Springer, vol. 34(2), pages 527-549, June.
