A fresh look at mean-shift based modal clustering

Authors

  • Jose Ameijeiras-Alonso

    (Universidade de Santiago de Compostela)

  • Jochen Einbeck

    (Durham University)

Abstract

Modal clustering is an unsupervised learning technique in which cluster centers are identified as the local maxima of nonparametric probability density estimates. A natural algorithmic engine for computing these maxima is the mean shift procedure, which is essentially an iteratively computed chain of local means. We revisit this technique, focusing on its link to kernel density gradient estimation, and in doing so propose a novel approach to bandwidth selection based on the concept of a critical bandwidth. Furthermore, in the one-dimensional case, an inverse version of the mean shift is developed to provide a novel approach for the estimation of antimodes, which is then used to identify cluster boundaries. A simulation study assesses, in the univariate case, the classification accuracy of the mean-shift based clustering approach. Three (univariate and multivariate) examples from the fields of philately, engineering, and imaging illustrate how modal clusterings identified through mean-shift based methods relate directly and naturally to physical properties of the data-generating system. Computationally efficient solutions for handling large data sets are also proposed.
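
The "chain of local means" described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel and a fixed, user-supplied bandwidth `h`, and the function names (`local_mean`, `mean_shift`, `modal_clusters`) and the toy data are hypothetical. Bandwidth selection via a critical bandwidth and the inverse mean shift for antimodes are not covered here.

```python
import math

def gaussian_weight(x, xi, h):
    """Unnormalised Gaussian kernel weight of data point xi as seen from x."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-0.5 * sq_dist / h ** 2)

def local_mean(x, data, h):
    """Kernel-weighted local mean of the data around x."""
    weights = [gaussian_weight(x, xi, h) for xi in data]
    total = sum(weights)
    return tuple(
        sum(w * xi[j] for w, xi in zip(weights, data)) / total
        for j in range(len(x))
    )

def mean_shift(start, data, h, tol=1e-8, max_iter=500):
    """Iterate local means from `start` until the step length drops below `tol`."""
    x = tuple(start)
    for _ in range(max_iter):
        new_x = local_mean(x, data, h)
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x

def modal_clusters(data, h, merge_tol=1e-3):
    """Label each point by the estimated mode its mean-shift chain converges to."""
    modes, labels = [], []
    for xi in data:
        m = mean_shift(xi, data, h)
        for k, known in enumerate(modes):
            if math.dist(m, known) < merge_tol:
                labels.append(k)  # chain ended at an already known mode
                break
        else:
            modes.append(m)  # chain ended at a new mode
            labels.append(len(modes) - 1)
    return modes, labels

# Hypothetical univariate toy data with two well-separated groups.
data = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
modes, labels = modal_clusters(data, h=0.5)
print(modes, labels)
```

Each chain starts at a data point, so the kernel weights never all vanish. Points are represented as tuples so that the same sketch applies to univariate and multivariate data; the merge tolerance `merge_tol` is an arbitrary choice for deciding when two converged chains share a mode.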

Suggested Citation

  • Jose Ameijeiras-Alonso & Jochen Einbeck, 2024. "A fresh look at mean-shift based modal clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(4), pages 1067-1095, December.
  • Handle: RePEc:spr:advdac:v:18:y:2024:i:4:d:10.1007_s11634-023-00575-1
    DOI: 10.1007/s11634-023-00575-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-023-00575-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Federico Ferraccioli & Giovanna Menardi, 2023. "Modal clustering of matrix-variate data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 323-345, June.
    2. Alessandro Casa & Giovanna Menardi, 2022. "Nonparametric semi-supervised classification with application to signal detection in high energy physics," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(3), pages 531-550, September.
    3. José E. Chacón, 2020. "The Modal Age of Statistics," International Statistical Review, International Statistical Institute, vol. 88(1), pages 122-141, April.
    4. Konstantin Eckle & Nicolai Bissantz & Holger Dette & Katharina Proksch & Sabrina Einecke, 2018. "Multiscale inference for a multivariate density with applications to X-ray astronomy," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(3), pages 647-689, June.
    5. Chacón, José E. & Fernández Serrano, Javier, 2024. "Bayesian taut splines for estimating the number of modes," Computational Statistics & Data Analysis, Elsevier, vol. 196(C).
    6. Henderson, Daniel J. & Parmeter, Christopher F., 2012. "Normal reference bandwidths for the general order, multivariate kernel density derivative estimator," Statistics & Probability Letters, Elsevier, vol. 82(12), pages 2198-2205.
    7. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    8. Christopher R. Genovese & Marco Perone-Pacifico & Isabella Verdinelli & Larry Wasserman, 2016. "Non-parametric inference for density modes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 99-126, January.
    9. Arthur Pewsey & Eduardo García-Portugués, 2021. "Rejoinder on: Recent advances in directional statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 76-82, March.
    10. Teng Qiu & Yongjie Li, 2022. "Nearest Descent, In-Tree, and Clustering," Mathematics, MDPI, vol. 10(5), pages 1-37, February.
    11. Stefano Tonellato, 2019. "Bayesian nonparametric clustering as a community detection problem," Working Papers 2019: 20, Department of Economics, University of Venice "Ca' Foscari".
    12. José E. Chacón, 2019. "Mixture model modal clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 379-404, June.
    13. Lasse Holmström & Leena Pasanen, 2017. "Statistical Scale Space Methods," International Statistical Review, International Statistical Institute, vol. 85(1), pages 1-30, April.
    14. Blanquero, R. & Carrizosa, E. & Jiménez-Cordero, A. & Martín-Barragán, B., 2019. "Functional-bandwidth kernel for Support Vector Machine with Functional Data: An alternating optimization algorithm," European Journal of Operational Research, Elsevier, vol. 275(1), pages 195-207.
    15. Alessandro Casa & Luca Scrucca & Giovanna Menardi, 2021. "Better than the best? Answers via model ensemble in density-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(3), pages 599-623, September.
    16. Weining Shen & Subhashis Ghosal, 2017. "Posterior Contraction Rates of Density Derivative Estimation," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 79(2), pages 336-354, August.
    17. Cheolwoo Park & Yongho Jeon & Kee-Hoon Kang, 2016. "An exploratory data analysis in scale-space for interval-valued data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(14), pages 2643-2660, October.
    18. Claudio Agostinelli & Luca Greco & Giovanni Saraceno, 2024. "Weighted likelihood methods for robust fitting of wrapped models for p-torus data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(4), pages 853-888, December.
    19. Adelchi Azzalini & Giovanna Menardi, 2016. "Density-based clustering with non-continuous data," Computational Statistics, Springer, vol. 31(2), pages 771-798, June.
    20. Filippone, Maurizio & Sanguinetti, Guido, 2011. "Approximate inference of the bandwidth in multivariate kernel density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3104-3122, December.
