Printed from https://ideas.repec.org/a/gam/jmathe/v7y2019i9p771-d259916.html

Estimating the Major Cluster by Mean-Shift with Updating Kernel

Author

Listed:
  • Ye Tian

    (Graduate School of Engineering, Gifu University, 1-1 Yanagido, Gifu-shi 501-1193, Japan)

  • Yasunari Yokota

    (Department of EECE, Faculty of Engineering, Gifu University, 1-1 Yanagido, Gifu-shi 501-1193, Japan)

Abstract

The mean-shift method is a convenient mode-seeking method. It relies on the principle that the sample mean over an analysis window, or kernel, is displaced from the kernel center toward the direction in which samples are densest. By repeatedly moving the kernel to this mean, the method iteratively seeks the densest point of the samples, that is, the sample mode. A kernel that is too small converges to a local mode that arises only from statistical fluctuation. A kernel that is too large yields a biased mode estimate, because it is affected by other clusters, abnormal values, or outliers lying outside the major cluster. Optimal selection of the kernel size, called the bandwidth in much of the literature, is therefore an important problem. Assuming that the major cluster follows a Gaussian probability density, that outliers do not affect the sample mode of the major cluster, and adopting a Gaussian kernel, we propose a new mean-shift method that estimates both the mean vector and the covariance matrix of the major cluster in each iteration and then updates the kernel size and shape adaptively. Numerical experiments indicate that the mean vector, covariance matrix, and number of samples of the major cluster can be estimated stably. Because the kernel can take an anisotropic as well as an isotropic shape according to the sample distribution, the proposed method achieves higher estimation precision than the general mean-shift method.
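
The kernel-size trade-off described in the abstract can be illustrated with a minimal one-dimensional sketch in Python. It is not the authors' algorithm: the function names, the toy data, and the variance update rule are assumptions made for illustration. The update uses the fact that a Gaussian cluster with variance v, viewed through a Gaussian kernel with variance h, has windowed variance s satisfying 1/s = 1/v + 1/h, so v can be recovered as 1/(1/s - 1/h); reusing that estimate as the next kernel variance is one plausible reading of "updating kernel", and the paper's multivariate covariance update may differ.

```python
import math


def gaussian_weights(samples, center, variance):
    """Gaussian kernel weight of each sample for a kernel placed at `center`."""
    return [math.exp(-((x - center) ** 2) / (2.0 * variance)) for x in samples]


def mean_shift(samples, start, variance, tol=1e-10, max_iter=500):
    """Fixed-kernel mean-shift: move the center to the weighted mean until it settles."""
    center = start
    for _ in range(max_iter):
        w = gaussian_weights(samples, center, variance)
        new_center = sum(wi * x for wi, x in zip(w, samples)) / sum(w)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


def updating_mean_shift(samples, start, variance, tol=1e-10, max_iter=500):
    """Mean-shift that also re-estimates the kernel variance (assumed update rule)."""
    center, kernel_var = start, variance
    for _ in range(max_iter):
        w = gaussian_weights(samples, center, kernel_var)
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, samples)) / total
        windowed_var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, samples)) / total
        # Undo the narrowing caused by the kernel: 1/v = 1/s - 1/h.
        # Keep the old kernel variance if the correction is not usable.
        new_var = kernel_var
        if windowed_var > 0.0:
            precision = 1.0 / windowed_var - 1.0 / kernel_var
            if precision > 0.0:
                new_var = 1.0 / precision
        converged = abs(mean - center) < tol and abs(new_var - kernel_var) < tol
        center, kernel_var = mean, new_var
        if converged:
            break
    return center, kernel_var


# Hypothetical toy data: a symmetric major cluster around 5.0 plus three outliers.
offsets = [-1.5, -1.0, -0.6, -0.3, -0.1, 0.0, 0.1, 0.3, 0.6, 1.0, 1.5]
samples = [5.0 + d for d in offsets] + [20.0, 21.0, 22.0]

narrow = mean_shift(samples, start=6.0, variance=0.05 ** 2)   # stuck on one sample
wide = mean_shift(samples, start=6.0, variance=10.0 ** 2)     # pulled toward outliers
moderate = mean_shift(samples, start=6.0, variance=1.0)       # finds the cluster center
center, cluster_var = updating_mean_shift(samples, start=6.0, variance=1.0)

print("narrow kernel:  ", narrow)
print("wide kernel:    ", wide)
print("moderate kernel:", moderate)
print("updating kernel:", center, cluster_var)
```

In this toy setting the narrow kernel stays on the sample where it started, the wide kernel settles between the major cluster and the outliers, and the updating variant returns both a center and a variance estimate for the major cluster without a hand-tuned final bandwidth. The starting variance still matters: a very small or very large initial kernel can lead the update astray.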

Suggested Citation

  • Ye Tian & Yasunari Yokota, 2019. "Estimating the Major Cluster by Mean-Shift with Updating Kernel," Mathematics, MDPI, vol. 7(9), pages 1-25, August.
  • Handle: RePEc:gam:jmathe:v:7:y:2019:i:9:p:771-:d:259916

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/7/9/771/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/7/9/771/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    2. Xu, Wenjing & Pan, Qing & Gastwirth, Joseph L., 2014. "Cox proportional hazards models with frailty for negatively correlated employment processes," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 295-307.
    3. Semhar Michael & Volodymyr Melnykov, 2016. "An effective strategy for initializing the EM algorithm in finite mixture models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 563-583, December.
    4. Xiaowen Dai & Libin Jin & Maozai Tian & Lei Shi, 2019. "Bayesian Local Influence for Spatial Autoregressive Models with Heteroscedasticity," Statistical Papers, Springer, vol. 60(5), pages 1423-1446, October.
    5. Moawia Alghalith, 2022. "Methods in Econophysics: Estimating the Probability Density and Volatility," Papers 2301.10178, arXiv.org.
    6. Luca Scrucca & Adrian Raftery, 2015. "Improved initialisation of model-based clustering using Gaussian hierarchical partitions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 447-460, December.
    7. Volodymyr Melnykov, 2013. "Finite mixture modelling in mass spectrometry analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 62(4), pages 573-592, August.
    8. Galimberti, Giuliano & Soffritti, Gabriele, 2014. "A multivariate linear regression analysis using finite mixtures of t distributions," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 138-150.
    9. Semhar Michael & Volodymyr Melnykov, 2016. "Finite Mixture Modeling of Gaussian Regression Time Series with Application to Dendrochronology," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 412-441, October.
    10. Yumi Oh & Peng Lyu & Sunwoo Ko & Jeongik Min & Juwhan Song, 2024. "Enhancing Broiler Weight Estimation through Gaussian Kernel Density Estimation Modeling," Agriculture, MDPI, vol. 14(6), pages 1-20, May.
    11. Volodymyr Melnykov & Xuwen Zhu, 2019. "An extension of the K-means algorithm to clustering skewed data," Computational Statistics, Springer, vol. 34(1), pages 373-394, March.
    12. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    13. Chunzheng Cao & Mengqian Chen & Yahui Wang & Jian Qing Shi, 2018. "Heteroscedastic replicated measurement error models under asymmetric heavy-tailed distributions," Computational Statistics, Springer, vol. 33(1), pages 319-338, March.
    14. Antonello Maruotti & Antonio Punzo, 2021. "Initialization of Hidden Markov and Semi‐Markov Models: A Critical Evaluation of Several Strategies," International Statistical Review, International Statistical Institute, vol. 89(3), pages 447-480, December.
    15. Lin, Tsung-I & McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Extending mixtures of factor models using the restricted multivariate skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 398-413.
    16. Chunzheng Cao & Yahui Wang & Jian Qing Shi & Jinguan Lin, 2018. "Measurement Error Models for Replicated Data Under Asymmetric Heavy-Tailed Distributions," Computational Economics, Springer;Society for Computational Economics, vol. 52(2), pages 531-553, August.
    17. Shahid Latif & Slobodan P. Simonovic, 2023. "Trivariate Probabilistic Assessments of the Compound Flooding Events Using the 3-D Fully Nested Archimedean (FNA) Copula in the Semiparametric Distribution Setting," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 37(4), pages 1641-1693, March.
    18. Shiow-Lan Gau & Jean Dieu Tapsoba & Shen-Ming Lee, 2014. "Bayesian approach for mixture models with grouped data," Computational Statistics, Springer, vol. 29(5), pages 1025-1043, October.
    19. John Paolo Rosales Rivera, 2022. "A nonparametric approach to understanding poverty in the Philippines: Evidence from the Family Income and Expenditure Survey," Poverty & Public Policy, John Wiley & Sons, vol. 14(3), pages 242-267, September.
    20. Zeinolabedin Najafi & Karim Zare & Mohammad Reza Mahmoudi & Soheil Shokri & Amir Mosavi, 2022. "Inference and Local Influence Assessment in a Multifactor Skew-Normal Linear Mixed Model," Mathematics, MDPI, vol. 10(15), pages 1-21, August.
