Printed from https://ideas.repec.org/a/bla/jorssb/v71y2009i1p301-318.html

Robust linear clustering

Author

Listed:
  • L. A. García‐Escudero
  • A. Gordaliza
  • R. San Martín
  • S. Van Aelst
  • R. Zamar

Abstract

Summary. Non‐hierarchical clustering methods are frequently based on the idea of forming groups around ‘objects’. The main exponent of this class of methods is the k‐means method, where these objects are points. However, clusters in a data set may often be due to certain relationships between the measured variables. For instance, we can find linear structures such as straight lines and planes, around which the observations are grouped in a natural way. These structures are not well represented by points. We present a method that searches for linear groups in the presence of outliers. The method is based on the idea of impartial trimming. We search for the ‘best’ subsample containing a proportion 1−α of the data and the best k affine subspaces fitting to those non‐discarded observations by measuring discrepancies through orthogonal distances. The population version of the sample problem is also considered. We prove the existence of solutions for the sample and population problems together with their consistency. A feasible algorithm for solving the sample problem is described as well. Finally, some examples showing how the method proposed works in practice are provided.
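The trimming idea in the abstract can be illustrated with a short sketch. This is not the authors' algorithm, which is only described in the full paper; it is a minimal alternating heuristic, assuming random restarts and a fixed subspace dimension. The names `fit_affine_subspace`, `orthogonal_sq_dist`, `trimmed_linear_clustering` and the parameters `dim`, `n_starts`, `n_iter` are hypothetical and chosen for illustration. Each iteration assigns every observation to its nearest affine subspace by squared orthogonal distance, discards the proportion α with the largest distances, and refits each subspace to its retained observations by principal components.

```python
import numpy as np

def fit_affine_subspace(points, dim):
    """Orthogonal least-squares fit: centroid plus the top `dim` principal directions."""
    center = points.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[:dim]

def orthogonal_sq_dist(points, center, basis):
    """Squared orthogonal distance from each point to the affine subspace."""
    centered = points - center
    projected = centered @ basis.T @ basis
    return ((centered - projected) ** 2).sum(axis=1)

def trimmed_linear_clustering(X, k, alpha, dim=1, n_starts=20, n_iter=50, seed=0):
    """Heuristic search for k affine subspaces fitted to a proportion 1 - alpha of X.

    Returns (labels, subspaces, objective); trimmed observations get label -1.
    Assumption: `dim` is the same for all k subspaces.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = int(np.ceil((1 - alpha) * n))
    best_obj, best_labels, best_subspaces = np.inf, None, None
    for _ in range(n_starts):
        # Initialise each subspace from dim + 1 randomly chosen observations.
        subspaces = [
            fit_affine_subspace(X[rng.choice(n, dim + 1, replace=False)], dim)
            for _ in range(k)
        ]
        assign = None
        for _ in range(n_iter):
            d = np.column_stack([orthogonal_sq_dist(X, c, b) for c, b in subspaces])
            labels = d.argmin(axis=1)
            nearest = d[np.arange(n), labels]
            keep = np.argsort(nearest)[:n_keep]  # impartial trimming step
            new_assign = np.full(n, -1)
            new_assign[keep] = labels[keep]
            if assign is not None and np.array_equal(new_assign, assign):
                break  # assignments stopped changing
            assign = new_assign
            refit = []
            for j in range(k):
                members = keep[labels[keep] == j]
                # Keep the old subspace if too few observations remain to refit it.
                refit.append(fit_affine_subspace(X[members], dim)
                             if len(members) > dim else subspaces[j])
            subspaces = refit
        # Score the final subspaces of this start on the retained observations.
        d = np.column_stack([orthogonal_sq_dist(X, c, b) for c, b in subspaces])
        nearest = d.min(axis=1)
        keep = np.argsort(nearest)[:n_keep]
        objective = nearest[keep].sum()
        if objective < best_obj:
            final = np.full(n, -1)
            final[keep] = d.argmin(axis=1)[keep]
            best_obj, best_labels, best_subspaces = objective, final, subspaces
    return best_labels, best_subspaces, best_obj
```

For example, `trimmed_linear_clustering(X, k=2, alpha=0.1)` would look for two lines in an `(n, p)` array `X` while discarding roughly 10% of the rows. Like any alternating scheme of this kind, the sketch can stop at a local optimum, which is why several random starts are used.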

Suggested Citation

  • L. A. García‐Escudero & A. Gordaliza & R. San Martín & S. Van Aelst & R. Zamar, 2009. "Robust linear clustering," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(1), pages 301-318, January.
  • Handle: RePEc:bla:jorssb:v:71:y:2009:i:1:p:301-318
    DOI: 10.1111/j.1467-9868.2008.00682.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1467-9868.2008.00682.x
    Download Restriction: no

    References listed on IDEAS

    1. Van Aelst, Stefan & (Steven) Wang, Xiaogang & Zamar, Ruben H. & Zhu, Rong, 2006. "Linear grouping using orthogonal regression," Computational Statistics & Data Analysis, Elsevier, vol. 50(5), pages 1287-1312, March.
    2. Hennig, Christian & Christlieb, Norbert, 2002. "Validating visual clusters in large datasets: fixed point clusters of spectral features," Computational Statistics & Data Analysis, Elsevier, vol. 40(4), pages 723-739, October.
    3. Wayne DeSarbo & Richard Oliver & Arvind Rangaswamy, 1989. "A simulated annealing methodology for clusterwise linear regression," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 707-736, September.
    4. Wayne DeSarbo & William Cron, 1988. "A maximum likelihood methodology for clusterwise linear regression," Journal of Classification, Springer;The Classification Society, vol. 5(2), pages 249-282, September.
    5. Swayne, Deborah F. & Lang, Duncan Temple & Buja, Andreas & Cook, Dianne, 2003. "GGobi: evolving from XGobi into an extensible framework for interactive data visualization," Computational Statistics & Data Analysis, Elsevier, vol. 43(4), pages 423-444, August.
    6. Hennig, Christian, 2003. "Clusters, outliers, and regression: fixed point clusters," Journal of Multivariate Analysis, Elsevier, vol. 86(1), pages 183-212, July.
    7. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.

    Citations

    Cited by:

    1. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    2. Hu, Hao & Yao, Weixin & Wu, Yichao, 2017. "The robust EM-type algorithms for log-concave mixtures of regression models," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 14-26.
    3. García-Escudero, L.A. & Gordaliza, A. & Mayo-Iscar, A. & San Martín, R., 2010. "Robust clusterwise linear regression through trimming," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3057-3069, December.
    4. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    5. Gao, Jinxin & Hitchcock, David B., 2010. "James-Stein shrinkage to improve k-means cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(9), pages 2113-2127, September.
    6. Luca Greco, 2022. "Robust fitting of mixtures of GLMs by weighted likelihood," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(1), pages 25-48, March.
    7. Luca Greco & Antonio Lucadamo & Claudio Agostinelli, 2021. "Weighted likelihood latent class linear regression," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(2), pages 711-746, June.
    8. Andrea Cerioli & Domenico Perrotta, 2014. "Robust clustering around regression lines with high density regions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 5-26, March.
    9. Bai, Xiuqin & Yao, Weixin & Boyer, John E., 2012. "Robust fitting of mixture regression models," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2347-2359.
    10. Stefan Van Aelst & Ruben H. Zamar, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 360-362, June.
    11. Luis García-Escudero & Alfonso Gordaliza & Carlos Matrán & Agustín Mayo-Iscar, 2010. "A review of robust clustering methods," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 89-109, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. D'Urso, Pierpaolo & Santoro, Adriana, 2006. "Fuzzy clusterwise linear regression analysis with symmetrical fuzzy output variable," Computational Statistics & Data Analysis, Elsevier, vol. 51(1), pages 287-313, November.
    2. García-Escudero, L.A. & Gordaliza, A. & Mayo-Iscar, A. & San Martín, R., 2010. "Robust clusterwise linear regression through trimming," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3057-3069, December.
    3. Adil M. Bagirov & Julien Ugon & Hijran G. Mirzayeva, 2015. "Nonsmooth Optimization Algorithm for Solving Clusterwise Linear Regression Problems," Journal of Optimization Theory and Applications, Springer, vol. 164(3), pages 755-780, March.
    4. Bouveyron, C. & Girard, S. & Schmid, C., 2007. "High-dimensional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 502-519, September.
    5. Rainer Schlittgen, 2011. "A weighted least-squares approach to clusterwise regression," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 95(2), pages 205-217, June.
    6. Francesco Dotto & Alessio Farcomeni & Luis Angel García-Escudero & Agustín Mayo-Iscar, 2017. "A fuzzy approach to robust regression clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(4), pages 691-710, December.
    7. Réal Carbonneau & Gilles Caporossi & Pierre Hansen, 2014. "Globally Optimal Clusterwise Regression By Column Generation Enhanced with Heuristics, Sequencing and Ending Subset Optimization," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 219-241, July.
    8. Tom Frans Wilderjans & Eva Vande Gaer & Henk A. L. Kiers & Iven Van Mechelen & Eva Ceulemans, 2017. "Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data," Psychometrika, Springer;The Psychometric Society, vol. 82(1), pages 86-111, March.
    9. Joki, Kaisa & Bagirov, Adil M. & Karmitsa, Napsu & Mäkelä, Marko M. & Taheri, Sona, 2020. "Clusterwise support vector linear regression," European Journal of Operational Research, Elsevier, vol. 287(1), pages 19-35.
    10. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    11. Francesca Torti & Domenico Perrotta & Marco Riani & Andrea Cerioli, 2019. "Assessing trimming methodologies for clustering linear regression data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 227-257, March.
    12. Van Aelst, Stefan & (Steven) Wang, Xiaogang & Zamar, Ruben H. & Zhu, Rong, 2006. "Linear grouping using orthogonal regression," Computational Statistics & Data Analysis, Elsevier, vol. 50(5), pages 1287-1312, March.
    13. Fritz, Heinrich & García-Escudero, Luis A. & Mayo-Iscar, Agustín, 2012. "tclust: An R Package for a Trimming Approach to Cluster Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 47(i12).
    14. Ana Oliveira-Brochado & Francisco Vitorino Martins, 2008. "Segmentação de Mercado e modelos mistura de regressão para variáveis normais," FEP Working Papers 262, Universidade do Porto, Faculdade de Economia do Porto.
    15. Luis García-Escudero & Alfonso Gordaliza & Carlos Matrán & Agustín Mayo-Iscar, 2010. "A review of robust clustering methods," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 89-109, September.
    16. Wayne S. DeSarbo & Qian Chen & Ashley Stadler Blank, 2017. "A Parametric Constrained Segmentation Methodology for Application in Sport Marketing," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 4(4), pages 37-55, December.
    17. Hennig, Christian, 2003. "Clusters, outliers, and regression: fixed point clusters," Journal of Multivariate Analysis, Elsevier, vol. 86(1), pages 183-212, July.
    18. Carbonneau, Réal A. & Caporossi, Gilles & Hansen, Pierre, 2011. "Globally optimal clusterwise regression by mixed logical-quadratic programming," European Journal of Operational Research, Elsevier, vol. 212(1), pages 213-222, July.
    19. Hye Suk & Heungsun Hwang, 2010. "Regularized fuzzy clusterwise ridge regression," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(1), pages 35-51, April.
    20. Cremaschini, Alessandro & Maruotti, Antonello, 2023. "A finite mixture analysis of structural breaks in the G-7 gross domestic product series," Research in Economics, Elsevier, vol. 77(1), pages 76-90.
