
Sparse clustering of functional data

Author

Listed:
  • Floriello, Davide
  • Vitelli, Valeria

Abstract

We consider the problem of clustering functional data while jointly selecting the most relevant features for classification. Functional sparse clustering is here analytically defined as a variational problem with a hard thresholding constraint ensuring the sparsity of the solution. First, a unique solution to sparse clustering with hard thresholding in finite dimensions is proved to exist. Then, the infinite-dimensional generalization is given and proved to have a unique solution under reasonable assumptions. Both the multivariate and the functional versions of sparse clustering with hard thresholding exhibit improvements on other standard and sparse clustering strategies on simulated data. A real functional data application is also shown.
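To make the finite-dimensional idea in the abstract concrete, the following is a minimal illustrative sketch in Python, not the authors' algorithm. It assumes a weighted k-means scheme in which each feature's weight is proportional to its between-cluster sum of squares, all but the `n_keep` largest weights are set to exactly zero (the hard threshold), and the weights are rescaled to unit L2 norm. The function names, the farthest-point initialisation, and the toy data are all assumptions made for illustration; the paper's variational formulation and its functional extension are not reproduced here.

```python
import numpy as np


def feature_bcss(X, labels, k):
    """Between-cluster sum of squares, computed separately for each feature."""
    overall = X.mean(axis=0)
    bcss = np.zeros(X.shape[1])
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            bcss += len(members) * (members.mean(axis=0) - overall) ** 2
    return bcss


def hard_threshold_weights(bcss, n_keep):
    """Keep the n_keep largest BCSS values, zero the rest, scale to unit L2 norm."""
    w = np.zeros_like(bcss)
    keep = np.argsort(bcss)[-n_keep:]
    w[keep] = bcss[keep]
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def init_centers(Z, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centers = [Z[0]]
    d = ((Z - Z[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx = int(d.argmax())
        centers.append(Z[idx])
        d = np.minimum(d, ((Z - Z[idx]) ** 2).sum(axis=1))
    return np.array(centers)


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations; returns the cluster label of each row of Z."""
    centers = init_centers(Z, k)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels


def sparse_kmeans_hard(X, k, n_keep, n_outer=5):
    """Alternate clustering on reweighted features with a hard-threshold weight update."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start from equal weights
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)
        w = hard_threshold_weights(feature_bcss(X, labels, k), n_keep)
    return labels, w


# Toy data (hypothetical): only the first feature separates the two groups.
X_toy = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.2],
    [10.0, 0.0, 0.1],
    [10.0, 1.0, 0.0],
])
labels, weights = sparse_kmeans_hard(X_toy, k=2, n_keep=1)
print(labels, weights)
```

In this sketch the sparsity level `n_keep` is fixed by the caller; how to choose it, and how the construction carries over to functions on a continuous domain, are questions the paper itself addresses.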

Suggested Citation

  • Floriello, Davide & Vitelli, Valeria, 2017. "Sparse clustering of functional data," Journal of Multivariate Analysis, Elsevier, vol. 154(C), pages 1-18.
  • Handle: RePEc:eee:jmvana:v:154:y:2017:i:c:p:1-18
    DOI: 10.1016/j.jmva.2016.10.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X16301208
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Wei‐Chien Chang, 1983. "On Using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 32(3), pages 267-275, November.
    2. Fraiman, Ricardo & Gimenez, Yanina & Svarc, Marcela, 2016. "Feature selection for functional data," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 191-208.
    3. Weiliang Qiu & Harry Joe, 2006. "Generation of Random Clusters with Specified Degree of Separation," Journal of Classification, Springer;The Classification Society, vol. 23(2), pages 315-334, September.
    4. Matsui, Hidetoshi, 2014. "Variable and boundary selection for functional data via multiclass logistic regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 176-185.
    5. Germán Aneiros & Philippe Vieu, 2016. "Comments on: Probability enhanced effective dimension reduction for classifying sparse functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(1), pages 27-32, March.
    6. Robert Tibshirani & Guenther Walther & Trevor Hastie, 2001. "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(2), pages 411-423.
    7. Cathy Maugis & Gilles Celeux & Marie-Laure Martin-Magniette, 2009. "Variable Selection for Clustering with Gaussian Mixture Models," Biometrics, The International Biometric Society, vol. 65(3), pages 701-709, September.
    8. Aneiros, Germán & Vieu, Philippe, 2014. "Variable selection in infinite-dimensional problems," Statistics & Probability Letters, Elsevier, vol. 94(C), pages 12-20.
    9. Sijian Wang & Ji Zhu, 2008. "Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data," Biometrics, The International Biometric Society, vol. 64(2), pages 440-448, June.
    10. Germán Aneiros-Pérez & Philippe Vieu, 2013. "Testing linearity in semi-parametric functional data analysis," Computational Statistics, Springer, vol. 28(2), pages 413-434, April.
    11. Raftery, Adrian E. & Dean, Nema, 2006. "Variable Selection for Model-Based Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 168-178, March.
    12. Sangalli, Laura M. & Secchi, Piercesare & Vantini, Simone & Vitelli, Valeria, 2010. "k-mean alignment for curve clustering," Computational Statistics & Data Analysis, Elsevier, vol. 54(5), pages 1219-1233, May.
    13. Lee, Eun Ryung & Park, Byeong U., 2012. "Sparse estimation in functional linear regression," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 1-17.
    14. Qiu, Weiliang & Joe, Harry, 2006. "Separation index and partial membership for clustering," Computational Statistics & Data Analysis, Elsevier, vol. 50(3), pages 585-603, February.
    15. G. Aneiros & P. Vieu, 2016. "Sparse nonparametric model for regression with functional covariate," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(4), pages 839-859, October.
    16. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2009. "Variable selection in model-based clustering: A general variable role modeling," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3872-3882, September.
    17. Tian, Tian Siva & James, Gareth M., 2013. "Interpretable dimension reduction for classifying functional data," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 282-296.
    18. Martin-Barragan, Belen & Lillo, Rosa & Romo, Juan, 2014. "Interpretable support vector machines for functional data," European Journal of Operational Research, Elsevier, vol. 232(1), pages 146-155.
    19. Jerome H. Friedman & Jacqueline J. Meulman, 2004. "Clustering objects on subsets of attributes (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(4), pages 815-849, November.
    20. Witten, Daniela M. & Tibshirani, Robert, 2010. "A Framework for Feature Selection in Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 713-726.

    Citations



    Cited by:

    1. B. Lafuente-Rego & P. D’Urso & J. A. Vilar, 2020. "Robust fuzzy clustering based on quantile autocovariances," Statistical Papers, Springer, vol. 61(6), pages 2393-2448, December.
    2. Kim, Joonpyo & Oh, Hee-Seok, 2020. "Pseudo-quantile functional data clustering," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    3. Qingzhi Zhong & Huazhen Lin & Yi Li, 2021. "Cluster non‐Gaussian functional data," Biometrics, The International Biometric Society, vol. 77(3), pages 852-865, September.
    4. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    5. Vieu, Philippe, 2018. "On dimension reduction models for functional data," Statistics & Probability Letters, Elsevier, vol. 136(C), pages 134-138.
    6. Dominik Poß & Dominik Liebl & Alois Kneip & Hedwig Eisenbarth & Tor D. Wager & Lisa Feldman Barrett, 2020. "Superconsistent estimation of points of impact in non‐parametric regression with functional predictors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 1115-1140, September.
    7. Ye, Mao & Zhang, Peng & Nie, Lizhen, 2018. "Clustering sparse binary data with hierarchical Bayesian Bernoulli mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 32-49.
    8. Marco Stefanucci & Laura M. Sangalli & Pierpaolo Brutti, 2018. "PCA‐based discrimination of partially observed functional data, with an application to AneuRisk65 data set," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 72(3), pages 246-264, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fraiman, Ricardo & Gimenez, Yanina & Svarc, Marcela, 2016. "Feature selection for functional data," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 191-208.
    2. Bouveyron, Charles & Brunet-Saumard, Camille, 2014. "Model-based clustering of high-dimensional data: A review," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 52-78.
    3. Arias-Castro, Ery & Pu, Xiao, 2017. "A simple approach to sparse clustering," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 217-228.
    4. Aneiros, Germán & Cao, Ricardo & Fraiman, Ricardo & Genest, Christian & Vieu, Philippe, 2019. "Recent advances in functional data analysis and high-dimensional statistics," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 3-9.
    5. Charles Bouveyron & Camille Brunet-Saumard, 2014. "Discriminative variable selection for clustering with the sparse Fisher-EM algorithm," Computational Statistics, Springer, vol. 29(3), pages 489-513, June.
    6. Cappozzo, Andrea & Greselin, Francesca & Murphy, Thomas Brendan, 2021. "Robust variable selection for model-based learning in presence of adulteration," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    7. Rendon Aguirre, Janeth Carolina, 2017. "Clustering Big Data by Extreme Kurtosis Projections," DES - Working Papers. Statistics and Econometrics. WS 24522, Universidad Carlos III de Madrid. Departamento de Estadística.
    8. Aneiros, Germán & Novo, Silvia & Vieu, Philippe, 2022. "Variable selection in functional regression models: A review," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    9. Vieu, Philippe, 2018. "On dimension reduction models for functional data," Statistics & Probability Letters, Elsevier, vol. 136(C), pages 134-138.
    10. Gaynor, Sheila & Bair, Eric, 2017. "Identification of relevant subtypes via preweighted sparse clustering," Computational Statistics & Data Analysis, Elsevier, vol. 116(C), pages 139-154.
    11. Matthieu Marbac & Mohammed Sedki & Etienne Patin, 2020. "Variable Selection for Mixed Data Clustering: Application in Human Population Genomics," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 124-142, April.
    12. Banerjee, Trambak & Mukherjee, Gourab & Radchenko, Peter, 2017. "Feature screening in large scale cluster analysis," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 191-212.
    13. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2011. "Variable selection in model-based discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(10), pages 1374-1387, November.
    14. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    15. Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
    16. Ronglai Shen & Qianxing Mo & Nikolaus Schultz & Venkatraman E Seshan & Adam B Olshen & Jason Huse & Marc Ladanyi & Chris Sander, 2012. "Integrative Subtype Discovery in Glioblastoma Using iCluster," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-9, April.
    17. Šárka Brodinová & Peter Filzmoser & Thomas Ortner & Christian Breiteneder & Maia Rohm, 2019. "Robust and sparse k-means clustering for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 905-932, December.
    18. Crook, Oliver M. & Gatto, Laurent & Kirk, Paul D. W., 2019. "Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-20, December.
    19. Thierry Chekouo & Alejandro Murua, 2018. "High-dimensional variable selection with the plaid mixture model for clustering," Computational Statistics, Springer, vol. 33(3), pages 1475-1496, September.
    20. Melnykov, Volodymyr, 2016. "Model-based biclustering of clickstream data," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 31-45.
