
Multiscale Clustering for Functional Data

Authors

  • Yaeji Lim (Chung-Ang University)
  • Hee-Seok Oh (Seoul National University)
  • Ying Kuen Cheung (Columbia University)

Abstract

In an era of massive and complex data, clustering is one of the most important procedures for understanding and analyzing unstructured multivariate data. Classical methods such as K-means and hierarchical clustering, however, are not efficient in grouping data that are high dimensional and have inherent multiscale structures. This paper presents new clustering procedures that can adapt to multiscale characteristics and high dimensionality of data. The proposed methods are based on a novel combination of multiresolution analysis and functional data analysis. As the core of the methodology, a clustering approach using the concept of multiresolution analysis may reflect both the global trend and local activities of data, and functional data analysis handles the high-dimensional data efficiently. Practical algorithms to implement the proposed methods are further discussed. The empirical performance of the proposed methods is evaluated through numerical studies including a simulation study and real data analysis, which demonstrates promising results of the proposed clustering.
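
The idea of clustering at more than one scale can be illustrated with a small sketch. The code below is not the authors' procedure, which the abstract does not specify; it is a minimal illustration under stated assumptions. It swaps in a plain Haar wavelet transform as the multiresolution analysis and a basic K-means with a deterministic farthest-point start. The function names (haar_decompose, kmeans, agreement) and the synthetic curves are hypothetical. Each simulated curve combines a global trend (upward or downward slope) with a local activity (a fast oscillation that is present or absent), so coarse-scale coefficients should separate the trend groups and fine-scale coefficients should separate the oscillation groups.

```python
import numpy as np

def haar_decompose(curves):
    """Orthonormal Haar transform of each row (row length must be a power of 2).

    Returns the final smooth coefficient and a list of detail-coefficient
    arrays ordered from the finest scale to the coarsest.
    """
    approx = np.asarray(curves, dtype=float)
    details = []
    while approx.shape[1] > 1:
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def kmeans(X, k, n_iter=20):
    """Basic K-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

def agreement(a, b):
    """Share of matching labels for two 2-group partitions, up to label swap."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)

# Synthetic curves (an assumption for illustration, not data from the paper).
rng = np.random.default_rng(0)
n, m = 40, 64
t = np.arange(m) / m
trend_label = np.arange(n) % 2          # global trend: slope down or up
local_label = (np.arange(n) // 2) % 2   # local activity: oscillation off or on
slope = np.where(trend_label == 1, 3.0, -3.0)
curves = (slope[:, None] * t[None, :]
          + local_label[:, None] * np.sin(2 * np.pi * 16 * t)[None, :]
          + rng.normal(scale=0.1, size=(n, m)))

approx, details = haar_decompose(curves)

# Coarse scale: smooth coefficient plus the two coarsest detail levels.
coarse = np.hstack([approx, details[-1], details[-2]])
# Fine scale: the finest detail level only.
fine = details[0]

coarse_labels = kmeans(coarse, 2)
fine_labels = kmeans(fine, 2)

print("coarse scale vs. trend groups:", agreement(coarse_labels, trend_label))
print("fine scale vs. local groups:  ", agreement(fine_labels, local_label))
```

Clustering the two coefficient sets separately is only one way to use the scales. A method that combines information across scales into a single partition would need an additional weighting or selection step, which this sketch leaves out.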

Suggested Citation

  • Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
  • Handle: RePEc:spr:jclass:v:36:y:2019:i:2:d:10.1007_s00357-019-09313-9
    DOI: 10.1007/s00357-019-09313-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-019-09313-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    2. Kim, Joonpyo & Oh, Hee-Seok, 2020. "Pseudo-quantile functional data clustering," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    3. Ja‐Yoon Jang & Hee‐Seok Oh & Yaeji Lim & Ying Kuen Cheung, 2021. "Ensemble clustering for step data via binning," Biometrics, The International Biometric Society, vol. 77(1), pages 293-304, March.
    4. Dongik Jang & Hee-Seok Oh & Philippe Naveau, 2017. "Identifying local smoothness for spatially inhomogeneous functions," Computational Statistics, Springer, vol. 32(3), pages 1115-1138, September.
    5. J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
    6. Zhiguang Huo & Li Zhu & Tianzhou Ma & Hongcheng Liu & Song Han & Daiqing Liao & Jinying Zhao & George Tseng, 2020. "Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(1), pages 1-22, April.
    7. Vogt, Michael & Linton, Oliver, 2020. "Multiscale clustering of nonparametric regression curves," Journal of Econometrics, Elsevier, vol. 216(1), pages 305-325.
    8. Lingsong Meng & Dorina Avram & George Tseng & Zhiguang Huo, 2022. "Outcome‐guided sparse K‐means for disease subtype discovery via integrating phenotypic data with high‐dimensional transcriptomic data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(2), pages 352-375, March.
    9. Clémençon, Stéphan, 2014. "A statistical view of clustering performance through the theory of U-processes," Journal of Multivariate Analysis, Elsevier, vol. 124(C), pages 42-56.
    10. Slaets, Leen & Claeskens, Gerda & Hubert, Mia, 2012. "Phase and amplitude-based clustering for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2360-2374.
    11. Michael Vogt & Oliver Linton, 2015. "Classification of nonparametric regression functions in heterogeneous panels," CeMMAP working papers CWP06/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    12. Michael Vogt & Oliver Linton, 2015. "Classification of nonparametric regression functions in heterogeneous panels," CeMMAP working papers 06/15, Institute for Fiscal Studies.
    13. Cho, Haeran & Goude, Yannig & Brossat, Xavier & Yao, Qiwei, 2013. "Modeling and forecasting daily electricity load curves: a hybrid approach," LSE Research Online Documents on Economics 49634, London School of Economics and Political Science, LSE Library.
    14. Jacques, Julien & Preda, Cristian, 2014. "Model-based clustering for multivariate functional data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 92-106.
    15. Liu, Xueli & Yang, Mark C.K., 2009. "Simultaneous curve registration and clustering for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1361-1376, February.
    16. Yujia Li & Xiangrui Zeng & Chien‐Wei Lin & George C. Tseng, 2022. "Simultaneous estimation of cluster number and feature sparsity in high‐dimensional cluster analysis," Biometrics, The International Biometric Society, vol. 78(2), pages 574-585, June.
    17. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    18. Dong Liu & Changwei Zhao & Yong He & Lei Liu & Ying Guo & Xinsheng Zhang, 2023. "Simultaneous cluster structure learning and estimation of heterogeneous graphs for matrix‐variate fMRI data," Biometrics, The International Biometric Society, vol. 79(3), pages 2246-2259, September.
    19. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    20. Jang, Dongik & Oh, Hee-Seok, 2011. "Enhancement of spatially adaptive smoothing splines via parameterization of smoothing parameters," Computational Statistics & Data Analysis, Elsevier, vol. 55(2), pages 1029-1040, February.
