
A Survey on High-Dimensional Subspace Clustering

Author

Listed:
  • Wentao Qu

    (School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China)

  • Xianchao Xiu

    (School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China)

  • Huangyue Chen

    (Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China)

  • Lingchen Kong

    (School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China)

Abstract

With the rapid development of science and technology, high-dimensional data have become widespread across many fields. Owing to their complex characteristics, such data are usually distributed in a union of several low-dimensional subspaces. Over the past several decades, subspace clustering (SC) methods have been widely studied because they can recover the underlying subspaces of high-dimensional data and perform fast clustering by exploiting the self-expressiveness property of the data. SC methods construct an affinity matrix from the self-representation coefficients of the high-dimensional data and then obtain the clustering results by spectral clustering. The key issue is how to design a self-expressiveness model that reveals the true subspace structure of the data. In this survey, we focus on the development of SC methods over the past two decades and present a new classification criterion that divides them into three categories according to the purpose of clustering, i.e., low-rank sparse SC, local structure preserving SC, and kernel SC. We further divide these into subcategories according to the strategy used to construct the representation coefficients. In addition, applications of SC methods in face recognition, motion segmentation, handwritten digit recognition, and speech emotion recognition are introduced. Finally, several interesting and meaningful future research directions are discussed.
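
The pipeline summarized in the abstract (self-representation, then affinity matrix, then spectral clustering) can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes a sparse self-expressiveness model solved with scikit-learn's Lasso, an illustrative regularization parameter alpha, and a toy two-subspace dataset.

    # Minimal sketch of the generic SC pipeline described in the abstract.
    # Assumption: a sparse self-expressiveness model (Lasso) stands in for the
    # many regularizers surveyed in the article; alpha is illustrative only.
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.cluster import SpectralClustering

    def subspace_clustering(X, n_clusters, alpha=0.01):
        """Cluster rows of X (n_samples x n_features) lying near a union of subspaces."""
        n = X.shape[0]
        C = np.zeros((n, n))                      # self-representation coefficients
        for i in range(n):
            others = np.delete(np.arange(n), i)   # exclude x_i from its own dictionary
            lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
            lasso.fit(X[others].T, X[i])          # x_i ~ sum_j c_ij x_j with c_ii = 0
            C[i, others] = lasso.coef_
        W = np.abs(C) + np.abs(C).T               # symmetric affinity matrix
        return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                  random_state=0).fit_predict(W)

    # Toy usage: 60 points drawn from two 2-dimensional subspaces of R^10.
    rng = np.random.default_rng(0)
    B1, B2 = rng.standard_normal((10, 2)), rng.standard_normal((10, 2))
    X = np.vstack([rng.standard_normal((30, 2)) @ B1.T,
                   rng.standard_normal((30, 2)) @ B2.T])
    print(subspace_clustering(X, n_clusters=2))

The low-rank, local structure preserving, and kernel variants surveyed in the article replace the sparse coding step with different regularizers or representations, but they share this same affinity-plus-spectral-clustering backbone.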

Suggested Citation

  • Wentao Qu & Xianchao Xiu & Huangyue Chen & Lingchen Kong, 2023. "A Survey on High-Dimensional Subspace Clustering," Mathematics, MDPI, vol. 11(2), pages 1-39, January.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:2:p:436-:d:1035285

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/2/436/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/2/436/
    Download Restriction: no

    References listed on IDEAS

    1. Daniel D. Lee & H. Sebastian Seung, 1999. "Learning the parts of objects by non-negative matrix factorization," Nature, Nature, vol. 401(6755), pages 788-791, October.
    2. Michael E. Tipping & Christopher M. Bishop, 1999. "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 611-622.
    3. P. Tseng, 2000. "Nearest q-Flat to m Points," Journal of Optimization Theory and Applications, Springer, vol. 105(1), pages 249-252, April.
    4. Stephen Johnson, 1967. "Hierarchical clustering schemes," Psychometrika, Springer;The Psychometric Society, vol. 32(3), pages 241-254, September.
    5. Howard D. Bondell & Brian J. Reich, 2008. "Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR," Biometrics, The International Biometric Society, vol. 64(1), pages 115-123, March.
    6. She, Yiyuan & Owen, Art B., 2011. "Outlier Detection Using Nonconvex Penalized Regression," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 626-639.
    7. Bingzhen Chen & Wenjuan Zhai & Lingchen Kong, 2022. "Variable selection and collinearity processing for multivariate data via row-elastic-net regularization," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(1), pages 79-96, March.
    8. Wang, Lie, 2013. "The L1 penalized LAD estimator for high dimensional linear regression," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 135-151.
    9. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    10. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    11. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    2. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    3. Ander Wilson & Brian J. Reich, 2014. "Confounder selection via penalized credible regions," Biometrics, The International Biometric Society, vol. 70(4), pages 852-861, December.
    4. Diebold, Francis X. & Shin, Minchul, 2019. "Machine learning for regularized survey forecast combination: Partially-egalitarian LASSO and its derivatives," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1679-1691.
    5. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    6. Justin B. Post & Howard D. Bondell, 2013. "Factor Selection and Structural Identification in the Interaction ANOVA Model," Biometrics, The International Biometric Society, vol. 69(1), pages 70-79, March.
    7. Mihee Lee & Haipeng Shen & Jianhua Z. Huang & J. S. Marron, 2010. "Biclustering via Sparse Singular Value Decomposition," Biometrics, The International Biometric Society, vol. 66(4), pages 1087-1095, December.
    8. Zuber Verena & Strimmer Korbinian, 2011. "High-Dimensional Regression and Variable Selection Using CAR Scores," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-27, July.
    9. Jiang, Liewen & Bondell, Howard D. & Wang, Huixia Judy, 2014. "Interquantile shrinkage and variable selection in quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 69(C), pages 208-219.
    10. Xing, Xin & Hu, Jinjin & Yang, Yaning, 2014. "Robust minimum variance portfolio with L-infinity constraints," Journal of Banking & Finance, Elsevier, vol. 46(C), pages 107-117.
    11. Philip Kostov & Thankom Arun & Samuel Annim, 2014. "Financial Services to the Unbanked: the case of the Mzansi intervention in South Africa," Contemporary Economics, University of Economics and Human Sciences in Warsaw, vol. 8(2), June.
    12. Luca Insolia & Ana Kenney & Martina Calovi & Francesca Chiaromonte, 2021. "Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression," Stats, MDPI, vol. 4(3), pages 1-17, August.
    13. Xiaofei Wu & Rongmei Liang & Hu Yang, 2022. "Penalized and constrained LAD estimation in fixed and high dimension," Statistical Papers, Springer, vol. 63(1), pages 53-95, February.
    14. Mishra, Aditya & Müller, Christian L., 2022. "Robust regression with compositional covariates," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
    15. Kremer, Philipp J. & Lee, Sangkyun & Bogdan, Małgorzata & Paterlini, Sandra, 2020. "Sparse portfolio selection via the sorted ℓ1-Norm," Journal of Banking & Finance, Elsevier, vol. 110(C).
    16. Sunkyung Kim & Wei Pan & Xiaotong Shen, 2013. "Network-Based Penalized Regression With Application to Genomic Data," Biometrics, The International Biometric Society, vol. 69(3), pages 582-593, September.
    17. Howard D. Bondell & Brian J. Reich, 2009. "Simultaneous Factor Selection and Collapsing Levels in ANOVA," Biometrics, The International Biometric Society, vol. 65(1), pages 169-177, March.
    18. She, Yiyuan, 2012. "An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors," Computational Statistics & Data Analysis, Elsevier, vol. 56(10), pages 2976-2990.
    19. Li, Mei & Kong, Lingchen, 2019. "Double fused Lasso penalized LAD for matrix regression," Applied Mathematics and Computation, Elsevier, vol. 357(C), pages 119-138.
    20. Wei Pan & Benhuai Xie & Xiaotong Shen, 2010. "Incorporating Predictor Network in Penalized Regression with Application to Microarray Data," Biometrics, The International Biometric Society, vol. 66(2), pages 474-484, June.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:11:y:2023:i:2:p:436-:d:1035285. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.