
Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers

Authors
  • Shan Feng

    (School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China
    College of Statistics, Xi’an University of Finance and Economics, Xi’an 710100, China)

  • Wenxian Xie

    (School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China)

  • Yufeng Nie

    (School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China)

Abstract

Finite Gaussian mixture models are powerful tools for modeling distributions of random phenomena and are widely used for clustering tasks. However, their interpretability and efficiency are often degraded by the impact of redundancy and noise, especially on high-dimensional datasets. In this work, we propose a generative graphical model for parsimonious modeling of the Gaussian mixtures and robust unsupervised learning. The model assumes that the data are generated independently and identically from a finite mixture of robust factor analyzers, where the features’ salience is adjusted by an active set of latent factors to allow a violation of the local independence assumption. For the model inference, we propose a structured variational Bayes inference framework to realize simultaneous clustering, model selection and outlier processing. Performance of the proposed algorithm is evaluated by conducting experiments on artificial and real-world datasets. Moreover, an application on the high-dimensional machine learning task of handwritten alphabet recognition is introduced.
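The abstract does not state the exact component density, so the following is only a sketch under stated assumptions. It samples from a mixture of t factor analyzers in the style of McLachlan, Bean and Ben-Tovim Jones (2007). That construction is one common way to make factor analyzers robust, and it is not necessarily the authors' construction. Each component k generates x = μ_k + (Λ_k z + e)/√u, with z ~ N(0, I_q), e ~ N(0, diag(ψ_k)) and u ~ Gamma(ν_k/2, rate ν_k/2). Its scale matrix is therefore Λ_kΛ_kᵀ + diag(ψ_k). The helper names (sample_mtfa, implied_scale, covariance_params) are hypothetical. The sketch does not implement the paper's active set of latent factors or its structured variational Bayes inference.

```python
import math
import random


def implied_scale(lam, psi):
    """Scale matrix Lambda Lambda^T + diag(psi) of one (hypothetical) factor analyzer."""
    d = len(lam)
    return [[sum(lam[i][f] * lam[j][f] for f in range(len(lam[i])))
             + (psi[i] if i == j else 0.0) for j in range(d)] for i in range(d)]


def covariance_params(d, q=None):
    """Free covariance parameters per component.

    q is None: unrestricted d x d covariance, d(d+1)/2 parameters.
    otherwise: q-factor model, d*q loadings + d uniquenesses,
    minus q(q-1)/2 for the rotational invariance of the loadings.
    """
    if q is None:
        return d * (d + 1) // 2
    return d * q + d - q * (q - 1) // 2


def sample_mtfa(weights, means, loadings, psis, nus, n, seed=0):
    """Draw n points from a mixture of t factor analyzers (assumed robust form).

    Component k: x = mu_k + (Lambda_k z + e) / sqrt(u), where
    z ~ N(0, I_q), e ~ N(0, diag(psi_k)), u ~ Gamma(nu_k/2, rate nu_k/2).
    Returns the points and their true component labels.
    """
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        # Pick a component according to the mixing weights.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        mu, lam, psi, nu = means[k], loadings[k], psis[k], nus[k]
        q = len(lam[0])
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]  # latent factors
        # gammavariate(alpha, beta) takes a scale beta, so rate nu/2 means scale 2/nu.
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        x = []
        for j in range(len(mu)):
            signal = sum(lam[j][f] * z[f] for f in range(q))
            noise = rng.gauss(0.0, math.sqrt(psi[j]))
            # Dividing by sqrt(u) turns the Gaussian draw into a heavy-tailed t draw.
            x.append(mu[j] + (signal + noise) / math.sqrt(u))
        data.append(x)
        labels.append(k)
    return data, labels
```

For d = 50 and q = 3, covariance_params(50, 3) gives 197 free parameters per component, against covariance_params(50) = 1275 for an unrestricted covariance. This illustrates why factor-analyzer mixtures are the parsimonious choice on high-dimensional data.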

Suggested Citation

  • Shan Feng & Wenxian Xie & Yufeng Nie, 2024. "Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers," Mathematics, MDPI, vol. 12(7), pages 1-23, April.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:7:p:1091-:d:1370085

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/7/1091/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/7/1091/
    Download Restriction: no

    References listed on IDEAS

    1. Bouveyron, Charles & Brunet-Saumard, Camille, 2014. "Model-based clustering of high-dimensional data: A review," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 52-78.
    2. Fan, Jianqing & Ke, Yuan & Wang, Kaizheng, 2020. "Factor-adjusted regularized model selection," Journal of Econometrics, Elsevier, vol. 216(1), pages 71-85.
    3. Emilie Devijver & Mélina Gallopin, 2018. "Block-Diagonal Covariance Selection for High-Dimensional Gaussian Graphical Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 306-314, January.
    4. McLachlan, G.J. & Bean, R.W. & Ben-Tovim Jones, L., 2007. "Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5327-5338, July.
    5. Qing Mai & Hui Zou & Ming Yuan, 2012. "A direct approach to sparse discriminant analysis in ultra-high dimensions," Biometrika, Biometrika Trust, vol. 99(1), pages 29-42.
    6. A. Bhattacharya & D. B. Dunson, 2011. "Sparse Bayesian infinite factor models," Biometrika, Biometrika Trust, vol. 98(2), pages 291-306.
    7. Zhang, Chun-Xia & Xu, Shuang & Zhang, Jiang-She, 2019. "A novel variational Bayesian method for variable selection in logistic regression models," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 1-19.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "A mixture of SDB skew-t factor analyzers," Econometrics and Statistics, Elsevier, vol. 3(C), pages 160-168.
    2. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    3. Cristina Tortora & Paul D. McNicholas & Ryan P. Browne, 2016. "A mixture of generalized hyperbolic factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 423-440, December.
    4. Miao He & Yanhong Guo, 2022. "Systemic Risk Contributions of Financial Institutions during the Stock Market Crash in China," Sustainability, MDPI, vol. 14(9), pages 1-14, April.
    5. Sylvia Frühwirth-Schnatter, 2023. "Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis," Papers 2303.00473, arXiv.org.
    6. Kastner, Gregor, 2019. "Sparse Bayesian time-varying covariance estimation in many dimensions," Journal of Econometrics, Elsevier, vol. 210(1), pages 98-115.
    7. Hsien-Tsung Chang & Nilamadhab Mishra & Chung-Chih Lin, 2015. "IoT Big-Data Centred Knowledge Granule Analytic and Cluster Framework for BI Applications: A Case Base Analysis," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-23, November.
    8. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
    9. Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
    10. Wan-Lun Wang & Tsung-I Lin, 2013. "An efficient ECM algorithm for maximum likelihood estimation in mixtures of t-factor analyzers," Computational Statistics, Springer, vol. 28(2), pages 751-769, April.
    11. Paula M. Murray & Ryan P. Browne & Paul D. McNicholas, 2020. "Mixtures of Hidden Truncation Hyperbolic Factor Analyzers," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 366-379, July.
    12. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    13. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    14. Niko Hauzenberger & Maximilian Bock & Michael Pfarrhofer & Anna Stelzer & Gregor Zens, 2018. "Implications of macroeconomic volatility in the Euro area," Papers 1801.02925, arXiv.org, revised Jun 2018.
    15. Oda, Ryoya & Suzuki, Yuya & Yanagihara, Hirokazu & Fujikoshi, Yasunori, 2020. "A consistent variable selection method in high-dimensional canonical discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 175(C).
    16. Chaofeng Yuan & Wensheng Zhu & Xuming He & Jianhua Guo, 2019. "A mixture factor model with applications to microarray data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 60-76, March.
    17. Matthew W. Wheeler, 2019. "Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high‐throughput toxicity testing," Biometrics, The International Biometric Society, vol. 75(1), pages 193-201, March.
    18. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    19. Jonas Krampe & Luca Margaritella, 2021. "Factor Models with Sparse VAR Idiosyncratic Components," Papers 2112.07149, arXiv.org, revised May 2022.
    20. Bonnie R. Joubert & Marianthi-Anna Kioumourtzoglou & Toccara Chamberlain & Hua Yun Chen & Chris Gennings & Mary E. Turyk & Marie Lynn Miranda & Thomas F. Webster & Katherine B. Ensor & David B. Dunson, 2022. "Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods," IJERPH, MDPI, vol. 19(3), pages 1-24, January.
