IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v160y2021ics0167947321000785.html

In the pursuit of sparseness: A new rank-preserving penalty for a finite mixture of factor analyzers

Author

Listed:
  • Kim, Nam-Hwui
  • Browne, Ryan P.

Abstract

A finite mixture of factor analyzers is an effective method for achieving parsimony in model-based clustering. Introducing a penalization term for the factor loading can lead to sparse estimates. However, in the pursuit of sparseness, one can end up with rank-deficient solutions regardless of the number of factors assumed. In light of this issue, a new penalty-based method that can fit a finite mixture of sparse factor analyzers with full-rank factor loading estimates is developed. In addition, the extension of an existing penalized factor analyzer model to a finite mixture is introduced.
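
The rank-deficiency issue mentioned in the abstract can be illustrated with a minimal sketch. This is not the penalty proposed in the paper. It assumes a generic elementwise L1 (soft-thresholding) shrinkage, a hypothetical 5 x 2 loading matrix, and an arbitrary threshold `lam`, chosen only to show how shrinking weak loadings to zero can collapse a factor and reduce the rank of the estimate.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding (proximal operator of an L1 penalty)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

# Hypothetical loading matrix (5 variables, 2 factors); values are made up.
# The second factor has uniformly weak loadings.
loadings = np.array([
    [0.9,  0.10],
    [0.8, -0.05],
    [0.7,  0.08],
    [0.6, -0.12],
    [0.5,  0.03],
])

# Assumed penalty strength, picked for illustration only.
shrunk = soft_threshold(loadings, lam=0.2)

rank_before = np.linalg.matrix_rank(loadings)
rank_after = np.linalg.matrix_rank(shrunk)

print("rank before shrinkage:", rank_before)
print("rank after shrinkage: ", rank_after)
print(shrunk)
```

In this toy case every entry of the second column falls below the threshold, so the whole column is set to zero and the shrunken matrix no longer has full column rank, even though two factors were assumed.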

Suggested Citation

  • Kim, Nam-Hwui & Browne, Ryan P., 2021. "In the pursuit of sparseness: A new rank-preserving penalty for a finite mixture of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
  • Handle: RePEc:eee:csdana:v:160:y:2021:i:c:s0167947321000785
    DOI: 10.1016/j.csda.2021.107244

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321000785
    Download Restriction: Full text for ScienceDirect subscribers only.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cristina Tortora & Brian C. Franczak & Ryan P. Browne & Paul D. McNicholas, 2019. "A Mixture of Coalesced Generalized Hyperbolic Distributions," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 26-57, April.
    2. Sanjeena Subedi & Paul D. McNicholas, 2021. "A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 89-108, April.
    3. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    4. Michael P. B. Gallaugher & Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2022. "Multivariate cluster weighted models using skewed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 93-124, March.
    5. Rosember Guerra-Urzola & Katrijn Van Deun & Juan C. Vera & Klaas Sijtsma, 2021. "A Guide for Sparse PCA: Model Comparison and Applications," Psychometrika, Springer;The Psychometric Society, vol. 86(4), pages 893-919, December.
    6. Guerra Urzola, Rosember & Van Deun, Katrijn & Vera, J. C. & Sijtsma, K., 2021. "A guide for sparse PCA : Model comparison and applications," Other publications TiSEM 4d35b931-7f49-444b-b92f-a, Tilburg University, School of Economics and Management.
    7. Paula M. Murray & Ryan P. Browne & Paul D. McNicholas, 2020. "Mixtures of Hidden Truncation Hyperbolic Factor Analyzers," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 366-379, July.
    8. Cristina Tortora & Paul D. McNicholas & Ryan P. Browne, 2016. "A mixture of generalized hyperbolic factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 423-440, December.
    9. Yuhong Wei & Paul McNicholas, 2015. "Mixture model averaging for clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(2), pages 197-217, June.
    10. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    11. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    12. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    13. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    14. Yu-Min Yen, 2010. "A Note on Sparse Minimum Variance Portfolios and Coordinate-Wise Descent Algorithms," Papers 1005.5082, arXiv.org, revised Sep 2013.
    15. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    16. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    17. Osamu Komori & Shinto Eguchi & John B. Copas, 2015. "Generalized t-statistic for two-group classification," Biometrics, The International Biometric Society, vol. 71(2), pages 404-416, June.
    18. Chaofeng Yuan & Wensheng Zhu & Xuming He & Jianhua Guo, 2019. "A mixture factor model with applications to microarray data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 60-76, March.
    19. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
