
Mixtures of common t-factor analyzers for modeling high-dimensional data with missing values

Author

  • Wang, Wan-Lun

Abstract

Mixtures of common t-factor analyzers (MCtFA) have emerged as a sound, parsimonious model-based tool for the robust modeling of high-dimensional data in the presence of fat-tailed noise and atypical observations. This paper presents a generalization of MCtFA that accommodates missing values, which occur frequently in scientific research. Under a missing-at-random (MAR) mechanism, a computationally efficient Expectation Conditional Maximization Either (ECME) algorithm is developed for parameter estimation. Techniques for visualizing the data, classifying new individuals, and imputing missing values under the incomplete-data structure of MCtFA are also investigated. Illustrative examples analyzing real and simulated data sets demonstrate the usefulness of the proposed methodology and compare its finite-sample performance with that of its normal counterparts.
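
The ECME algorithm itself is not given on this page. As a hedged illustration of one ingredient that t-based EM-type algorithms for incomplete data commonly share, the sketch below computes, for a single mixture component, the conditional expectation of the latent scale weight from the observed coordinates of one observation only. It assumes the common-factor parameterization from the MCtFA literature, in which component i has location A xi_i and scale matrix A Omega_i A^T + D (D diagonal), and marks missing entries as NaN. The function names component_moments and t_weight_observed are hypothetical; this is a minimal sketch, not the paper's implementation.

```python
import math

def matmul(X, Y):
    # Plain-Python matrix product of lists of rows.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def component_moments(A, xi, Omega, D):
    """Hypothetical helper: location A xi and scale A Omega A^T + D of one component.

    A is p x q (common loadings), xi has length q, Omega is q x q,
    and D is the diagonal of the p x p error scale matrix.
    """
    mean = [sum(A[r][k] * xi[k] for k in range(len(xi))) for r in range(len(A))]
    S = matmul(matmul(A, Omega), transpose(A))
    for r in range(len(S)):
        S[r][r] += D[r]
    return mean, S

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (M assumed nonsingular)."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def t_weight_observed(y, mean, S, nu):
    """Hypothetical helper: E[tau | y_obs] = (nu + p_o) / (nu + delta_o).

    Only non-NaN coordinates of y are used; delta_o is the squared
    Mahalanobis distance computed with the observed sub-block of S.
    """
    obs = [j for j, v in enumerate(y) if not math.isnan(v)]
    diff = [y[j] - mean[j] for j in obs]
    S_oo = [[S[j][k] for k in obs] for j in obs]
    delta = sum(d * z for d, z in zip(diff, solve(S_oo, diff)))
    return (nu + len(obs)) / (nu + delta)
```

For example, with mean (0, 0), scale matrix diag(2, 2), and nu = 3, the partially observed point (2, NaN) receives weight 4/5, while the fully observed point (2, 2) receives 5/7; points far from the component location receive smaller weights, which is how t-based fits downweight atypical observations.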

Suggested Citation

  • Wang, Wan-Lun, 2015. "Mixtures of common t-factor analyzers for modeling high-dimensional data with missing values," Computational Statistics & Data Analysis, Elsevier, vol. 83(C), pages 223-235.
  • Handle: RePEc:eee:csdana:v:83:y:2015:i:c:p:223-235
    DOI: 10.1016/j.csda.2014.10.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947314002990
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2014.10.007?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Cinzia Viroli, 2010. "Dimensionally Reduced Model-Based Clustering Through Mixtures of Factor Mixture Analyzers," Journal of Classification, Springer;The Classification Society, vol. 27(3), pages 363-388, November.
    2. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2014. "Mixtures of skew-t factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 326-335.
    3. Montanari, Angela & Viroli, Cinzia, 2011. "Maximum likelihood estimation of mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2712-2723, September.
    4. Kotz, Samuel & Nadarajah, Saralees, 2004. "Multivariate T-Distributions and Their Applications," Cambridge Books, Cambridge University Press, number 9780521826549, September.
    5. Boldea, Otilia & Magnus, Jan R., 2009. "Maximum Likelihood Estimation of the Multivariate Normal Mixture Model," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1539-1549.
    6. Wan-Lun Wang & Tsung-I Lin, 2013. "An efficient ECM algorithm for maximum likelihood estimation in mixtures of t-factor analyzers," Computational Statistics, Springer, vol. 28(2), pages 751-769, April.
    7. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    8. McLachlan, G. J. & Peel, D. & Bean, R. W., 2003. "Modelling high-dimensional data by mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 379-388, January.
    9. Wang, Wan-Lun, 2013. "Mixtures of common factor analyzers for high-dimensional data with missing information," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 120-133.
    10. Dankmar Böhning & Ekkehart Dietz & Rainer Schaub & Peter Schlattmann & Bruce Lindsay, 1994. "The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(2), pages 373-388, June.
    11. McLachlan, G.J. & Bean, R.W. & Ben-Tovim Jones, L., 2007. "Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5327-5338, July.
    12. Sharon Lee & Geoffrey McLachlan, 2013. "On mixtures of skew normal and skew t-distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 241-266, September.
    13. Olvi L. Mangasarian & W. Nick Street & William H. Wolberg, 1995. "Breast Cancer Diagnosis and Prognosis Via Linear Programming," Operations Research, INFORMS, vol. 43(4), pages 570-577, August.
    14. Lin, Tsung-I, 2014. "Learning from incomplete data via parameterized t mixture models through eigenvalue decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 183-195.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Scrucca, Luca, 2016. "Identifying connected components in Gaussian finite mixture models for clustering," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 5-17.
    2. García-Escudero, Luis Angel & Gordaliza, Alfonso & Greselin, Francesca & Ingrassia, Salvatore & Mayo-Iscar, Agustín, 2016. "The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 131-147.
    3. Wang, Wan-Lun & Castro, Luis M. & Lin, Tsung-I, 2017. "Automated learning of t factor analysis models with complete and incomplete data," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 157-171.
    4. Wan-Lun Wang & Luis M. Castro & Yen-Ting Chang & Tsung-I Lin, 2019. "Mixtures of restricted skew-t factor analyzers with common factor loadings," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 445-480, June.
    5. Lin, Tsung-I & McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Extending mixtures of factor models using the restricted multivariate skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 398-413.
    6. Wraith, Darren & Forbes, Florence, 2015. "Location and scale mixtures of Gaussians with flexible tail behaviour: Properties, inference and application to multivariate clustering," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 61-73.
    7. Ma, Xuan & Zhao, Jianhua & Wang, Yue & Shang, Changchun & Jiang, Fen, 2023. "Robust factored principal component analysis for matrix-valued outlier accommodation and detection," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wan-Lun Wang & Tsung-I Lin, 2022. "Robust clustering via mixtures of t factor analyzers with incomplete data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 659-690, September.
    2. Wan-Lun Wang & Tsung-I Lin, 2017. "Flexible clustering via extended mixtures of common t-factor analyzers," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 101(3), pages 227-252, July.
    3. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    4. Lin, Tsung-I & McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Extending mixtures of factor models using the restricted multivariate skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 398-413.
    5. Cristina Tortora & Paul D. McNicholas & Ryan P. Browne, 2016. "A mixture of generalized hyperbolic factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 423-440, December.
    6. Cristina Tortora & Brian C. Franczak & Ryan P. Browne & Paul D. McNicholas, 2019. "A Mixture of Coalesced Generalized Hyperbolic Distributions," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 26-57, April.
    7. Wang, Wan-Lun, 2013. "Mixtures of common factor analyzers for high-dimensional data with missing information," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 120-133.
    8. Tsung-I Lin & Pal Wu & Geoffrey McLachlan & Sharon Lee, 2015. "A robust factor analysis model using the restricted skew-t distribution," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 510-531, September.
    9. Wan-Lun Wang & Tsung-I Lin, 2020. "Automated learning of mixtures of factor analysis models with missing information," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(4), pages 1098-1124, December.
    10. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    11. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2014. "Mixtures of skew-t factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 326-335.
    12. Wan-Lun Wang & Tsung-I Lin, 2022. "Robust clustering of multiply censored data via mixtures of t factor analyzers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 22-53, March.
    13. Vrbik, Irene & McNicholas, Paul D., 2014. "Parsimonious skew mixture models for model-based clustering and classification," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 196-210.
    14. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "A mixture of SDB skew-t factor analyzers," Econometrics and Statistics, Elsevier, vol. 3(C), pages 160-168.
    15. Morris, Katherine & McNicholas, Paul D., 2016. "Clustering, classification, discriminant analysis, and dimension reduction via generalized hyperbolic mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 97(C), pages 133-150.
    16. Andrews, Jeffrey L. & McNicholas, Paul D. & Subedi, Sanjeena, 2011. "Model-based classification via mixtures of multivariate t-distributions," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 520-529, January.
    17. Hung Tong & Cristina Tortora, 2022. "Model-based clustering and outlier detection with missing data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 5-30, March.
    18. Chaofeng Yuan & Wensheng Zhu & Xuming He & Jianhua Guo, 2019. "A mixture factor model with applications to microarray data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 60-76, March.
    19. Lin, Tsung-I & McNicholas, Paul D. & Ho, Hsiu J., 2014. "Capturing patterns via parsimonious t mixture models," Statistics & Probability Letters, Elsevier, vol. 88(C), pages 80-87.
    20. Diani, Cecilia & Galimberti, Giuliano & Soffritti, Gabriele, 2022. "Multivariate cluster-weighted models based on seemingly unrelated linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:83:y:2015:i:c:p:223-235. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.