
Mixture models for mixed-type data through a composite likelihood approach

Author

Listed:
  • Ranalli, Monia
  • Rocci, Roberto

Abstract

A mixture model is considered to classify continuous and/or ordinal variables. Under this model, the continuous and the ordinal variables are jointly assumed to follow a heteroscedastic Gaussian mixture model which, as far as the ordinal variables are concerned, is only partially observed. More specifically, the ordinal variables are assumed to be discretizations of some of the underlying mixture variables. From a computational point of view, this creates problems for the maximum likelihood estimation of the model parameters: the likelihood function involves multidimensional integrals, whose evaluation becomes increasingly demanding as the number of ordinal variables grows. The proposal is to replace this cumbersome likelihood with a surrogate objective function that is easier to maximize. A composite likelihood approach is used, in which the original joint distribution is replaced by the product of three blocks: the marginal distribution of the continuous variables, all bivariate marginal distributions of the ordinal variables, and the marginal distributions of all continuous variables together with a single ordinal variable. This leads to a surrogate function that is the sum of the log contributions of the blocks. The model parameters are estimated by maximizing the surrogate function within an EM-like algorithm. The effectiveness of the proposal is investigated through a simulation study and two applications to real data.
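
The data-generating mechanism and the three-block decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`discretize`, `simulate`, `composite_blocks`), the parameter values, the use of one shared threshold vector for every ordinal variable, and the diagonal component covariances are all assumptions made to keep the example short (the paper's model allows full heteroscedastic covariance matrices). The sketch only simulates data and enumerates the blocks; it does not evaluate or maximize the surrogate function.

```python
import bisect
import itertools
import random


def discretize(latent_value, thresholds):
    """Map a latent continuous value to an ordinal category.

    `thresholds` must be sorted in increasing order; the result is an
    integer in 0..len(thresholds).
    """
    return bisect.bisect_right(thresholds, latent_value)


def simulate(n, weights, means, sds, n_continuous, thresholds, seed=0):
    """Draw n observations from a Gaussian mixture (assumed diagonal covariances).

    The first `n_continuous` coordinates are observed directly; the remaining
    ones are observed only through their ordinal category.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Pick a mixture component, then draw the full latent vector from it.
        g = rng.choices(range(len(weights)), weights=weights)[0]
        latent = [rng.gauss(m, s) for m, s in zip(means[g], sds[g])]
        continuous = latent[:n_continuous]
        ordinal = [discretize(v, thresholds) for v in latent[n_continuous:]]
        rows.append((continuous, ordinal))
    return rows


def composite_blocks(n_continuous, n_ordinal):
    """List the blocks as (continuous indices, ordinal indices) pairs."""
    cont = tuple(range(n_continuous))
    # Block 1: marginal distribution of all continuous variables.
    blocks = [(cont, ())]
    # Block 2: every bivariate marginal of the ordinal variables.
    blocks += [((), pair) for pair in itertools.combinations(range(n_ordinal), 2)]
    # Block 3: all continuous variables together with one ordinal variable.
    blocks += [(cont, (j,)) for j in range(n_ordinal)]
    return blocks


rows = simulate(
    n=200,
    weights=[0.4, 0.6],
    means=[[0.0, 0.0, -1.0, -1.0, -1.0], [3.0, 3.0, 1.0, 1.0, 1.0]],
    sds=[[1.0] * 5, [0.5, 2.0, 1.0, 1.5, 0.7]],  # heteroscedastic components
    n_continuous=2,
    thresholds=[-1.0, 0.0, 1.0],
)
blocks = composite_blocks(n_continuous=2, n_ordinal=3)
print(len(blocks))  # 1 + 3 + 3 = 7 blocks
```

With Q ordinal variables the decomposition yields 1 + Q(Q-1)/2 + Q blocks, and no block involves more than two ordinal variables at once, which is what keeps the integrals in the surrogate function low-dimensional.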

Suggested Citation

  • Ranalli, Monia & Rocci, Roberto, 2017. "Mixture models for mixed-type data through a composite likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 87-102.
  • Handle: RePEc:eee:csdana:v:110:y:2017:i:c:p:87-102
    DOI: 10.1016/j.csda.2016.12.016

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947317300038
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2016.12.016?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations


    Cited by:

    1. Monia Ranalli & Roberto Rocci, 2024. "Composite likelihood methods for parsimonious model-based clustering of mixed-type data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(2), pages 381-407, June.
    2. Monia Ranalli & Roberto Rocci, 2017. "A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 82(4), pages 1007-1034, December.
    3. Kemmawadee Preedalikit & Daniel Fernández & Ivy Liu & Louise McMillan & Marta Nai Ruscone & Roy Costilla, 2024. "Row mixture-based clustering with covariates for ordinal responses," Computational Statistics, Springer, vol. 39(5), pages 2511-2555, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Monia Ranalli & Roberto Rocci, 2017. "A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 82(4), pages 1007-1034, December.
    2. Monia Ranalli & Roberto Rocci, 2024. "Composite likelihood methods for parsimonious model-based clustering of mixed-type data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(2), pages 381-407, June.
    3. Hung Tong & Cristina Tortora, 2022. "Model-based clustering and outlier detection with missing data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 5-30, March.
    4. Zhang, Q. & Ip, E.H., 2014. "Variable assessment in latent class models," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 146-156.
    5. Marek Śmieja & Magdalena Wiercioch, 2017. "Constrained clustering with a complex cluster structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 493-518, September.
    6. Tyler Roick & Dimitris Karlis & Paul D. McNicholas, 2021. "Clustering discrete-valued time series," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(1), pages 209-229, March.
    7. Tang, Yang & Browne, Ryan P. & McNicholas, Paul D., 2015. "Model based clustering of high-dimensional binary data," Computational Statistics & Data Analysis, Elsevier, vol. 87(C), pages 84-101.
    8. Ryan P. Browne & Luca Bagnato & Antonio Punzo, 2024. "Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 597-625, September.
    9. Sanjeena Subedi & Antonio Punzo & Salvatore Ingrassia & Paul McNicholas, 2013. "Clustering and classification via cluster-weighted factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 5-40, March.
    10. Nam-Hwui Kim & Ryan Browne, 2019. "Subspace clustering for the finite mixture of generalized hyperbolic distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 641-661, September.
    11. Naderi, Mehrdad & Hung, Wen-Liang & Lin, Tsung-I & Jamalizadeh, Ahad, 2019. "A novel mixture model using the multivariate normal mean–variance mixture of Birnbaum–Saunders distributions and its application to extrasolar planets," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 126-138.
    12. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    13. Marbac, Matthieu & Sedki, Mohammed, 2017. "A family of block-wise one-factor distributions for modeling high-dimensional binary data," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 130-145.
    14. Wang, Wan-Lun, 2015. "Mixtures of common t-factor analyzers for modeling high-dimensional data with missing values," Computational Statistics & Data Analysis, Elsevier, vol. 83(C), pages 223-235.
    15. Cristina Tortora & Brian C. Franczak & Ryan P. Browne & Paul D. McNicholas, 2019. "A Mixture of Coalesced Generalized Hyperbolic Distributions," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 26-57, April.
    16. Browne, Ryan P., 2022. "Revitalizing the multivariate elliptical leptokurtic-normal distribution and its application in model-based clustering," Statistics & Probability Letters, Elsevier, vol. 190(C).
    17. Myrsini Katsikatsou & Irini Moustaki, 2016. "Pairwise Likelihood Ratio Tests and Model Selection Criteria for Structural Equation Models with Ordinal Variables," Psychometrika, Springer;The Psychometric Society, vol. 81(4), pages 1046-1068, December.
    18. Wan-Lun Wang & Tsung-I Lin, 2023. "Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 787-817, September.
    19. Jason Hou-Liu & Ryan P. Browne, 2022. "Factor and hybrid components for model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 373-398, June.
    20. Kenne Pagui, E.C. & Salvan, A. & Sartori, N., 2015. "On full efficiency of the maximum composite likelihood estimator," Statistics & Probability Letters, Elsevier, vol. 97(C), pages 120-124.
