Printed from https://ideas.repec.org/a/spr/advdac/v9y2015i4p371-394.html

Maximum likelihood estimation of Gaussian mixture models without matrix operations

Authors

  • Hien Nguyen
  • Geoffrey McLachlan

Abstract

The Gaussian mixture model (GMM) is a popular tool for multivariate analysis, in particular, cluster analysis. The expectation–maximization (EM) algorithm is generally used to perform maximum likelihood (ML) estimation for GMMs due to the M-step existing in closed form and its desirable numerical properties, such as monotonicity. However, the EM algorithm has been criticized as being slow to converge and thus computationally expensive in some situations. In this article, we introduce the linear regression characterization (LRC) of the GMM. We show that the parameters of an LRC of the GMM can be mapped back to the natural parameters, and that a minorization–maximization (MM) algorithm can be constructed, which retains the desirable numerical properties of the EM algorithm, without the use of matrix operations. We prove that the ML estimators of the LRC parameters are consistent and asymptotically normal, like their natural counterparts. Furthermore, we show that the LRC allows for simple handling of singularities in the ML estimation of GMMs. Using numerical simulations in the R programming environment, we then demonstrate that the MM algorithm can be faster than the EM algorithm in various large data situations, where sample sizes range in the tens to hundreds of thousands and for estimating models with up to 16 mixture components on multivariate data with up to 16 variables.

Copyright Springer-Verlag Berlin Heidelberg 2015
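As a rough illustration of the idea behind a regression-based characterization, the sketch below uses the standard fact that a bivariate Gaussian density factorizes into the marginal density of the first variable times a univariate linear regression of the second variable on the first. It is not the paper's LRC or its MM algorithm, whose exact parameterization is not given in the abstract; the function names, the bivariate restriction, and the example values are all assumptions made for illustration. The sketch is in Python rather than R and uses only scalar arithmetic.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def bivariate_logpdf_direct(x1, x2, mu1, mu2, s11, s12, s22):
    """Bivariate Gaussian log density via the explicit 2x2 determinant and inverse."""
    det = s11 * s22 - s12 ** 2
    d1, d2 = x1 - mu1, x2 - mu2
    quad = (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def natural_to_regression(mu1, mu2, s11, s12, s22):
    """Map (mean, covariance) to marginal-plus-regression parameters (illustrative)."""
    slope = s12 / s11
    intercept = mu2 - slope * mu1
    resid_var = s22 - s12 ** 2 / s11
    return mu1, s11, intercept, slope, resid_var

def regression_to_natural(mu1, s11, intercept, slope, resid_var):
    """Map the regression parameters back to the natural (mean, covariance) ones."""
    mu2 = intercept + slope * mu1
    s12 = slope * s11
    s22 = resid_var + slope ** 2 * s11
    return mu1, mu2, s11, s12, s22

def bivariate_logpdf_regression(x1, x2, mu1, s11, intercept, slope, resid_var):
    """Same log density: marginal of x1 plus regression of x2 on x1, scalars only."""
    return (log_normal_pdf(x1, mu1, s11)
            + log_normal_pdf(x2, intercept + slope * x1, resid_var))

def mixture_loglik(data, weights, components, logpdf):
    """Log-likelihood of a bivariate mixture, using log-sum-exp for stability."""
    total = 0.0
    for x1, x2 in data:
        terms = [math.log(w) + logpdf(x1, x2, *c)
                 for w, c in zip(weights, components)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

# Hypothetical two-component mixture: (mu1, mu2, s11, s12, s22) per component.
natural = [(0.0, 1.0, 2.0, 0.6, 1.5), (3.0, -2.0, 1.0, -0.4, 0.8)]
regression = [natural_to_regression(*c) for c in natural]
weights = [0.3, 0.7]
data = [(0.3, -1.2), (2.5, -1.7), (4.0, -2.9)]

ll_direct = mixture_loglik(data, weights, natural, bivariate_logpdf_direct)
ll_regression = mixture_loglik(data, weights, regression, bivariate_logpdf_regression)
print(math.isclose(ll_direct, ll_regression, rel_tol=1e-9))
```

Both parameterizations give the same mixture log-likelihood, and `regression_to_natural` recovers the original means and covariances, which mirrors in a small way the abstract's statement that the LRC parameters can be mapped back to the natural parameters.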

Suggested Citation

  • Hien Nguyen & Geoffrey McLachlan, 2015. "Maximum likelihood estimation of Gaussian mixture models without matrix operations," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 371-394, December.
  • Handle: RePEc:spr:advdac:v:9:y:2015:i:4:p:371-394
    DOI: 10.1007/s11634-015-0209-7

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s11634-015-0209-7
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Hunter D.R. & Lange K., 2004. "A Tutorial on MM Algorithms," The American Statistician, American Statistical Association, vol. 58, pages 30-37, February.
    2. Salvatore Ingrassia, 2004. "A likelihood-based constrained algorithm for multivariate normal mixture models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 13(2), pages 151-166, September.
    3. J. Hartigan, 1985. "Statistical theory in clustering," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 63-76, December.
    4. Ingrassia, Salvatore & Minotti, Simona C. & Punzo, Antonio, 2014. "Model-based clustering via linear cluster-weighted models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 159-182.
    5. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.
    6. Ingrassia, Salvatore & Rocci, Roberto, 2007. "Constrained monotone EM algorithms for finite mixture of multivariate Gaussians," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5339-5351, July.
    7. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    8. S. Ingrassia, 1991. "Mixture decomposition via the simulated annealing algorithm," Applied Stochastic Models and Data Analysis, John Wiley & Sons, vol. 7(4), pages 317-325, December.
    9. Ingrassia, Salvatore & Rocci, Roberto, 2011. "Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1715-1725, April.

    Citations

    Cited by:

    1. Lloyd-Jones, Luke R. & Nguyen, Hien D. & McLachlan, Geoffrey J., 2018. "A globally convergent algorithm for lasso-penalized mixture of linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 19-38.
    2. Marek Śmieja & Magdalena Wiercioch, 2017. "Constrained clustering with a complex cluster structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 493-518, September.
    3. Xifen Huang & Chaosong Xiong & Tao Jiang & Junfeng Lu & Jinfeng Xu, 2022. "Efficient Estimation and Inference in the Proportional Odds Model for Survival Data," Mathematics, MDPI, vol. 10(18), pages 1-17, September.
    4. Hien D. Nguyen & Geoffrey J. McLachlan & Jeremy F. P. Ullmann & Andrew L. Janke, 2016. "Spatial clustering of time series via mixture of autoregressions models and Markov random fields," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 70(4), pages 414-439, November.
    5. Nguyen, Hien D. & McLachlan, Geoffrey J., 2016. "Maximum likelihood estimation of triangular and polygonal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 102(C), pages 23-36.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    2. Utkarsh J. Dang & Antonio Punzo & Paul D. McNicholas & Salvatore Ingrassia & Ryan P. Browne, 2017. "Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 34(1), pages 4-34, April.
    3. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.
    4. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    5. Diani, Cecilia & Galimberti, Giuliano & Soffritti, Gabriele, 2022. "Multivariate cluster-weighted models based on seemingly unrelated linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    6. Andrews, Jeffrey L., 2018. "Addressing overfitting and underfitting in Gaussian model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 160-171.
    7. Pietro Coretto & Christian Hennig, 2016. "Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1648-1659, October.
    8. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    9. Kasa, Siva Rajesh & Rajan, Vaibhav, 2022. "Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation," Econometrics and Statistics, Elsevier, vol. 22(C), pages 67-97.
    10. Nguyen, Hien D. & McLachlan, Geoffrey J., 2016. "Laplace mixture of linear experts," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 177-191.
    11. Luis Angel García-Escudero & Alfonso Gordaliza & Francesca Greselin & Salvatore Ingrassia & Agustín Mayo-Iscar, 2018. "Eigenvalues and constraints in mixture modeling: geometric and computational issues," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 203-233, June.
    12. Rasmus Lentz & Jean Marc Robin & Suphanit Piyapromdee, 2018. "On Worker and Firm Heterogeneity in Wages and Employment Mobility: Evidence from Danish Register Data," 2018 Meeting Papers 469, Society for Economic Dynamics.
    13. Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
    14. Roberto Mari & Roberto Rocci & Stefano Antonio Gattone, 2020. "Scale-constrained approaches for maximum likelihood estimation and model selection of clusterwise linear regression models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(1), pages 49-78, March.
    15. Wu, Qiang & Yao, Weixin, 2016. "Mixtures of quantile regressions," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 162-176.
    16. Paolo Berta & Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini, 2016. "Multilevel cluster-weighted models for the evaluation of hospitals," METRON, Springer;Sapienza Università di Roma, vol. 74(3), pages 275-292, December.
    17. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    18. Salvatore Ingrassia & Antonio Punzo, 2020. "Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 526-547, July.
    19. Rasmus Lentz & Suphanit Piyapromdee & Jean-Marc Robin, 2022. "The Anatomy of Sorting - Evidence from Danish Data," Working Papers hal-03869383, HAL.
    20. Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2021. "Matrix Normal Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 556-575, October.
