
New global optimization algorithms for model-based clustering

Authors

  • Heath, Jeffrey W.
  • Fu, Michael C.
  • Jank, Wolfgang

Abstract

The Expectation-Maximization (EM) algorithm is a very popular optimization tool for mixture problems and in particular for model-based clustering problems. However, while the algorithm is convenient to implement and numerically very stable, it only produces local solutions. Thus, it may not achieve the globally optimal solution in problems that have a large number of local optima. This paper introduces several new algorithms designed to produce global solutions in model-based clustering. The building blocks for these algorithms are methods from the operations research literature, namely the Cross-Entropy (CE) method and Model Reference Adaptive Search (MRAS). One problem with applying these methods directly is the efficient simulation of positive definite covariance matrices. We propose several new solutions to this problem. One solution is to apply the principles of Expectation-Maximization updating, which leads to two new algorithms, CE-EM and MRAS-EM. We also propose two additional algorithms, CE-CD and MRAS-CD, which rely on the Cholesky decomposition. We conduct numerical experiments of varying complexity to evaluate the effectiveness of the proposed algorithms in comparison to classical EM. We find that although a single run of the new algorithms is slower than a single run of EM, all have the potential for producing significantly better solutions. We also find that although repeat application of EM may achieve similar results, our algorithms provide automated, data-driven decision rules which may significantly reduce the burden of searching for the global optimum.
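The abstract notes that sampling-based methods such as CE and MRAS need an efficient way to simulate positive definite covariance matrices, and that the Cholesky decomposition is one route. The following is a minimal sketch of that general idea, not the authors' CE-CD or MRAS-CD algorithm: it draws lower-triangular factors from an independent Gaussian sampling distribution and forms the product of each factor with its transpose, which is symmetric positive definite whenever the factor has a strictly positive diagonal. The function name, the Gaussian parametrization (`mu`, `sigma`), and the absolute-value treatment of the diagonal are assumptions made for illustration.

```python
import numpy as np

def sample_spd_via_cholesky(mu, sigma, n_samples, rng):
    """Draw symmetric positive definite matrices by sampling Cholesky factors.

    mu, sigma: (d, d) arrays giving the mean and standard deviation of an
    independent Gaussian sampling distribution over the factor entries
    (a hypothetical parametrization chosen for this sketch).
    """
    d = mu.shape[0]
    # Sample every entry, then keep only the lower triangle of each matrix.
    raw = rng.normal(mu, sigma, size=(n_samples, d, d))
    factors = np.tril(raw)
    # Force a strictly positive diagonal so each factor is nonsingular
    # (assumption: absolute value plus a small floor).
    idx = np.arange(d)
    factors[:, idx, idx] = np.abs(factors[:, idx, idx]) + 1e-8
    # L @ L.T is symmetric positive definite for any nonsingular L.
    return factors @ np.swapaxes(factors, 1, 2)

rng = np.random.default_rng(0)
d = 3
covs = sample_spd_via_cholesky(np.eye(d), 0.5 * np.ones((d, d)), 5, rng)
```

In a CE-style loop, one would typically score each sampled matrix by the mixture log-likelihood, keep the best-scoring fraction, and refit `mu` and `sigma` to those elite samples. That update step is omitted here because the abstract does not specify it.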

Suggested Citation

  • Heath, Jeffrey W. & Fu, Michael C. & Jank, Wolfgang, 2009. "New global optimization algorithms for model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 3999-4017, October.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:3999-4017

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00247-3
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Jiaqiao Hu & Michael C. Fu & Steven I. Marcus, 2007. "A Model Reference Adaptive Search Method for Global Optimization," Operations Research, INFORMS, vol. 55(3), pages 549-568, June.
    2. Rubinstein, Reuven Y., 1997. "Optimization of computer simulation models with rare events," European Journal of Operational Research, Elsevier, vol. 99(1), pages 89-112, May.
    3. J. G. Booth & J. P. Hobert, 1999. "Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(1), pages 265-285.
    4. Pieter-Tjerk de Boer & Dirk Kroese & Shie Mannor & Reuven Rubinstein, 2005. "A Tutorial on the Cross-Entropy Method," Annals of Operations Research, Springer, vol. 134(1), pages 19-67, February.
    5. Tu, Yufeng & Ball, Michael O. & Jank, Wolfgang S., 2008. "Estimating Flight Departure Delay Distributions: A Statistical Approach With Long-Term Trend and Short-Term Pattern," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 112-125, March.
    6. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    7. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.
    8. Dirk P. Kroese & Sergey Porotsky & Reuven Y. Rubinstein, 2006. "The Cross-Entropy Method for Continuous Multi-Extremal Optimization," Methodology and Computing in Applied Probability, Springer, vol. 8(3), pages 383-407, September.

    Citations

    Cited by:

    1. Jochen Ranger & Jorg-Tobias Kuhn, 2012. "A flexible latent trait model for response times in tests," Psychometrika, Springer;The Psychometric Society, vol. 77(1), pages 31-47, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zheng Peng & Donghua Wu & Quan Zheng, 2013. "A Level-Value Estimation Method and Stochastic Implementation for Global Optimization," Journal of Optimization Theory and Applications, Springer, vol. 156(2), pages 493-523, February.
    2. Zheng Peng & Donghua Wu & Wenxing Zhu, 2016. "The robust constant and its applications in random global search for unconstrained global optimization," Journal of Global Optimization, Springer, vol. 64(3), pages 469-482, March.
    3. Satyajith Amaran & Nikolaos V. Sahinidis & Bikram Sharda & Scott J. Bury, 2016. "Simulation optimization: a review of algorithms and applications," Annals of Operations Research, Springer, vol. 240(1), pages 351-380, May.
    4. Caballero, Rafael & Hernández-Díaz, Alfredo G. & Laguna, Manuel & Molina, Julián, 2015. "Cross entropy for multiobjective combinatorial optimization problems with linear relaxations," European Journal of Operational Research, Elsevier, vol. 243(2), pages 362-368.
    5. Hao Su & Qun Niu & Zhile Yang, 2023. "Optimal Power Flow Using Improved Cross-Entropy Method," Energies, MDPI, vol. 16(14), pages 1-33, July.
    6. Xi Chen & Enlu Zhou, 2015. "Population model-based optimization," Journal of Global Optimization, Springer, vol. 63(1), pages 125-148, September.
    7. Enlu Zhou & Shalabh Bhatnagar, 2018. "Gradient-Based Adaptive Stochastic Search for Simulation Optimization Over Continuous Space," INFORMS Journal on Computing, INFORMS, vol. 30(1), pages 154-167, February.
    8. Volodymyr Melnykov & Xuwen Zhu, 2019. "An extension of the K-means algorithm to clustering skewed data," Computational Statistics, Springer, vol. 34(1), pages 373-394, March.
    9. Francesco Dotto & Alessio Farcomeni & Luis Angel García-Escudero & Agustín Mayo-Iscar, 2017. "A fuzzy approach to robust regression clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(4), pages 691-710, December.
    10. Zaheer Ahmed & Alberto Cassese & Gerard Breukelen & Jan Schepers, 2023. "E-ReMI: Extended Maximal Interaction Two-mode Clustering," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 298-331, July.
    11. Rocci, Roberto & Vichi, Maurizio, 2008. "Two-mode multi-partitioning," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 1984-2003, January.
    12. Sharon M. McNicholas & Paul D. McNicholas & Daniel A. Ashlock, 2021. "An Evolutionary Algorithm with Crossover and Mutation for Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 264-279, July.
    13. Aghiles Salah & Mohamed Nadif, 2019. "Directional co-clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 591-620, September.
    14. Charles Audet & Jean Bigeon & Romain Couderc, 2021. "Combining Cross-Entropy and MADS Methods for Inequality Constrained Global Optimization," SN Operations Research Forum, Springer, vol. 2(3), pages 1-26, September.
    15. Thierry Chekouo & Alejandro Murua, 2018. "High-dimensional variable selection with the plaid mixture model for clustering," Computational Statistics, Springer, vol. 33(3), pages 1475-1496, September.
    16. Shuchismita Sarkar & Volodymyr Melnykov & Rong Zheng, 2020. "Gaussian mixture modeling and model-based clustering under measurement inconsistency," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 379-413, June.
    17. Erik Hintz & Marius Hofert & Christiane Lemieux & Yoshihiro Taniguchi, 2022. "Single-Index Importance Sampling with Stratification," Methodology and Computing in Applied Probability, Springer, vol. 24(4), pages 3049-3073, December.
    18. Xavier Bry & Lionel Cucala, 2022. "A von Mises–Fisher mixture model for clustering numerical and categorical variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 429-455, June.
    19. Mattrand, C. & Bourinet, J.-M., 2014. "The cross-entropy method for reliability assessment of cracked structures subjected to random Markovian loads," Reliability Engineering and System Safety, Elsevier, vol. 123(C), pages 171-182.
    20. Kin-Ping Hui, 2011. "Cooperative Cross-Entropy method for generating entangled networks," Annals of Operations Research, Springer, vol. 189(1), pages 205-214, September.
