IDEAS home Printed from https://ideas.repec.org/a/spr/jclass/v39y2022i3d10.1007_s00357-022-09417-9.html
   My bibliography  Save this article

Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data

Author

Listed:
  • Yuan Fang

    (Boston University)

  • Dimitris Karlis

    (Athens University of Economics and Business)

  • Sanjeena Subedi

    (Carleton University)

Abstract

Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. For cluster analysis, using a traditional finite mixture model framework, the number of components either needs to be known a priori or needs to be estimated a posteriori using some model selection criterion after deriving results for a range of possible number of components. However, different model selection criteria can sometimes result in different numbers of components yielding uncertainty. Here, an infinite mixture model framework, also known as Dirichlet process mixture model, is proposed for the mixtures of MNIG distributions. This Dirichlet process mixture model approach allows the number of components to grow or decay freely from 1 to $$\infty$$ ∞ (in practice from 1 to N) and the number of components is inferred along with the parameter estimates in a Bayesian framework, thus alleviating the need for model selection criteria. We run our algorithm on simulated as well as real benchmark datasets and compare with other clustering approaches. The proposed method provides competitive results for both simulations and real data.

Suggested Citation

  • Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
  • Handle: RePEc:spr:jclass:v:39:y:2022:i:3:d:10.1007_s00357-022-09417-9
    DOI: 10.1007/s00357-022-09417-9
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-022-09417-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00357-022-09417-9?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. David M. Blei & Alp Kucukelbir & Jon D. McAuliffe, 2017. "Variational Inference: A Review for Statisticians," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 859-877, April.
    2. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    3. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    4. Sylvia. Richardson & Peter J. Green, 1997. "On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(4), pages 731-792.
    5. Sanjeena Subedi & Paul D. McNicholas, 2021. "A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 89-108, April.
    6. Ishwaran H. & James L. F, 2001. "Gibbs Sampling Methods for Stick Breaking Priors," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 161-173, March.
    7. O’Hagan, Adrian & Murphy, Thomas Brendan & Gormley, Isobel Claire & McNicholas, Paul D. & Karlis, Dimitris, 2016. "Clustering with the multivariate normal inverse Gaussian distribution," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 18-30.
    8. Sanjeena Subedi & Paul McNicholas, 2014. "Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(2), pages 167-193, June.
    9. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    10. Xiang Lu & Yaoxiang Li & Tanzy Love, 2021. "On Bayesian Analysis of Parsimonious Gaussian Mixture Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 576-593, October.
    11. Cristina Tortora & Brian C. Franczak & Ryan P. Browne & Paul D. McNicholas, 2019. "A Mixture of Coalesced Generalized Hyperbolic Distributions," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 26-57, April.
    12. David J. Spiegelhalter & Nicola G. Best & Bradley P. Carlin & Angelika Van Der Linde, 2002. "Bayesian measures of model complexity and fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 583-639, October.
    13. Matthew Stephens, 2000. "Dealing with label switching in mixture models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 795-809.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sanjeena Subedi & Paul D. McNicholas, 2021. "A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 89-108, April.
    2. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    3. Paula M. Murray & Ryan P. Browne & Paul D. McNicholas, 2020. "Mixtures of Hidden Truncation Hyperbolic Factor Analyzers," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 366-379, July.
    4. Utkarsh J. Dang & Michael P.B. Gallaugher & Ryan P. Browne & Paul D. McNicholas, 2023. "Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 145-167, April.
    5. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "A mixture of SDB skew-t factor analyzers," Econometrics and Statistics, Elsevier, vol. 3(C), pages 160-168.
    6. Sharon M. McNicholas & Paul D. McNicholas & Daniel A. Ashlock, 2021. "An Evolutionary Algorithm with Crossover and Mutation for Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 264-279, July.
    7. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
    8. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 141-156.
    9. Ryan P. Browne & Luca Bagnato & Antonio Punzo, 2024. "Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 597-625, September.
    10. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    11. Cristina Tortora & Brian C. Franczak & Ryan P. Browne & Paul D. McNicholas, 2019. "A Mixture of Coalesced Generalized Hyperbolic Distributions," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 26-57, April.
    12. Riccardo Rastelli & Michael Fop, 2020. "A stochastic block model for interaction lengths," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 485-512, June.
    13. Komárek, Arnost, 2009. "A new R package for Bayesian estimation of multivariate normal mixtures allowing for selection of the number of components and interval-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 3932-3947, October.
    14. You, Na & Dai, Hongsheng & Wang, Xueqin & Yu, Qingyun, 2024. "Sequential estimation for mixture of regression models for heterogeneous population," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    15. Roy Costilla & Ivy Liu & Richard Arnold & Daniel Fernández, 2019. "Bayesian model-based clustering for longitudinal ordinal data," Computational Statistics, Springer, vol. 34(3), pages 1015-1038, September.
    16. Marco Berrettini & Giuliano Galimberti & Saverio Ranciati, 2023. "Semiparametric finite mixture of regression models with Bayesian P-splines," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 745-775, September.
    17. Tu, Wangshu & Browne, Ryan & Subedi, Sanjeena, 2024. "A mixture of logistic skew-normal multinomial models," Computational Statistics & Data Analysis, Elsevier, vol. 196(C).
    18. Arima, Serena & Basset, Alberto & Jona Lasinio, Giovanna & Pollice, Alessio & Rosati, Ilaria, 2013. "A hierarchical Bayesian model for the ecological status classification of lagoons," Ecological Modelling, Elsevier, vol. 263(C), pages 187-195.
    19. Xiang Lu & Yaoxiang Li & Tanzy Love, 2021. "On Bayesian Analysis of Parsimonious Gaussian Mixture Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 576-593, October.
    20. Rodríguez, Carlos E. & Núñez-Antonio, Gabriel & Escarela, Gabriel, 2020. "A Bayesian mixture model for clustering circular data," Computational Statistics & Data Analysis, Elsevier, vol. 143(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jclass:v:39:y:2022:i:3:d:10.1007_s00357-022-09417-9. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.