Printed from https://ideas.repec.org/a/spr/stpapr/v66y2025i1d10.1007_s00362-024-01627-0.html

Revisiting Dirichlet Mixture Model: unraveling deeper insights and practical applications

Authors

  • Samyajoy Pal

    (LMU Munich)

  • Christian Heumann

    (LMU Munich)

Abstract

This study revisits the Dirichlet Mixture Model (DMM), offering comprehensive insights into specific facets of parameter estimation. Estimating parameters of the DMM is challenging, with previous approaches focusing on standard parametrization, which lacks interpretability. We propose an alternative parametrization of the Dirichlet distribution using mean and precision, which provides critical insights into the distribution’s location and peakedness. This parametrization is versatile, covering a wide range of scenarios with varying locations and precision levels, making it applicable to diverse datasets. Depending on whether one or both parameters are unknown, the estimation procedure varies, and estimates also differ when precision is identical across mixture components. In this article, we introduce this alternative parametrization and meticulously explore four distinct scenarios, deriving maximum likelihood estimates (MLE) for each using the Expectation-Maximization (EM) algorithm. For high-dimensional data, where standard methods often falter due to additional challenges, we present an innovative estimation approach utilizing Stirling’s approximation and moment approximation, which provides closed-form solutions and faster execution times. Our study demonstrates the identifiability of the DMM and employs a closed-form approximation for Kullback–Leibler (KL) divergence to evaluate goodness of fit. Practical applications are illustrated through the analysis of both simulated and real datasets, showcasing the practical utility of the DMM.
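The mean and precision parametrization mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common convention in which the standard Dirichlet parameters are written as alpha_j = precision * mean_j, with the mean vector on the simplex and the precision equal to the sum of the alphas; the paper's own notation may differ. The function names `to_alpha`, `to_mean_precision`, and `dirichlet_logpdf` are hypothetical and chosen only for this example.

```python
import math


def to_alpha(mean, precision):
    """Map (mean, precision) to standard Dirichlet parameters.

    Assumed convention: alpha_j = precision * mean_j.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    if any(m <= 0 for m in mean) or abs(sum(mean) - 1.0) > 1e-9:
        raise ValueError("mean must be strictly positive and sum to 1")
    return [precision * m for m in mean]


def to_mean_precision(alpha):
    """Inverse map: precision is the sum of alpha, mean is alpha / precision."""
    precision = sum(alpha)
    return [a / precision for a in alpha], precision


def dirichlet_logpdf(x, mean, precision):
    """Log-density of a Dirichlet at x, expressed through mean and precision."""
    alpha = to_alpha(mean, precision)
    # Log of the normalizing constant: log Gamma(sum alpha) - sum log Gamma(alpha_j)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


# Same location, different peakedness: a larger precision concentrates
# the density more tightly around the mean.
mean = [0.2, 0.3, 0.5]
low = dirichlet_logpdf(mean, mean, precision=10.0)
high = dirichlet_logpdf(mean, mean, precision=100.0)
print(low < high)
```

Under this convention the mean fixes where the distribution sits on the simplex and the precision controls how tightly it concentrates there, which is the interpretability argument the abstract makes for the alternative parametrization.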

Suggested Citation

  • Samyajoy Pal & Christian Heumann, 2025. "Revisiting Dirichlet Mixture Model: unraveling deeper insights and practical applications," Statistical Papers, Springer, vol. 66(1), pages 1-38, January.
  • Handle: RePEc:spr:stpapr:v:66:y:2025:i:1:d:10.1007_s00362-024-01627-0
    DOI: 10.1007/s00362-024-01627-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-024-01627-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    As the access to this document is restricted, you may want to search for a different version of it.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
    2. Abbas Mahdavi & Anthony F. Desmond & Ahad Jamalizadeh & Tsung-I Lin, 2024. "Skew Multiple Scaled Mixtures of Normal Distributions with Flexible Tail Behavior and Their Application to Clustering," Journal of Classification, Springer;The Classification Society, vol. 41(3), pages 620-649, November.
    3. Luca Benedetti & Eric Boniardi & Leonardo Chiani & Jacopo Ghirri & Marta Mastropietro & Andrea Cappozzo & Francesco Denti, 2024. "Variational inference for semiparametric Bayesian novelty detection in large datasets," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 681-703, September.
    4. Riccardo Rastelli & Michael Fop, 2020. "A stochastic block model for interaction lengths," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 485-512, June.
    5. Tu, Wangshu & Browne, Ryan & Subedi, Sanjeena, 2024. "A mixture of logistic skew-normal multinomial models," Computational Statistics & Data Analysis, Elsevier, vol. 196(C).
    6. Mengbing Li & Daniel E. Park & Maliha Aziz & Cindy M. Liu & Lance B. Price & Zhenke Wu, 2023. "Integrating sample similarities into latent class analysis: a tree‐structured shrinkage approach," Biometrics, The International Biometric Society, vol. 79(1), pages 264-279, March.
    7. Chabert-Liddell, Saint-Clair & Barbillon, Pierre & Donnet, Sophie & Lazega, Emmanuel, 2021. "A stochastic block model approach for the analysis of multilevel networks: An application to the sociology of organizations," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    8. Sanjeena Subedi & Paul D. McNicholas, 2021. "A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 89-108, April.
    9. Mohammadamin Edrisi & Xiru Huang & Huw A. Ogilvie & Luay Nakhleh, 2023. "Accurate integration of single-cell DNA and RNA for analyzing intratumor heterogeneity using MaCroDNA," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    10. Nazila Zarghi, 2021. "Evidence-Based Social Sciences: A New Emerging Field," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, January -.
    11. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    12. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    13. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    14. F. Marta L. Di Lascio & Andrea Menapace & Roberta Pappadà, 2024. "A spatially‐weighted AMH copula‐based dissimilarity measure for clustering variables: An application to urban thermal efficiency," Environmetrics, John Wiley & Sons, Ltd., vol. 35(1), February.
    15. Yifan Zhu & Chongzhi Di & Ying Qing Chen, 2019. "Clustering Functional Data with Application to Electronic Medication Adherence Monitoring in HIV Prevention Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 238-261, July.
    16. Irene Vrbik & Paul McNicholas, 2015. "Fractionally-Supervised Classification," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 359-381, October.
    17. Maurizio Vichi & Carlo Cavicchia & Patrick J. F. Groenen, 2022. "Hierarchical Means Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 553-577, November.
    18. Weiqiang Shen & Chuanlin Zhang & Xiaona Zhang & Jinglun Shi, 2019. "A fully distributed deployment algorithm for underwater strong k-barrier coverage using mobile sensors," International Journal of Distributed Sensor Networks, vol. 15(4), pages 15501477198, April.
    19. Batool, Fatima & Hennig, Christian, 2021. "Clustering with the Average Silhouette Width," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    20. Bo Cowgill & Jonathan M. V. Davis & B. Pablo Montagnes & Patryk Perkowski, 2024. "Stable Matching on the Job? Theory and Evidence on Internal Talent Markets," CESifo Working Paper Series 11120, CESifo.
