IDEAS home Printed from https://ideas.repec.org/a/jss/jstsof/v070i02.html

GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models

Author

Listed:
  • Bilgrau, Anders Ellern
  • Eriksen, Poul Svante
  • Rasmussen, Jakob Gulddahl
  • Johnsen, Hans Erik
  • Dybkaer, Karen
  • Boegsted, Martin

Abstract

Methods for clustering in unsupervised learning are an important part of the statistical toolbox in numerous scientific disciplines. Tewari, Giering, and Raghunathan (2011) proposed to use so-called Gaussian mixture copula models (GMCM) for general unsupervised learning based on clustering. Li, Brown, Huang, and Bickel (2011) independently discussed a special case of these GMCMs as a novel approach to meta-analysis in high-dimensional settings. GMCMs have attractive properties which make them highly flexible and therefore interesting alternatives to other well-established methods. However, parameter estimation is hard because of intrinsic identifiability issues and intractable likelihood functions. Both aforementioned papers discuss similar expectation-maximization-like algorithms as their pseudo maximum likelihood estimation procedure. We present and discuss an improved implementation in R of both classes of GMCMs along with various alternative optimization routines to the EM algorithm. The software is freely available in the R package GMCM. The implementation is fast, general, and optimized for very large numbers of observations. We demonstrate the use of package GMCM through different applications.
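
As background for the pseudo maximum likelihood procedure mentioned above: in semiparametric copula models, the unknown marginal distributions are commonly replaced by scaled ranks before the copula parameters are estimated. The Python sketch below illustrates only that rank transform. It is an assumption about the general technique, not code from the R package GMCM, and the function names `scaled_ranks` and `pseudo_observations` are hypothetical.

```python
def scaled_ranks(column):
    """Map observations to pseudo-observations rank / (n + 1) in (0, 1).

    Tied values receive the average of the ranks they span.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        stop = start
        # Extend the block while the next sorted value ties with the current one.
        while stop + 1 < n and column[order[stop + 1]] == column[order[start]]:
            stop += 1
        average_rank = (start + stop) / 2 + 1  # ranks are 1-based
        for k in range(start, stop + 1):
            ranks[order[k]] = average_rank
        start = stop + 1
    # Dividing by n + 1 (not n) keeps every value strictly inside (0, 1).
    return [r / (n + 1) for r in ranks]


def pseudo_observations(rows):
    """Apply the rank transform to each column of a list of equal-length rows."""
    columns = list(zip(*rows))
    transformed = [scaled_ranks(list(col)) for col in columns]
    return [list(row) for row in zip(*transformed)]


# Hypothetical toy data: three observations of two variables.
data = [[2.3, 10.0], [0.1, 30.0], [5.7, 20.0]]
u_hat = pseudo_observations(data)
```

Because the transform depends only on the ordering within each column, the resulting values are unchanged by any strictly increasing rescaling of a variable, which is why a copula-based model can be fitted without specifying the marginal distributions.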

Suggested Citation

  • Bilgrau, Anders Ellern & Eriksen, Poul Svante & Rasmussen, Jakob Gulddahl & Johnsen, Hans Erik & Dybkaer, Karen & Boegsted, Martin, 2016. "GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i02).
  • Handle: RePEc:jss:jstsof:v:070:i02
    DOI: http://hdl.handle.net/10.18637/jss.v070.i02

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v070i02/v70i02.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v070i02/GMCM_1.2.3.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v070i02/v70i02.R
    Download Restriction: no


    References listed on IDEAS

    1. Chen, Xiaohong & Fan, Yanqin & Tsyrennikov, Viktor, 2006. "Efficient Estimation of Semiparametric Multivariate Copula Models," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1228-1240, September.
    2. Smyth Gordon K, 2004. "Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-28, February.
    3. Efron, Bradley, 2004. "Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 96-104, January.

    Citations


    Cited by:

    1. Sabyasachi Guharay & KC Chang & Jie Xu, 2017. "Robust Estimation of Value-at-Risk through Distribution-Free and Parametric Approaches Using the Joint Severity and Frequency Model: Applications in Financial, Actuarial, and Natural Calamities Domain," Risks, MDPI, vol. 5(3), pages 1-30, July.
    2. Fritzsch, Simon & Timphus, Maike & Weiß, Gregor, 2024. "Marginals versus copulas: Which account for more model risk in multivariate risk forecasting?," Journal of Banking & Finance, Elsevier, vol. 158(C).
    3. Kasa, Siva Rajesh & Rajan, Vaibhav, 2022. "Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation," Econometrics and Statistics, Elsevier, vol. 22(C), pages 67-97.
    4. Simon Fritzsch & Maike Timphus & Gregor Weiss, 2021. "Marginals Versus Copulas: Which Account For More Model Risk In Multivariate Risk Forecasting?," Papers 2109.10946, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Montazeri Zahra & Yanofsky Corey M. & Bickel David R., 2010. "Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    2. Leek Jeffrey T & Storey John D., 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    3. Hong, Zhaoping & Lian, Heng, 2012. "BOPA: A Bayesian hierarchical model for outlier expression detection," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4146-4156.
    4. Marot Guillemette & Mayer Claus-Dieter, 2009. "Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-35, January.
    5. Mark A. van de Wiel & Kyung In Kim, 2007. "Estimating the False Discovery Rate Using Nonparametric Deconvolution," Biometrics, The International Biometric Society, vol. 63(3), pages 806-815, September.
    6. Youngjo Lee & Jan F. Bjørnstad, 2013. "Extended likelihood approach to large-scale multiple testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 553-575, June.
    7. Bickel David R., 2008. "Correcting the Estimated Level of Differential Expression for Gene Selection Bias: Application to a Microarray Study," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-27, March.
    8. Robin, Stephane & Bar-Hen, Avner & Daudin, Jean-Jacques & Pierre, Laurent, 2007. "A semi-parametric approach for mixture models: Application to local false discovery rate estimation," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5483-5493, August.
    9. Long Qu & Dan Nettleton & Jack C. M. Dekkers, 2012. "A Hierarchical Semiparametric Model for Incorporating Intergene Information for Analysis of Genomic Data," Biometrics, The International Biometric Society, vol. 68(4), pages 1168-1177, December.
    10. Aaron C Ericsson & J Wade Davis & William Spollen & Nathan Bivens & Scott Givan & Catherine E Hagan & Mark McIntosh & Craig L Franklin, 2015. "Effects of Vendor and Genetic Background on the Composition of the Fecal Microbiota of Inbred Mice," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-19, February.
    11. Pounds Stanley B. & Gao Cuilan L. & Zhang Hui, 2012. "Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-32, October.
    12. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    13. Shigeyuki Matsui & Hisashi Noma, 2011. "Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments," Biometrics, The International Biometric Society, vol. 67(4), pages 1225-1235, December.
    14. Agbeyegbe, Terence D., 2015. "An inverted U-shaped crude oil price return-implied volatility relationship," Review of Financial Economics, Elsevier, vol. 27(C), pages 28-45.
    15. Xiaohong Li & Guy N Brock & Eric C Rouchka & Nigel G F Cooper & Dongfeng Wu & Timothy E O’Toole & Ryan S Gill & Abdallah M Eteleeb & Liz O’Brien & Shesh N Rai, 2017. "A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-22, May.
    16. Kerr Kathleen F., 2012. "Optimality Criteria for the Design of 2-Color Microarray Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-9, January.
    17. Ambroise Jérôme & Bearzatto Bertrand & Robert Annie & Macq Benoit & Gala Jean-Luc, 2012. "Combining Multiple Laser Scans of Spotted Microarrays by Means of a Two-Way ANOVA Model," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-20, February.
    18. J. McClatchy & R. Strogantsev & E. Wolfe & H. Y. Lin & M. Mohammadhosseini & B. A. Davis & C. Eden & D. Goldman & W. H. Fleming & P. Conley & G. Wu & L. Cimmino & H. Mohammed & A. Agarwal, 2023. "Clonal hematopoiesis related TET2 loss-of-function impedes IL1β-mediated epigenetic reprogramming in hematopoietic stem and progenitor cells," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    19. Giovanni Compiani & Philip Haile & Marcelo Sant’Anna, 2020. "Common Values, Unobserved Heterogeneity, and Endogenous Entry in US Offshore Oil Lease Auctions," Journal of Political Economy, University of Chicago Press, vol. 128(10), pages 3872-3912.
    20. Alexandra Gyurdieva & Stefan Zajic & Ya-Fang Chang & E. Andres Houseman & Shan Zhong & Jaegil Kim & Michael Nathenson & Thomas Faitg & Mary Woessner & David C. Turner & Aisha N. Hasan & John Glod & Ro, 2022. "Biomarker correlates with response to NY-ESO-1 TCR T cells in patients with synovial sarcoma," Nature Communications, Nature, vol. 13(1), pages 1-18, December.

