
A model selection criterion for model-based clustering of annotated gene expression data

Author

Listed:
  • Gallopin Mélina

    (Laboratoire de Mathématiques, UMR 8628, Université Paris-Sud, 91405, Orsay Cedex, France; INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France)

  • Celeux Gilles

    (Inria Saclay Île-de-France, Université Paris-Sud, 91405, Orsay Cedex, France)

  • Jaffrézic Florence

    (INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France; AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, 75231 Paris, France)

  • Rau Andrea

    (INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France; AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, 75231 Paris, France)

Abstract

In co-expression analyses of gene expression data, it is often of interest to interpret clusters of co-expressed genes with respect to a set of external information, such as a potentially incomplete list of functional properties for which a subset of genes may be annotated. Based on the framework of finite mixture models, we propose a model selection criterion that takes into account such external gene annotations, providing an efficient tool for selecting both a relevant number of clusters and a clustering model. This criterion, called the integrated completed annotated likelihood (ICAL), is defined by adding an entropy term to a penalized likelihood to measure the concordance between a clustering partition and the external annotation information. The ICAL leads to the choice of a model that is more easily interpretable with respect to the known functional gene annotations. We illustrate the usefulness of this model selection criterion in conjunction with Gaussian mixture models on simulated gene expression data and on real RNA-seq data.
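
The idea of adding an entropy term to a penalized likelihood can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ICAL formula from the article: the function names `annotation_concordance` and `annotated_criterion` are hypothetical, the concordance term is taken to be a negative conditional entropy of each binary annotation given a hard cluster assignment, and the penalized log-likelihood is assumed to come from a mixture model fitted elsewhere. Genes with a missing annotation (`None`) are skipped, reflecting the incomplete annotation lists mentioned above.

```python
import math
from collections import Counter


def annotation_concordance(labels, annotation):
    """Entropy-style concordance between a partition and one annotation.

    Assumption (not the article's exact definition): the term is
    sum_k sum_b n_kb * log(n_kb / n_k), computed over annotated genes only.
    It is 0 when every cluster is pure with respect to the annotation
    and negative otherwise.
    """
    # Keep only genes whose annotation status is known.
    pairs = [(k, a) for k, a in zip(labels, annotation) if a is not None]
    cluster_sizes = Counter(k for k, _ in pairs)
    joint_counts = Counter(pairs)
    return sum(
        n * math.log(n / cluster_sizes[k]) for (k, _), n in joint_counts.items()
    )


def annotated_criterion(penalized_loglik, labels, annotations):
    """Hypothetical criterion: penalized log-likelihood plus concordance terms."""
    return penalized_loglik + sum(
        annotation_concordance(labels, a) for a in annotations
    )


# Toy partition of four genes into two clusters.
labels = [0, 0, 1, 1]
concordant = [1, 1, 0, None]   # clusters are pure; last gene is unannotated
discordant = [1, 0, 1, 0]      # annotation is split evenly within each cluster

pure_term = annotation_concordance(labels, concordant)
mixed_term = annotation_concordance(labels, discordant)
score = annotated_criterion(-100.0, labels, [discordant])
```

With a larger-is-better convention, a candidate model whose partition agrees with the annotations is penalized less by this term, so among models with similar penalized likelihoods the more interpretable partition would be preferred.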

Suggested Citation

  • Gallopin Mélina & Celeux Gilles & Jaffrézic Florence & Rau Andrea, 2015. "A model selection criterion for model-based clustering of annotated gene expression data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(5), pages 413-428, November.
  • Handle: RePEc:bpj:sagmbi:v:14:y:2015:i:5:p:413-428:n:5
    DOI: 10.1515/sagmb-2014-0095
