IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v52y2008i12p5356-5366.html

Assessing agreement of clustering methods with gene expression microarray data

Author

Listed:
  • Liu, Xueli
  • Lee, Sheng-Chien
  • Casella, George
  • Peter, Gary F.

Abstract

In the rapidly evolving field of genomics, many clustering and classification methods have been developed and employed to explore patterns in gene expression data. Biologists face the choice of which clustering algorithm(s) to use and how to interpret differing results from various clustering algorithms. No clear objective criteria have been developed to assess the agreement among, and compare the results from, different clustering methods. We describe two generally applicable objective measures to quantify agreement between different clustering methods. These two measures are referred to as the local agreement measure, which is defined for each gene/subject, and the global agreement measure, which is defined for the whole gene expression experiment. The agreement measures are based on a probabilistic weighting scheme applied to the number of concordant and discordant pairs from two clustering methods. In the comparison and assessment process, newly developed concepts are implemented under the framework of reliability of a cluster. The algorithms are illustrated by simulations and then applied to a yeast sporulation gene expression microarray data set. Analysis of the sporulation data identified ~5% (23 of 477) of the genes that were not consistently clustered using a neural net algorithm and K-means or pam. The two agreement measures provide objective criteria to conclude whether or not two clustering methods agree with each other. Using the local agreement measure, genes of unknown function which cluster consistently can more confidently be assigned functions based on co-regulation.
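
To make the idea of concordant and discordant pairs concrete, the following is a minimal Python sketch, not the authors' method. It assumes the simplest unweighted convention: a pair of genes is concordant when both clusterings put the two genes together, or both put them apart, and discordant otherwise. The probabilistic weighting scheme described in the abstract is not reproduced here, and the function name `pair_agreement` and the per-gene normalization by n - 1 are hypothetical choices made only for illustration.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Unweighted pair-counting agreement between two clusterings.

    Illustrative assumption: every pair of items carries equal weight.
    Returns (global_agreement, local_agreement), where the global value is
    the fraction of concordant pairs (the Rand index) and the local value
    for item i is the fraction of pairs involving i that are concordant.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")

    n = len(labels_a)
    concordant = 0
    discordant = 0
    concordant_per_item = [0] * n

    for i, j in combinations(range(n), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        if together_a == together_b:
            # Both methods agree on whether i and j share a cluster.
            concordant += 1
            concordant_per_item[i] += 1
            concordant_per_item[j] += 1
        else:
            discordant += 1

    total = concordant + discordant
    global_agreement = concordant / total if total else 1.0
    # Each item takes part in n - 1 pairs.
    local_agreement = [
        c / (n - 1) if n > 1 else 1.0 for c in concordant_per_item
    ]
    return global_agreement, local_agreement


# Toy example: five genes clustered by two hypothetical methods.
method_1 = [0, 0, 1, 1, 2]
method_2 = [1, 1, 0, 0, 0]
global_score, local_scores = pair_agreement(method_1, method_2)
print(global_score)   # 0.8
print(local_scores)   # [1.0, 1.0, 0.75, 0.75, 0.5]
```

In this toy example the cluster labels themselves differ between the two methods, but only co-membership matters, so the first two genes agree perfectly. The fifth gene has the lowest per-gene score because one method isolates it while the other merges it with genes three and four, which is the kind of inconsistently clustered gene a local measure is meant to flag.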

Suggested Citation

  • Liu, Xueli & Lee, Sheng-Chien & Casella, George & Peter, Gary F., 2008. "Assessing agreement of clustering methods with gene expression microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5356-5366, August.
  • Handle: RePEc:eee:csdana:v:52:y:2008:i:12:p:5356-5366

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00302-2
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. George C. Tseng & Wing H. Wong, 2005. "Tight Clustering: A Resampling-Based Approach for Identifying Stable and Tight Patterns in Data," Biometrics, The International Biometric Society, vol. 61(1), pages 10-16, March.

    Citations


    Cited by:

    1. Yang, Jingyun & Chinchilli, Vernon M., 2011. "Fixed-effects modeling of Cohen's weighted kappa for bivariate multinomial data," Computational Statistics & Data Analysis, Elsevier, vol. 55(2), pages 1061-1070, February.
    2. Liu, Shen & Maharaj, Elizabeth Ann, 2013. "A hypothesis test using bias-adjusted AR estimators for classifying time series in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 32-49.
    3. Allison, David B. & Visscher, Peter M. & Rosa, Guilherme J.M. & Amos, Christopher I., 2009. "Statistical genetics & statistical genomics: Where biology, epistemology, statistics, and computation collide," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1531-1534, March.
    4. Liu, Shen & Maharaj, Elizabeth Ann & Inder, Brett, 2014. "Polarization of forecast densities: A new approach to time series classification," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 345-361.
    5. Marín, J.M. & Rodríguez-Bernal, M.T., 2012. "Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1898-1907.
    6. Douzal-Chouakria, Ahlame & Diallo, Alpha & Giroud, Françoise, 2009. "Adaptive clustering for time series: Application for identifying cell cycle expressed genes," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1414-1426, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yujia Li & Xiangrui Zeng & Chien‐Wei Lin & George C. Tseng, 2022. "Simultaneous estimation of cluster number and feature sparsity in high‐dimensional cluster analysis," Biometrics, The International Biometric Society, vol. 78(2), pages 574-585, June.
    2. He, Yi & Pan, Wei & Lin, Jizhen, 2006. "Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 641-658, November.
    3. Capobianco Enrico & Marras Elisabetta & Travaglione Antonella, 2011. "Multiscale Characterization of Signaling Network Dynamics through Features," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-32, November.
    4. Ming Yuan & Christina Kendziorski, 2006. "A Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification," Biometrics, The International Biometric Society, vol. 62(4), pages 1089-1098, December.
    5. Davide Risso & Liam Purvis & Russell B Fletcher & Diya Das & John Ngai & Sandrine Dudoit & Elizabeth Purdom, 2018. "clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets," PLOS Computational Biology, Public Library of Science, vol. 14(9), pages 1-16, September.
    6. Coffey, N. & Hinde, J. & Holian, E., 2014. "Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 14-29.
    7. Yongsung Joo & George Casella & James Hobert, 2010. "Bayesian model-based tight clustering for time course data," Computational Statistics, Springer, vol. 25(1), pages 17-38, March.
    8. Hongkai Ji & Wing Hung Wong, 2006. "Computational Biology: Toward Deciphering Gene Regulatory Information in Mammalian Genomes," Biometrics, The International Biometric Society, vol. 62(3), pages 645-663, September.
    9. Liang, Faming, 2007. "Use of SVD-based probit transformation in clustering gene expression profiles," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6355-6366, August.
    10. Segal Mark R. & Xiong Hao & Bengtsson Henrik & Bourgon Richard & Gentleman Robert, 2012. "Querying Genomic Databases: Refining the Connectivity Map," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-37, January.
    11. Zhiguang Huo & Ying Ding & Silvia Liu & Steffi Oesterreich & George Tseng, 2016. "Meta-Analytic Framework for Sparse K -Means to Identify Disease Subtypes in Multiple Transcriptomic Studies," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 27-42, March.
    12. Gupta, Mayetri, 2014. "An evolutionary Monte Carlo algorithm for Bayesian block clustering of data matrices," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 375-391.
    13. Feher Kristen & Whelan James & Müller Samuel, 2011. "Assessing Modularity Using a Random Matrix Theory Approach," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-34, September.
    14. Ranjan Maitra & Ivan P. Ramler, 2009. "Clustering in the Presence of Scatter," Biometrics, The International Biometric Society, vol. 65(2), pages 341-352, June.
    15. Tianzhou Ma & Faming Liang & George C. Tseng, 2017. "Biomarker detection and categorization in ribonucleic acid sequencing meta-analysis using Bayesian hierarchical models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 66(4), pages 847-867, August.
