
Outcome-Driven Cluster Analysis with Application to Microarray Data

Author

Listed:
  • Jessie J Hsu
  • Dianne M Finkelstein
  • David A Schoenfeld

Abstract

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated with each other than they are with those in other groups. An example is the search for groups of genes whose RNA expression is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were also predictive of the clinical outcome. This issue arose in a study of trauma patients for whom RNA samples were available. The question of interest was whether there were groups of genes that behaved similarly, and whether each gene in a cluster had a similar effect on which patients recovered. To address this, we develop an algorithm that simultaneously assigns characteristics (genes) to groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model in which the expression of each gene within a group (cluster) equals the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including the outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.
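
The model structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not fit the model by MCMC; it only generates data with the stated structure. The function name, the Gaussian distributions, the variance parameters, the cluster sizes, the coefficient values, and the additive noise on a continuous outcome are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def simulate_outcome_driven_clusters(
    n_obs=200,
    genes_per_cluster=(5, 5, 5),
    betas=(1.0, -1.0, 0.0),
    sigma_u=1.0,
    sigma_e=0.5,
    sigma_y=0.5,
    seed=0,
):
    """Simulate genes and an outcome that share cluster-level random effects.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_clusters = len(genes_per_cluster)

    # One random effect per observation and cluster (assumed Gaussian).
    u = rng.normal(0.0, sigma_u, size=(n_obs, n_clusters))

    # Cluster label of every gene, e.g. [0, 0, 0, 0, 0, 1, 1, ...].
    labels = np.repeat(np.arange(n_clusters), genes_per_cluster)

    # Each gene = random effect of its cluster + independent error term.
    x = u[:, labels] + rng.normal(0.0, sigma_e, size=(n_obs, labels.size))

    # Outcome = linear combination of the cluster random effects.
    # The extra noise term and the continuous scale are assumptions.
    y = u @ np.asarray(betas) + rng.normal(0.0, sigma_y, size=n_obs)
    return x, y, labels


x, y, labels = simulate_outcome_driven_clusters()

# Compare gene-gene correlations within clusters and between clusters.
corr = np.corrcoef(x, rowvar=False)
same_cluster = labels[:, None] == labels[None, :]
off_diagonal = ~np.eye(labels.size, dtype=bool)
within = corr[same_cluster & off_diagonal].mean()
between = corr[~same_cluster].mean()

print(f"mean within-cluster correlation:  {within:.2f}")
print(f"mean between-cluster correlation: {between:.2f}")
```

Under these assumptions, genes in the same cluster share a random effect and are therefore strongly correlated, while genes in different clusters are close to uncorrelated. A cluster with a nonzero coefficient also carries information about the outcome, which is the signal an outcome-driven clustering would try to exploit.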

Suggested Citation

  • Jessie J Hsu & Dianne M Finkelstein & David A Schoenfeld, 2015. "Outcome-Driven Cluster Analysis with Application to Microarray Data," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-15, November.
  • Handle: RePEc:plo:pone00:0141874
    DOI: 10.1371/journal.pone.0141874

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141874
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0141874&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    2. Cui, Qiurong & Xu, Yuqing & Zhang, Zhengjun & Chan, Vincent, 2021. "Max-linear regression models with regularization," Journal of Econometrics, Elsevier, vol. 222(1), pages 579-600.
    3. Howard D. Bondell & Brian J. Reich, 2012. "Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1610-1624, December.
    4. Lee, Kuo-Jung & Feldkircher, Martin & Chen, Yi-Chi, 2021. "Variable selection in finite mixture of regression models with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    5. Chakraborty, Sounak & Lozano, Aurelie C., 2019. "A graph Laplacian prior for Bayesian variable selection and grouping," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 72-91.
    6. Goodness C. Aye & Stephen M. Miller & Rangan Gupta & Mehmet Balcilar, 2016. "Forecasting US real private residential fixed investment using a large number of predictors," Empirical Economics, Springer, vol. 51(4), pages 1557-1580, December.
    7. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    8. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    9. Brian J. Reich & Howard D. Bondell, 2011. "A Spatial Dirichlet Process Mixture Model for Clustering Population Genetics Data," Biometrics, The International Biometric Society, vol. 67(2), pages 381-390, June.
    10. Jian Guo & Elizaveta Levina & George Michailidis & Ji Zhu, 2010. "Pairwise Variable Selection for High-Dimensional Model-Based Clustering," Biometrics, The International Biometric Society, vol. 66(3), pages 793-804, September.
    11. Cathy Maugis & Gilles Celeux & Marie-Laure Martin-Magniette, 2009. "Variable Selection for Clustering with Gaussian Mixture Models," Biometrics, The International Biometric Society, vol. 65(3), pages 701-709, September.
    12. Nicholas Apergis & Ghassen El Montasser & Emmanuel Owusu-Sekyere & Ahdi N. Ajmi & Rangan Gupta, 2014. "Dutch Disease Effect of Oil Rents on Agriculture Value Added in MENA Countries," Working Papers 201408, University of Pretoria, Department of Economics.
    13. Ander Wilson & Brian J. Reich, 2014. "Confounder selection via penalized credible regions," Biometrics, The International Biometric Society, vol. 70(4), pages 852-861, December.
    14. Han, Shengtong & Zhang, Hongmei & Karmaus, Wilfried & Roberts, Graham & Arshad, Hasan, 2017. "Adjusting background noise in cluster analyses of longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 109(C), pages 93-104.
    15. Wentao Qu & Xianchao Xiu & Huangyue Chen & Lingchen Kong, 2023. "A Survey on High-Dimensional Subspace Clustering," Mathematics, MDPI, vol. 11(2), pages 1-39, January.
    16. Hiraki, Kazuhiro & Sun, Chuanping, 2022. "A toolkit for exploiting contemporaneous stock correlations," Journal of Empirical Finance, Elsevier, vol. 65(C), pages 99-124.
    17. Zambom, Adriano Zanin & Akritas, Michael G., 2015. "Nonparametric significance testing and group variable selection," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 51-60.
    18. Korobilis, Dimitris, 2016. "Prior selection for panel vector autoregressions," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 110-120.
    19. Diebold, Francis X. & Shin, Minchul, 2019. "Machine learning for regularized survey forecast combination: Partially-egalitarian LASSO and its derivatives," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1679-1691.
    20. Germán Caruso & Walter Sosa-Escudero & Marcela Svarc, 2015. "Deprivation and the Dimensionality of Welfare: A Variable-Selection Cluster-Analysis Approach," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 61(4), pages 702-722, December.
