
Bayesian binary kernel probit model for microarray based cancer classification and gene selection

Author

Listed:
  • Chakraborty, Sounak

Abstract

Gene expression microarrays have opened up a new challenge in the identification and classification of cancer tissues. Because a large number of genes provide information simultaneously while very few tissue samples are available, cancer staging or classification is difficult. In this paper we introduce a hierarchical Bayesian probit model for two-class cancer classification. Instead of assuming a linear structure for the function that relates gene expression to cancer type, we assume only that the relationship is explained by an unknown function belonging to an abstract function space such as a reproducing kernel Hilbert space. Our formulation automatically reduces the dimension of the problem from the large number of covariates (genes) to the small sample size. We combine a Bayesian gene selection scheme with this automatic dimension reduction to adaptively select important genes and classify cancer types under a unified model. The model is highly flexible both in explaining the relationship between cancer types and gene expression measurements and in picking up differentially expressed genes. The proposed model is tested successfully on three simulated data sets and on three publicly available real data sets on leukemia, colon cancer, and prostate cancer.
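
The dimension reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the bandwidth `theta`, the binary inclusion vector `gamma`, and the randomly drawn coefficients `beta` are all assumptions made for illustration, and no posterior sampling is performed. The sketch only shows that once the unknown function is represented through a kernel matrix, the number of coefficients equals the number of samples n rather than the number of genes p.

```python
import math
import numpy as np

def gaussian_kernel(X, Z, theta=1.0):
    """Gaussian kernel K[i, j] = exp(-||x_i - z_j||^2 / theta) (assumed kernel choice)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / theta)  # clip tiny negative round-off

def probit_prob(K, beta, beta0=0.0):
    """P(y_i = 1) = Phi(beta0 + sum_j K[i, j] * beta[j]), Phi the standard normal CDF."""
    f = beta0 + K @ beta
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in f])

rng = np.random.default_rng(0)
n, p = 20, 500                       # few tissue samples, many genes (simulated)
X = rng.normal(size=(n, p))          # simulated expression matrix, not real data

# Hypothetical gene-inclusion indicators: only the first 10 genes enter the kernel.
gamma = np.zeros(p, dtype=bool)
gamma[:10] = True

K = gaussian_kernel(X[:, gamma], X[:, gamma], theta=10.0)  # n x n, regardless of p
beta = rng.normal(scale=0.5, size=n)                       # n coefficients, not p
probs = probit_prob(K, beta)

print(K.shape, beta.shape, probs.shape)
```

In a full Bayesian treatment, `beta`, `gamma`, and the kernel parameter would receive priors and be sampled rather than fixed; here they are set by hand so the shapes are easy to inspect.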

Suggested Citation

  • Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:4198-4209

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00196-0
    Download Restriction: Full text for ScienceDirect subscribers only.


    Citations

    Cited by:

    1. Aßmann, Christian & Boysen-Hogrefe, Jens, 2011. "A Bayesian approach to model-based clustering for binary panel probit models," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 261-279, January.
    2. Aijun Yang & Xuejun Jiang & Lianjie Shu & Jinguan Lin, 2017. "Bayesian variable selection with sparse and correlation priors for high-dimensional data analysis," Computational Statistics, Springer, vol. 32(1), pages 127-143, March.
    3. Chakraborty, Sounak & Guo, Ruixin, 2011. "A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1342-1356, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chakraborty, Sounak, 2009. "Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1462-1474, February.
    2. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    3. Lee Kyu Ha & Chakraborty Sounak & Sun Jianguo, 2011. "Bayesian Variable Selection in Semiparametric Proportional Hazards Model for High Dimensional Survival Data," The International Journal of Biostatistics, De Gruyter, vol. 7(1), pages 1-32, April.
    4. Jiang, Liewen & Bondell, Howard D. & Wang, Huixia Judy, 2014. "Interquantile shrinkage and variable selection in quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 69(C), pages 208-219.
    5. Brendan P. W. Ames & Mingyi Hong, 2016. "Alternating direction method of multipliers for penalized zero-variance discriminant analysis," Computational Optimization and Applications, Springer, vol. 64(3), pages 725-754, July.
    6. Ming Yi & Ruoqing Zhu & Robert M Stephens, 2018. "GradientScanSurv—An exhaustive association test method for gene expression data with censored survival outcome," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-28, December.
    7. Lian, Heng, 2010. "Sparse Bayesian hierarchical modeling of high-dimensional clustering problems," Journal of Multivariate Analysis, Elsevier, vol. 101(7), pages 1728-1737, August.
    8. Chakraborty, Sounak & Guo, Ruixin, 2011. "A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1342-1356, March.
    9. Wang, Tao & Zhu, Lixing, 2013. "Sparse sufficient dimension reduction using optimal scoring," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 223-232.
    10. Subharup Guha & Rex Jung & David Dunson, 2022. "Predicting phenotypes from brain connection structure," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(3), pages 639-668, June.
    11. Baragatti, M. & Pommeret, D., 2012. "A study of variable selection using g-prior distribution with ridge parameter," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1920-1934.
    12. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    13. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    14. Howard D. Bondell & Brian J. Reich, 2012. "Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1610-1624, December.
    15. Zeyu Diao & Lili Yue & Fanrong Zhao & Gaorong Li, 2022. "High-Dimensional Regression Adjustment Estimation for Average Treatment Effect with Highly Correlated Covariates," Mathematics, MDPI, vol. 10(24), pages 1-18, December.
    16. Lee, Kuo-Jung & Feldkircher, Martin & Chen, Yi-Chi, 2021. "Variable selection in finite mixture of regression models with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    17. Anindya Bhadra & Jyotishka Datta & Nicholas G. Polson & Brandon T. Willard, 2021. "The Horseshoe-Like Regularization for Feature Subset Selection," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 185-214, May.
    18. Shi, Guiling & Lim, Chae Young & Maiti, Tapabrata, 2019. "Bayesian model selection for generalized linear models using non-local priors," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 285-296.
    19. van Wieringen, Wessel N. & Kun, David & Hampel, Regina & Boulesteix, Anne-Laure, 2009. "Survival prediction using gene expression data: A review and comparison," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1590-1603, March.
    20. Chakraborty, Sounak & Lozano, Aurelie C., 2019. "A graph Laplacian prior for Bayesian variable selection and grouping," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 72-91.
