IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v53y2009i4p1462-1474.html

Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach

Author

Listed:
  • Chakraborty, Sounak

Abstract

Because most cancer treatments carry some degree of toxicity, it is essential to identify the cancer type correctly before administering the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis of cancer classification is slowly shifting from morphological properties to molecular signatures. Several recent studies have demonstrated a marked improvement in the accuracy of tumor-type prediction based on gene expression microarray measurements over prediction based on clinical markers. The main challenge in working with gene expression microarrays is the huge number of genes involved, of which only a small fraction are actually relevant for differentiating between cancer types. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model classifies different cancer types accurately and simultaneously identifies the relevant genes. The proposed model is completely automatic in the sense that it adaptively chooses the neighborhood size and the important covariates. The method is successfully applied to three simulated data sets and four well-known real data sets. To demonstrate the competitiveness of the method, a comparative study is also carried out against several other popular "off the shelf" classification methods. For all the simulated and real data sets, the proposed method produced highly competitive, if not better, results. Whereas the standard approach builds the model in two steps, gene selection followed by tumor prediction, this novel adaptive technique selects the relevant genes and predicts the tumor class in a single step. The biological relevance of the selected genes is also discussed to validate the claim.
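
To make the two quantities the abstract says are chosen adaptively (the neighborhood size and the set of relevant genes) more concrete, the following is a minimal sketch and not the paper's Bayesian model. It swaps in a plain k-nearest-neighbor majority vote and an exhaustive leave-one-out search over k and small gene subsets. The function names, the toy expression matrix and the search strategy are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def knn_predict(train_x, train_y, query, k, genes):
    """Majority vote among the k nearest training samples, using only `genes`."""
    def dist(a, b):
        # Squared Euclidean distance restricted to the selected gene indices.
        return sum((a[g] - b[g]) ** 2 for g in genes)

    ranked = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(x, y, k, genes):
    """Leave-one-out accuracy for a given k and gene subset."""
    hits = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_x, rest_y, x[i], k, genes) == y[i]
    return hits / len(x)


def select_k_and_genes(x, y, ks, max_genes):
    """Exhaustive search over k and small gene subsets (toy scale only).

    Returns (accuracy, k, genes); ties keep the first candidate found,
    i.e. the smallest gene subset and the earliest k in `ks`.
    """
    n_genes = len(x[0])
    best = None
    for size in range(1, max_genes + 1):
        for genes in combinations(range(n_genes), size):
            for k in ks:
                score = loo_accuracy(x, y, k, genes)
                if best is None or score > best[0]:
                    best = (score, k, genes)
    return best


# Hypothetical toy data: gene 0 separates the classes, genes 1 and 2 are noise.
x = [
    [0.1, 5.0, 2.0], [0.2, 1.0, 9.0], [0.0, 8.0, 4.0], [0.3, 3.0, 7.0],
    [2.1, 4.9, 2.1], [2.0, 1.1, 8.8], [1.9, 7.9, 4.2], [2.2, 3.1, 6.9],
]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

best = select_k_and_genes(x, y, ks=[1, 3], max_genes=2)
print(best)
```

An exhaustive search like this is only feasible for a handful of genes. With the thousands of genes on a microarray the number of subsets is far too large to enumerate, which is the difficulty that motivates treating gene selection and neighborhood size as unknowns inside a single model, as the abstract describes.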

Suggested Citation

  • Chakraborty, Sounak, 2009. "Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1462-1474, February.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:4:p:1462-1474

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00472-6
    Download Restriction: Full text for ScienceDirect subscribers only.

    As the access to this document is restricted, you may want to search for a different version of it.


    Citations

    Cited by:

    1. Nader Salari & Shamarina Shohaimi & Farid Najafi & Meenakshii Nallappan & Isthrinayagy Karishnarajah, 2014. "A Novel Hybrid Classification Model of Genetic Algorithms, Modified k-Nearest Neighbor and Developed Backpropagation Neural Network," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-50, November.
    2. Fraiman, Ricardo & Justel, Ana & Svarc, Marcela, 2010. "Pattern recognition via projection-based kNN rules," Computational Statistics & Data Analysis, Elsevier, vol. 54(5), pages 1390-1403, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
    2. Katherine A. Guthrie & Lianne Sheppard & Jon Wakefield, 2002. "A Hierarchical Aggregate Data Model with Spatially Correlated Disease Rates," Biometrics, The International Biometric Society, vol. 58(4), pages 898-905, December.
    3. Zhuoqiong He & Dongchu Sun, 2000. "Hierarchical Bayes Estimation of Hunting Success Rates with Spatial Correlations," Biometrics, The International Biometric Society, vol. 56(2), pages 360-367, June.
    4. Lizhen Shen & Hua Jiang & Mingfang He & Guoqing Liu, 2017. "Collaborative representation-based classification of microarray gene expression data," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-14, December.
    5. Ngianga-Bakwin Kandala & Chibuzor Christopher Nnanatu & Glory Atilola & Paul Komba & Lubanzadio Mavatikua & Zhuzhi Moore & Gerry Mackie & Bettina Shell-Duncan, 2019. "A Spatial Analysis of the Prevalence of Female Genital Mutilation/Cutting among 0–14-Year-Old Girls in Kenya," IJERPH, MDPI, vol. 16(21), pages 1-28, October.
    6. Katherine Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    7. Pang, W. K. & Yang, Z. H. & Hou, S. H. & Leung, P. K., 2002. "Non-uniform random variate generation by the vertical strip method," European Journal of Operational Research, Elsevier, vol. 142(3), pages 595-609, November.
    8. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    9. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    10. Eibich, Peter & Ziebarth, Nicolas, 2014. "Examining the Structure of Spatial Health Effects in Germany Using Hierarchical Bayes Models," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 49, pages 305-320.
    11. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    12. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    13. Z. Rezaei Ghahroodi & M. Ganjali, 2013. "A Bayesian approach for analysing longitudinal nominal outcomes using random coefficients transitional generalized logit model: an application to the labour force survey data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 40(7), pages 1425-1445, July.
    14. Anis Fradi & Chafik Samir & Ines Adouani, 2024. "A New Bayesian Approach to Global Optimization on Parametrized Surfaces in $\mathbb{R}^{3}$," Journal of Optimization Theory and Applications, Springer, vol. 202(3), pages 1077-1100, September.
    15. Strasak, Alexander M. & Umlauf, Nikolaus & Pfeiffer, Ruth M. & Lang, Stefan, 2011. "Comparing penalized splines and fractional polynomials for flexible modelling of the effects of continuous predictor variables," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1540-1551, April.
    16. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    17. Zhengyi Zhou & David S. Matteson & Dawn B. Woodard & Shane G. Henderson & Athanasios C. Micheas, 2015. "A Spatio-Temporal Point Process Model for Ambulance Demand," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 6-15, March.
    18. Eric C. Tassone & Marie Lynn Miranda & Alan E. Gelfand, 2010. "Disaggregated spatial modelling for areal unit categorical data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(1), pages 175-190, January.
    19. Junming Li & Xiulan Han & Xiao Li & Jianping Yang & Xuejiao Li, 2018. "Spatiotemporal Patterns of Ground Monitored PM 2.5 Concentrations in China in Recent Years," IJERPH, MDPI, vol. 15(1), pages 1-15, January.
    20. Antonello Loddo & Shawn Ni & Dongchu Sun, 2011. "Selection of Multivariate Stochastic Volatility Models via Bayesian Stochastic Search," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(3), pages 342-355, July.
