IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i3p1342-1356.html
   My bibliography  Save this article

A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data

Author

Listed:
  • Chakraborty, Sounak
  • Guo, Ruixin

Abstract

A hybrid Huberized support vector machine (HHSVM) with an elastic-net penalty has been developed for cancer tumor classification based on thousands of gene expression measurements. In this paper, we develop a Bayesian formulation of the hybrid Huberized support vector machine for binary classification. For the coefficients of the linear classification boundary, we propose a new type of prior, which can select variables and group them together simultaneously. Our proposed prior is a scale mixture of normal distributions and independent gamma priors on a transformation of the variance of the normal distributions. We establish a direct connection between the Bayesian HHSVM model with our special prior and the standard HHSVM solution with the elastic-net penalty. We propose a hierarchical Bayes technique and an empirical Bayes technique to select the penalty parameter. In the hierarchical Bayes model, the penalty parameter is selected using a beta prior. For the empirical Bayes model, we estimate the penalty parameter by maximizing the marginal likelihood. The proposed model is applied to two simulated data sets and three real-life gene expression microarray data sets. Results suggest that our Bayesian models are highly successful in selecting groups of similarly behaved important genes and predicting the cancer class. Most of the genes selected by our models have shown strong association with well-studied genetic pathways, further validating our claims.

Suggested Citation

  • Chakraborty, Sounak & Guo, Ruixin, 2011. "A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1342-1356, March.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:3:p:1342-1356
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00367-1
    Download Restriction: Full text for ScienceDirect subscribers only.
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Park, Trevor & Casella, George, 2008. "The Bayesian Lasso," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 681-686, June.
    2. Lee, Yoonkyung & Lin, Yi & Wahba, Grace, 2004. "Multicategory Support Vector Machines: Theory and Application to the Classification of Microarray Data and Satellite Radiance Data," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 67-81, January.
    3. Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
    4. Dudoit S. & Fridlyand J. & Speed T. P, 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    5. Tibshirani Robert J., 2009. "Univariate Shrinkage in the Cox Model for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-20, April.
    6. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    7. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Pedro Duarte Silva, A., 2011. "Two-group classification with high-dimensional correlated data: A factor model approach," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2975-2990, November.
    2. Mallick, Himel & Yi, Nengjun, 2017. "Bayesian group bridge for bi-level variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 115-133.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lee, Kyu Ha & Chakraborty, Sounak & Sun, Jianguo, 2017. "Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 1-13.
    2. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    3. Lee Kyu Ha & Chakraborty Sounak & Sun Jianguo, 2011. "Bayesian Variable Selection in Semiparametric Proportional Hazards Model for High Dimensional Survival Data," The International Journal of Biostatistics, De Gruyter, vol. 7(1), pages 1-32, April.
    4. Yu-Zhu Tian & Man-Lai Tang & Wai-Sum Chan & Mao-Zai Tian, 2021. "Bayesian bridge-randomized penalized quantile regression for ordinal longitudinal data, with application to firm’s bond ratings," Computational Statistics, Springer, vol. 36(2), pages 1289-1319, June.
    5. Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
    6. Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
    7. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    8. Brendan P. W. Ames & Mingyi Hong, 2016. "Alternating direction method of multipliers for penalized zero-variance discriminant analysis," Computational Optimization and Applications, Springer, vol. 64(3), pages 725-754, July.
    9. Korobilis, Dimitris, 2013. "Hierarchical shrinkage priors for dynamic regressions with many predictors," International Journal of Forecasting, Elsevier, vol. 29(1), pages 43-59.
    10. Philip Kostov & Thankom Arun & Samuel Annim, 2014. "Financial Services to the Unbanked: the case of the Mzansi intervention in South Africa," Contemporary Economics, University of Economics and Human Sciences in Warsaw., vol. 8(2), June.
    11. Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
    12. Olivier Collignon & Jeongseop Han & Hyungmi An & Seungyoung Oh & Youngjo Lee, 2018. "Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-15, October.
    13. Park, Seongoh & Kim, Joungyoun & Wang, Xinlei & Lim, Johan, 2024. "Variable selection in Bayesian multiple instance regression using shotgun stochastic search," Computational Statistics & Data Analysis, Elsevier, vol. 196(C).
    14. Gilles Charmet & Louis-Gautier Tran & Jérôme Auzanneau & Renaud Rincent & Sophie Bouchet, 2020. "BWGS: A R package for genomic selection and its application to a wheat breeding programme," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-20, April.
    15. Scutari Marco & Mackay Ian & Balding David, 2013. "Improving the efficiency of genomic selection," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 517-527, August.
    16. Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    17. Tanin Sirimongkolkasem & Reza Drikvandi, 2019. "On Regularisation Methods for Analysis of High Dimensional Data," Annals of Data Science, Springer, vol. 6(4), pages 737-763, December.
    18. Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
    19. Christis Katsouris, 2023. "High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods," Papers 2308.16192, arXiv.org.
    20. Roberto Casarin & Fausto Corradin & Francesco Ravazzolo & Nguyen Domenico Sartore & Wing-Keung Wong, 2020. "A Scoring Rule for Factor and Autoregressive Models Under Misspecification," Advances in Decision Sciences, Asia University, Taiwan, vol. 24(2), pages 66-103, June.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:55:y:2011:i:3:p:1342-1356. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.