Printed from https://ideas.repec.org/a/eee/csdana/v133y2019icp1-19.html

A novel variational Bayesian method for variable selection in logistic regression models

Author

Listed:
  • Zhang, Chun-Xia
  • Xu, Shuang
  • Zhang, Jiang-She

Abstract

With high-dimensional data emerging in various domains, sparse logistic regression models have attracted considerable research interest. Variable selection plays a key role in both improving prediction accuracy and enhancing the interpretability of the fitted models. Bayesian variable selection approaches offer several advantages, such as high selection accuracy and the ability to incorporate many kinds of prior knowledge easily. However, because Bayesian methods generally make inference from the posterior distribution with Markov chain Monte Carlo (MCMC) techniques, they become intractable in high-dimensional settings owing to the large search space. To address this issue, a novel variational Bayesian method for variable selection in high-dimensional logistic regression models is presented. The proposed method is based on the indicator model, in which each covariate is equipped with a binary latent variable indicating whether it is important. A Bernoulli-type prior is adopted for the latent indicator variable. For the hyperparameter of the Bernoulli prior, two schemes are provided to determine its optimal value so that the model can achieve sparsity adaptively. To identify important variables and make predictions, an efficient variational Bayesian approach is employed to make inference from the posterior distribution. Experiments conducted with both synthetic data and some publicly available data sets show that the new method outperforms or is very competitive with several popular counterparts.
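
As a minimal sketch of the indicator model described in the abstract, the Python snippet below simulates data in which each covariate carries a binary indicator drawn from a Bernoulli prior. The Gaussian slab on the coefficients, the function name `simulate_indicator_logistic`, and all default values are assumptions made for illustration; the abstract does not specify them, and this is not the authors' inference algorithm.

```python
import numpy as np

def simulate_indicator_logistic(n=200, p=50, pi=0.1, slab_sd=2.0, seed=0):
    """Draw data from a generic indicator-model logistic regression.

    Every name and default here is a hypothetical choice for
    illustration, not a setting taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))            # covariates
    gamma = rng.random(p) < pi                 # gamma_j ~ Bernoulli(pi): is covariate j important?
    beta = rng.normal(0.0, slab_sd, size=p)    # Gaussian slab coefficients (assumed prior)
    theta = gamma * beta                       # effective coefficient is zero when gamma_j = 0
    prob = 1.0 / (1.0 + np.exp(-(X @ theta)))  # logistic link
    y = rng.random(n) < prob                   # y_i ~ Bernoulli(prob_i)
    return X, y.astype(int), gamma.astype(int), theta

X, y, gamma, theta = simulate_indicator_logistic()
```

Raising `pi` makes more indicators equal to one, which is why the value of this hyperparameter controls how sparse the model is.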

Suggested Citation

  • Zhang, Chun-Xia & Xu, Shuang & Zhang, Jiang-She, 2019. "A novel variational Bayesian method for variable selection in logistic regression models," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 1-19.
  • Handle: RePEc:eee:csdana:v:133:y:2019:i:c:p:1-19
    DOI: 10.1016/j.csda.2018.08.025

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947318302081
    Download Restriction: Full text for ScienceDirect subscribers only.



    Citations



    Cited by:

    1. Shan Feng & Wenxian Xie & Yufeng Nie, 2024. "Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers," Mathematics, MDPI, vol. 12(7), pages 1-23, April.
    2. Lai, Wei-Ting & Chen, Ray-Bing & Chen, Ying & Koch, Thorsten, 2022. "Variational Bayesian inference for network autoregression models," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    3. Li, Xiaowei & Tang, Junqing & Hu, Xiaojiao & Wang, Wei, 2020. "Assessing intercity multimodal choice behavior in a Touristy City: A factor analysis," Journal of Transport Geography, Elsevier, vol. 86(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bernardi, Mauro & Costola, Michele, 2019. "High-dimensional sparse financial networks through a regularised regression model," SAFE Working Paper Series 244, Leibniz Institute for Financial Research SAFE.
    2. Qi Zhang & Yihui Zhang & Yemao Xia, 2024. "Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations," Mathematics, MDPI, vol. 12(5), pages 1-23, March.
    3. Dufays, Arnaud & Rombouts, Jeroen V.K., 2020. "Relevant parameter changes in structural break models," Journal of Econometrics, Elsevier, vol. 217(1), pages 46-78.
    4. Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    5. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    6. Uddin, Md Nazir & Gaskins, Jeremy T., 2023. "Shared Bayesian variable shrinkage in multinomial logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
    7. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    8. M. Marsman & K. Huth & L. J. Waldorp & I. Ntzoufras, 2022. "Objective Bayesian Edge Screening and Structure Selection for Ising Networks," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 47-82, March.
    9. Nicholas G. Polson & James G. Scott, 2016. "Mixtures, envelopes and hierarchical duality," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 701-727, September.
    10. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
    11. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
    12. Oguzhan Cepni & I. Ethem Guney & Norman R. Swanson, 2020. "Forecasting and nowcasting emerging market GDP growth rates: The role of latent global economic policy uncertainty and macroeconomic data surprise factors," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(1), pages 18-36, January.
    13. Mark F. J. Steel, 2020. "Model Averaging and Its Use in Economics," Journal of Economic Literature, American Economic Association, vol. 58(3), pages 644-719, September.
    14. Jean-Pierre Dubé & Sanjog Misra, 2017. "Personalized Pricing and Consumer Welfare," NBER Working Papers 23775, National Bureau of Economic Research, Inc.
    15. Lee Anthony & Caron Francois & Doucet Arnaud & Holmes Chris, 2012. "Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-31, January.
    16. Dimitris Korobilis & Davide Pettenuzzo, 2020. "Machine Learning Econometrics: Bayesian algorithms and methods," Working Papers 2020_09, Business School - Economics, University of Glasgow.
    17. Shutes, Karl & Adcock, Chris, 2013. "Regularized Extended Skew-Normal Regression," MPRA Paper 58445, University Library of Munich, Germany, revised 09 Sep 2014.
    18. Bansal, Prateek & Krueger, Rico & Graham, Daniel J., 2021. "Fast Bayesian estimation of spatial count data models," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    19. De Luca, Giuseppe & Magnus, Jan R. & Peracchi, Franco, 2018. "Weighted-average least squares estimation of generalized linear models," Journal of Econometrics, Elsevier, vol. 204(1), pages 1-17.
    20. Yu-Zhu Tian & Man-Lai Tang & Wai-Sum Chan & Mao-Zai Tian, 2021. "Bayesian bridge-randomized penalized quantile regression for ordinal longitudinal data, with application to firm’s bond ratings," Computational Statistics, Springer, vol. 36(2), pages 1289-1319, June.
