IDEAS home Printed from https://ideas.repec.org/a/inm/orijoc/v35y2023i5p1044-1060.html

Simultaneous Dimension Reduction and Variable Selection for Multinomial Logistic Regression

Author

Listed:
  • Canhong Wen

    (International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China)

  • Zhenduo Li

    (International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China)

  • Ruipeng Dong

    (International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China)

  • Yijin Ni

    (Industrial and System Engineering, Georgia Institute of Technology, 30318 Atlanta, Georgia)

  • Wenliang Pan

    (Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China)

Abstract

Multinomial logistic regression is a useful model for predicting the probabilities of multiclass outcomes. Because of the complexity and high dimensionality of some data, it is challenging to fit a valid model with high accuracy and interpretability. We propose a novel sparse reduced-rank multinomial logistic regression model to jointly select variables and reduce the dimension via a nonconvex row constraint. We develop a block-wise iterative algorithm with a majorizing surrogate function to efficiently solve the optimization problem. From an algorithmic aspect, we show that the output estimator enjoys consistency in estimation and sparsity recovery even in a high-dimensional setting. The finite sample performance of the proposed method is investigated via simulation studies and two real image data sets. The results show that our proposal has competitive performance in both estimation accuracy and computation time.
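
To make the idea concrete, the following is a minimal sketch of a sparse reduced-rank multinomial logistic fit. It is not the authors' algorithm. It assumes a p-by-K coefficient matrix B with at most s nonzero rows (variable selection) and rank at most r (dimension reduction). Each iteration minimizes a quadratic majorizer of the average negative log-likelihood, which is a gradient step of size 1/L, and then applies a heuristic two-step truncation: keep the s rows with the largest norm, then cut the result to rank r by SVD. The names fit_sketch and truncate_rows_then_rank, the parameters s, r, and n_iter, and the choice of curvature bound L are all assumptions made for illustration.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def truncate_rows_then_rank(B, s, r):
    """Heuristic truncation (hypothetical helper, not from the paper):
    keep the s rows of B with the largest Euclidean norm, zero the rest,
    then replace the kept block by its best rank-r approximation."""
    out = np.zeros_like(B)
    keep = np.argsort(np.linalg.norm(B, axis=1))[-s:]
    U, d, Vt = np.linalg.svd(B[keep], full_matrices=False)
    out[keep] = (U[:, :r] * d[:r]) @ Vt[:r]
    return out


def fit_sketch(X, y, n_classes, s, r, n_iter=200):
    """Illustrative majorize-then-truncate loop (an assumption, not the
    published block-wise algorithm).

    X: (n, p) design matrix; y: (n,) integer labels in 0..n_classes-1.
    Returns a (p, n_classes) matrix with at most s nonzero rows and rank <= r.
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]  # one-hot labels
    B = np.zeros((p, n_classes))
    # The softmax Hessian in the logits has eigenvalues at most 1/2, so the
    # average negative log-likelihood has curvature at most
    # 0.5 * ||X||_2^2 / n, which defines a quadratic majorizer.
    L = 0.5 * np.linalg.norm(X, 2) ** 2 / n
    for _ in range(n_iter):
        grad = X.T @ (softmax(X @ B) - Y) / n
        # Minimizing the majorizer is a gradient step of size 1/L;
        # the truncation then enforces row sparsity and low rank.
        B = truncate_rows_then_rank(B - grad / L, s, r)
    return B
```

Calling fit_sketch(X, y, n_classes=K, s=s, r=r) returns a coefficient matrix whose nonzero rows indicate the selected variables. In practice s and r would have to be tuned, for example by cross-validation, and the sequential truncation used here is only a convenient stand-in for whatever constrained update the paper derives.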

Suggested Citation

  • Canhong Wen & Zhenduo Li & Ruipeng Dong & Yijin Ni & Wenliang Pan, 2023. "Simultaneous Dimension Reduction and Variable Selection for Multinomial Logistic Regression," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1044-1060, September.
  • Handle: RePEc:inm:orijoc:v:35:y:2023:i:5:p:1044-1060
    DOI: 10.1287/ijoc.2022.0132

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2022.0132
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Jack Jewson & David Rossell, 2022. "General Bayesian loss function selection and the use of improper models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1640-1665, November.
    3. Goh, Gyuhyeong & Dey, Dipak K. & Chen, Kun, 2017. "Bayesian sparse reduced rank multivariate regression," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 14-28.
    4. Ziping Zhao & Daniel P. Palomar, 2018. "Sparse Reduced Rank Regression With Nonconvex Regularization," Papers 1803.07247, arXiv.org.
    5. Lian, Heng & Kim, Yongdai, 2016. "Nonconvex penalized reduced rank regression and its oracle properties in high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 383-393.
    6. Mingyang Ren & Sanguo Zhang & Junhui Wang, 2023. "Consistent estimation of the number of communities via regularized network embedding," Biometrics, The International Biometric Society, vol. 79(3), pages 2404-2416, September.
    7. Yu, Ke & Luo, Shan, 2024. "Rank-based sequential feature selection for high-dimensional accelerated failure time models with main and interaction effects," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    8. Yuyang Liu & Pengfei Pi & Shan Luo, 2023. "A semi-parametric approach to feature selection in high-dimensional linear regression models," Computational Statistics, Springer, vol. 38(2), pages 979-1000, June.
    9. Lian, Heng & Feng, Sanying & Zhao, Kaifeng, 2015. "Parametric and semiparametric reduced-rank regression with flexible sparsity," Journal of Multivariate Analysis, Elsevier, vol. 136(C), pages 163-174.
    10. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    11. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    12. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    13. A. Karagrigoriou & C. Koukouvinos & K. Mylona, 2010. "On the advantages of the non-concave penalized likelihood model selection method with minimum prediction errors in large-scale medical studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(1), pages 13-24.
    14. Dong, Ruipeng & Li, Daoji & Zheng, Zemin, 2021. "Parallel integrative learning for large-scale multi-response regression with incomplete outcomes," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    15. Lichun Wang & Yuan You & Heng Lian, 2015. "Convergence and sparsity of Lasso and group Lasso in high-dimensional generalized linear models," Statistical Papers, Springer, vol. 56(3), pages 819-828, August.
    16. Luke Mosley & Idris A. Eckley & Alex Gibberd, 2022. "Sparse temporal disaggregation," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 2203-2233, October.
    17. Xing Gao & Sungwon Lee & Gen Li & Sungkyu Jung, 2021. "Covariate‐driven factorization by thresholding for multiblock data," Biometrics, The International Biometric Society, vol. 77(3), pages 1011-1023, September.
    18. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    19. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    20. Yanfang Zhang & Chuanhua Wei & Xiaolin Liu, 2022. "Group Logistic Regression Models with lp,q Regularization," Mathematics, MDPI, vol. 10(13), pages 1-15, June.
