Printed from https://ideas.repec.org/a/spr/advdac/v7y2013i4p393-416.html

Multinomial logit models with implicit variable selection

Author

Listed:
  • Faisal Zahid
  • Gerhard Tutz

Abstract

The multinomial logit model is the most widely used model for unordered multi-category responses. However, applications are typically restricted to a few predictors because in the high-dimensional case maximum likelihood estimates frequently do not exist. In this paper we develop a boosting technique called multinomBoost that performs variable selection and fits the multinomial logit model even when predictors are high-dimensional. Since in multi-category models the effect of one predictor variable is represented by several parameters, one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables, not parameters. The method can also distinguish between mandatory predictors and optional predictors. Moreover, it adapts to metric, binary, nominal and ordinal predictors. Regularization within the algorithm allows the inclusion of nominal and ordinal variables that have many categories. In the case of ordinal predictors the order information is used. The performance of the boosting technique with respect to mean squared error, prediction error and the identification of relevant variables is investigated in a simulation study. The method is applied to the national Indonesia contraceptive prevalence survey and the identification of glass. Results are also compared with the Lasso approach, which selects parameters. Copyright Springer-Verlag Berlin Heidelberg 2013
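
The distinction the abstract draws between selecting variables and selecting parameters can be illustrated with a small sketch. The code below is not the authors' multinomBoost algorithm; it is a simplified gradient-based groupwise boosting loop, written under several assumptions that do not come from the abstract: a reference-category parametrization (the last class has its linear predictor fixed at zero), no intercept, metric predictors only, and hypothetical names (`class_probs`, `log_lik`, `groupwise_boost`) and tuning values (`n_steps`, `step`). In each step the whole coefficient row of one predictor is updated, so a predictor enters the model with all of its K-1 parameters or not at all.

```python
import numpy as np

def class_probs(X, B):
    """Class probabilities of a multinomial logit with K classes.

    X is (n, p); B is (p, K-1). The last class is the reference
    category, with its linear predictor fixed at zero (an assumption).
    """
    eta = np.column_stack([X @ B, np.zeros(X.shape[0])])
    eta -= eta.max(axis=1, keepdims=True)  # guard against overflow
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)

def log_lik(X, y, B):
    """Multinomial log-likelihood for class labels y in {0, ..., K-1}."""
    probs = class_probs(X, B)
    return float(np.log(probs[np.arange(len(y)), y]).sum())

def groupwise_boost(X, y, n_classes, n_steps=50, step=0.5):
    """Simplified groupwise boosting: one whole variable per step."""
    n, p = X.shape
    Y = np.eye(n_classes)[y]                 # one-hot responses, (n, K)
    B = np.zeros((p, n_classes - 1))
    for _ in range(n_steps):
        resid = (Y - class_probs(X, B))[:, :-1]
        grad = X.T @ resid / n               # score, one row per variable
        # Pick the variable whose K-1 score entries are jointly largest.
        j = int(np.argmax(np.linalg.norm(grad, axis=1)))
        B[j] += step * grad[j]               # weak update of the full row
    return B

# Simulated data: only the first two of six predictors are relevant.
rng = np.random.default_rng(0)
n, p, K = 500, 6, 3
X = rng.standard_normal((n, p))
B_true = np.zeros((p, K - 1))
B_true[0] = [2.0, 0.0]
B_true[1] = [0.0, 2.0]
eta = np.column_stack([X @ B_true, np.zeros(n)])
# Gumbel-max trick: draws labels from the multinomial logit model.
y = np.argmax(eta + rng.gumbel(size=eta.shape), axis=1)

B_hat = groupwise_boost(X, y, K)
selected = np.flatnonzero(np.abs(B_hat).sum(axis=1) > 0)
print("selected variables:", selected)
```

Because updates act on entire rows of the coefficient matrix, the rows of `B_hat` are either entirely zero or entirely nonzero. A parameter-selecting method such as an entrywise Lasso penalty could instead zero out single entries within a row, leaving a variable in the model for some response categories only.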

Suggested Citation

  • Faisal Zahid & Gerhard Tutz, 2013. "Multinomial logit models with implicit variable selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(4), pages 393-416, December.
  • Handle: RePEc:spr:advdac:v:7:y:2013:i:4:p:393-416
    DOI: 10.1007/s11634-013-0136-4

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s11634-013-0136-4
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Lukas Meier & Sara Van De Geer & Peter Bühlmann, 2008. "The group lasso for logistic regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 53-71, February.
    2. Buhlmann P. & Yu B., 2003. "Boosting With the L2 Loss: Regression and Classification," Journal of the American Statistical Association, American Statistical Association, vol. 98, pages 324-339, January.
    3. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    4. Jan Gertheiss & Gerhard Tutz, 2009. "Penalized Regression with Ordinal Predictors," International Statistical Review, International Statistical Institute, vol. 77(3), pages 345-365, December.
    5. Gerhard Tutz & Harald Binder, 2006. "Generalized Additive Modeling with Implicit Variable Selection by Likelihood-Based Boosting," Biometrics, The International Biometric Society, vol. 62(4), pages 961-971, December.
    6. Faisal Zahid & Gerhard Tutz, 2013. "Ridge estimation for multinomial logit models with symmetric side constraints," Computational Statistics, Springer, vol. 28(3), pages 1017-1034, June.
    7. Tutz, Gerhard & Binder, Harald, 2007. "Boosting ridge regression," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6044-6059, August.
    8. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.
    9. Hans Nyquist, 1991. "Restricted Estimation of Generalized Linear Models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 40(1), pages 133-141, March.

    Citations

    Cited by:

    1. Paz, Alexander & Arteaga, Cristian & Cobos, Carlos, 2019. "Specification of mixed logit models assisted by an optimization framework," Journal of choice modelling, Elsevier, vol. 30(C), pages 50-60.
    2. Moritz Berger & Thomas Welchowski & Steffen Schmitz-Valckenberg & Matthias Schmid, 2019. "A classification tree approach for the modeling of competing risks in discrete time," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 965-990, December.
    3. Faisal Maqbool Zahid & Gerhard Tutz, 2013. "Proportional Odds Models with High‐Dimensional Data Structure," International Statistical Review, International Statistical Institute, vol. 81(3), pages 388-406, December.
    4. Bayerstadler, Andreas & van Dijk, Linda & Winter, Fabian, 2016. "Bayesian multinomial latent variable modeling for fraud and abuse detection in health insurance," Insurance: Mathematics and Economics, Elsevier, vol. 71(C), pages 244-252.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Gerhard Tutz & Gunther Schauberger, 2015. "A Penalty Approach to Differential Item Functioning in Rasch Models," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 21-43, March.
    3. Faisal Maqbool Zahid & Gerhard Tutz, 2013. "Proportional Odds Models with High‐Dimensional Data Structure," International Statistical Review, International Statistical Institute, vol. 81(3), pages 388-406, December.
    4. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    5. Wei, Fengrong & Zhu, Hongxiao, 2012. "Group coordinate descent algorithms for nonconvex penalized regression," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 316-326.
    6. Zeng, Yaohui & Yang, Tianbao & Breheny, Patrick, 2021. "Hybrid safe–strong rules for efficient optimization in lasso-type problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    7. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    8. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    9. Stefanie Hieke & Axel Benner & Richard F Schlenk & Martin Schumacher & Lars Bullinger & Harald Binder, 2016. "Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-18, May.
    10. Gerhard Tutz & Jan Gertheiss, 2014. "Rating Scales as Predictors—The Old Question of Scale Level and Some Answers," Psychometrika, Springer;The Psychometric Society, vol. 79(3), pages 357-376, July.
    11. Yang, Yuehan & Xia, Siwei & Yang, Hu, 2023. "Multivariate sparse Laplacian shrinkage for joint estimation of two graphical structures," Computational Statistics & Data Analysis, Elsevier, vol. 178(C).
    12. Hess, Wolfgang & Persson, Maria & Rubenbauer, Stephanie & Gertheiss, Jan, 2013. "Using Lasso-Type Penalties to Model Time-Varying Covariate Effects in Panel Data Regressions – A Novel Approach Illustrated by the ‘Death of Distance’ in International Trade," Working Paper Series 961, Research Institute of Industrial Economics.
    13. Hendrik van der Wurp & Andreas Groll, 2023. "Introducing LASSO-type penalisation to generalised joint regression modelling for count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 127-151, March.
    14. Yen, Tso-Jung & Yen, Yu-Min, 2016. "Structured variable selection via prior-induced hierarchical penalty functions," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 87-103.
    15. Fabian Scheipl & Thomas Kneib & Ludwig Fahrmeir, 2013. "Penalized likelihood and Bayesian function selection in regression models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 349-385, October.
    16. Lore Zumeta-Olaskoaga & Maximilian Weigert & Jon Larruskain & Eder Bikandi & Igor Setuain & Josean Lekue & Helmut Küchenhoff & Dae-Jin Lee, 2023. "Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 101-126, March.
    17. Matsui, Hidetoshi, 2014. "Variable and boundary selection for functional data via multiclass logistic regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 176-185.
    18. Sariyar Murat & Schumacher Martin & Binder Harald, 2014. "A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 343-357, June.
    19. Wanling Xie & Hu Yang, 2023. "Group sparse recovery via group square-root elastic net and the iterative multivariate thresholding-based algorithm," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(3), pages 469-507, September.
    20. Chen, Shunjie & Yang, Sijia & Wang, Pei & Xue, Liugen, 2023. "Two-stage penalized algorithms via integrating prior information improve gene selection from omics data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 628(C).
