
Variable selection and estimation in high‐dimensional models

Author

Listed:
  • Joel L. Horowitz

Abstract

Models with high‐dimensional covariates arise frequently in economics and other fields. Often, only a few covariates have important effects on the dependent variable. When this happens, the model is said to be sparse. In applications, however, it is not known which covariates are important and which are not. This paper reviews methods for discriminating between important and unimportant covariates, with particular attention given to methods that discriminate correctly with probability approaching 1 as the sample size increases. Methods are available for a wide variety of linear, nonlinear, semiparametric and nonparametric models. The performance of some of these methods in finite samples is illustrated through Monte Carlo simulations and an empirical example.
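
The idea of a sparse model and of separating important from unimportant covariates can be illustrated with a small simulation. The sketch below is not taken from the paper: the sample size, the number of covariates, the coefficient values, the penalty levels and the helper names (soft_threshold, weighted_lasso) are all assumptions chosen for illustration. It fits a plain lasso and an adaptive lasso (in the sense of Zou, 2006, with weights from an initial least-squares fit) by coordinate descent and reports which covariates each method keeps.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso(X, y, lam, weights, n_sweeps=200):
    """Minimize (1/(2n))||y - Xb||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent (hypothetical helper, fixed sweep count)."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # equals y - X @ beta with beta = 0
    col_scale = (X ** 2).sum(axis=0) / n  # (1/n) * X_j' X_j for each column
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the residual that excludes b_j.
            rho = X[:, j] @ resid / n + col_scale[j] * beta[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_scale[j]
            resid -= X[:, j] * (new_bj - beta[j])
            beta[j] = new_bj
    return beta


# Assumed design: 200 observations, 50 covariates, only the first 3 matter.
rng = np.random.default_rng(0)
n, p = 200, 50
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ true_beta + rng.standard_normal(n)

# Plain lasso: every coefficient gets the same penalty (assumed lam = 0.1).
lasso_beta = weighted_lasso(X, y, lam=0.1, weights=np.ones(p))

# Adaptive lasso: penalize each coefficient by 1/|initial estimate|, so
# covariates with small initial estimates are penalized more heavily.
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adaptive_beta = weighted_lasso(X, y, lam=0.05, weights=1.0 / np.abs(ols_beta))

lasso_selected = set(np.flatnonzero(lasso_beta).tolist())
adaptive_selected = set(np.flatnonzero(adaptive_beta).tolist())
print("true support:     ", [0, 1, 2])
print("lasso selects:    ", sorted(lasso_selected))
print("adaptive selects: ", sorted(adaptive_selected))
```

Here the initial least-squares fit is feasible only because the assumed number of covariates is smaller than the sample size; with more covariates than observations a different initial estimator would be needed. The penalty levels are fixed by hand, whereas in practice they would be chosen by a data-driven rule.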

Suggested Citation

  • Joel L. Horowitz, 2015. "Variable selection and estimation in high‐dimensional models," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 48(2), pages 389-407, May.
  • Handle: RePEc:wly:canjec:v:48:y:2015:i:2:p:389-407
    DOI: 10.1111/caje.12130

    Download full text from publisher

    File URL: https://doi.org/10.1111/caje.12130
    Download Restriction: no


    References listed on IDEAS

    1. Horowitz, Joel L. & Lee, Sokbae, 2005. "Nonparametric Estimation of an Additive Quantile Regression Model," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 1238-1249, December.
    2. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    3. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    4. Kim, Yongdai & Choi, Hosik & Oh, Hee-Seok, 2008. "Smoothly Clipped Absolute Deviation on High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1665-1673.
    5. Jing Wang & Lijian Yang, 2009. "Efficient and fast spline-backfitted kernel smoothing of additive models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 61(3), pages 663-690, September.
    6. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    7. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    8. Powell, James L & Stock, James H & Stoker, Thomas M, 1989. "Semiparametric Estimation of Index Coefficients," Econometrica, Econometric Society, vol. 57(6), pages 1403-1430, November.
    9. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    10. Efang Kong & Yingcun Xia, 2007. "Variable selection for the single‐index model," Biometrika, Biometrika Trust, vol. 94(1), pages 217-229.
    11. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.

    Citations



    Cited by:

    1. Horowitz, Joel L. & Nesheim, Lars, 2021. "Using penalized likelihood to select parameters in a random coefficients multinomial logit model," Journal of Econometrics, Elsevier, vol. 222(1), pages 44-55.
    2. Adriano Zanin Zambom & Gregory J. Matthews, 2021. "Sure independence screening in the presence of missing data," Statistical Papers, Springer, vol. 62(2), pages 817-845, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," Canadian Journal of Economics, Canadian Economics Association, vol. 48(2), pages 389-407, May.
    2. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," CeMMAP working papers CWP35/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," CeMMAP working papers 35/15, Institute for Fiscal Studies.
    4. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    5. Tan, Xin Lu, 2019. "Optimal estimation of slope vector in high-dimensional linear transformation models," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 179-204.
    6. Guang Cheng & Hao Zhang & Zuofeng Shang, 2015. "Sparse and efficient estimation for partial spline models with increasing dimension," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(1), pages 93-127, February.
    7. Lian, Heng & Li, Jianbo & Tang, Xingyu, 2014. "SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 50-64.
    8. Yanxin Wang & Qibin Fan & Li Zhu, 2018. "Variable selection and estimation using a continuous approximation to the $L_0$ penalty," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(1), pages 191-214, February.
    9. Jianbo Li & Yuan Li & Riquan Zhang, 2017. "B spline variable selection for the single index models," Statistical Papers, Springer, vol. 58(3), pages 691-706, September.
    10. Guo-Liang Tian & Mingqiu Wang & Lixin Song, 2014. "Variable selection in the high-dimensional continuous generalized linear model with current status data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(3), pages 467-483, March.
    11. Lian, Heng, 2012. "A note on the consistency of Schwarz’s criterion in linear quantile regression with the SCAD penalty," Statistics & Probability Letters, Elsevier, vol. 82(7), pages 1224-1228.
    12. Sunghoon Kwon & Jeongyoun Ahn & Woncheol Jang & Sangin Lee & Yongdai Kim, 2017. "A doubly sparse approach for group variable selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(5), pages 997-1025, October.
    13. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    14. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    15. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    16. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    17. Lan Wang & Yichao Wu & Runze Li, 2012. "Quantile Regression for Analyzing Heterogeneity in Ultra-High Dimension," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 214-222, March.
    18. Yongjin Li & Qingzhao Zhang & Qihua Wang, 2017. "Penalized estimation equation for an extended single-index model," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(1), pages 169-187, February.
    19. Canhong Wen & Xueqin Wang & Shaoli Wang, 2015. "Laplace Error Penalty-based Variable Selection in High Dimension," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(3), pages 685-700, September.
    20. Joel L. Horowitz & Lars Nesheim, 2018. "Using penalized likelihood to select parameters in a random coefficients multinomial logit model," CeMMAP working papers CWP29/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
