Printed from https://ideas.repec.org/a/taf/jnlasa/v109y2014i507p1229-1240.html

Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space

Author

Listed:
  • Shan Luo
  • Zehua Chen

Abstract

In this article, we propose a method called sequential Lasso (SLasso) for feature selection in sparse high-dimensional linear models. SLasso selects features by sequentially solving partially penalized least squares problems in which the features selected in earlier steps are not penalized. SLasso uses the extended BIC (EBIC) as the stopping rule: the procedure stops when EBIC reaches a minimum. The asymptotic properties of SLasso are considered when the dimension of the feature space is ultra high and the number of relevant features diverges. We show that, with probability converging to 1, SLasso selects all the relevant features before any irrelevant features, and that the EBIC decreases until it attains its minimum at the model consisting of exactly the relevant features and then begins to increase. These results establish the selection consistency of SLasso. The SLasso estimators of the final model are ordinary least squares estimators, and the selection consistency implies the oracle property of SLasso. The asymptotic distribution of the SLasso estimators with a diverging number of relevant features is provided. SLasso is compared with other methods in simulation studies, which demonstrate that SLasso is a desirable approach with an edge over the other methods. SLasso and the other methods are then applied to microarray data for mapping disease genes. Supplementary materials for this article are available online.
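The procedure described in the abstract can be sketched in a few lines. The sketch below is a simplified, greedy surrogate, not the authors' exact algorithm: where the paper solves a partially penalized Lasso problem at each step, this sketch selects the unselected feature most correlated with the current residual, refits by ordinary least squares, and stops when the EBIC of Chen and Chen (2008) stops decreasing. All function names are illustrative, and columns of X are assumed standardized.

```python
import numpy as np
from math import lgamma, log

def log_binom(p, k):
    # log of the binomial coefficient C(p, k), via log-gamma
    return lgamma(p + 1) - lgamma(k + 1) - lgamma(p - k + 1)

def ebic(y, X, subset, gamma=0.5):
    # EBIC for the OLS fit on `subset` (Chen & Chen, 2008):
    #   n*log(RSS/n) + |s|*log(n) + 2*gamma*log(C(p, |s|))
    n, p = X.shape
    Xs = X[:, subset]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(np.sum((y - Xs @ beta) ** 2))
    return n * log(rss / n) + len(subset) * log(n) + 2 * gamma * log_binom(p, len(subset))

def sequential_lasso(y, X, gamma=0.5, max_steps=None):
    # Greedy surrogate for SLasso: at each step, add the feature most
    # correlated with the current OLS residual; stop when EBIC increases.
    n, p = X.shape
    selected = []
    resid = y.copy()
    best_ebic = np.inf
    max_steps = max_steps or min(n - 1, p)
    for _ in range(max_steps):
        scores = np.abs(X.T @ resid)
        scores[selected] = -np.inf          # never re-select a feature
        j = int(np.argmax(scores))
        candidate = selected + [j]
        e = ebic(y, X, candidate, gamma)
        if e >= best_ebic:
            break                           # EBIC stopped decreasing
        best_ebic = e
        selected = candidate
        Xs = X[:, selected]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta               # refit OLS, update residual
    return selected
```

The final model is refit by ordinary least squares on the selected set, matching the abstract's remark that the SLasso estimators of the final model are OLS estimators.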

Suggested Citation

  • Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
  • Handle: RePEc:taf:jnlasa:v:109:y:2014:i:507:p:1229-1240
    DOI: 10.1080/01621459.2013.877275

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2013.877275
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/01621459.2013.877275?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    3. Wenxuan Zhong & Tingting Zhang & Yu Zhu & Jun S. Liu, 2012. "Correlation pursuit: forward stepwise variable selection for index models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(5), pages 849-870, November.
    4. Fan, Jianqing & Feng, Yang & Song, Rui, 2011. "Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 544-557.
    5. Kim, Yongdai & Choi, Hosik & Oh, Hee-Seok, 2008. "Smoothly Clipped Absolute Deviation on High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1665-1673.
    6. Jianqing Fan & Runze Li, 2004. "New Estimation and Model Selection Procedures for Semiparametric Modeling in Longitudinal Data Analysis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 710-723, January.
    7. Fan, Jianqing & Li, Runze, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    8. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    9. Wang, Hansheng, 2009. "Forward Regression for Ultra-High Dimensional Variable Screening," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1512-1524.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Zehua Chen & Yiwei Jiang, 2020. "A two-stage sequential conditional selection approach to sparse high-dimensional multivariate regression models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(1), pages 65-90, February.
    2. Honda, Toshio & 本田, 敏雄 & Lin, Chien-Tong, 2022. "Forward variable selection for ultra-high dimensional quantile regression models," Discussion Papers 2021-02, Graduate School of Economics, Hitotsubashi University.
    3. Ke Yu & Shan Luo, 2022. "A sequential feature selection procedure for high-dimensional Cox proportional hazards model," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(6), pages 1109-1142, December.
    4. Shan Luo, 2020. "Variable selection in high-dimensional sparse multiresponse linear regression models," Statistical Papers, Springer, vol. 61(3), pages 1245-1267, June.
    5. Luo, Shan & Chen, Zehua, 2020. "A procedure of linear discrimination analysis with detected sparsity structure for high-dimensional multi-class classification," Journal of Multivariate Analysis, Elsevier, vol. 179(C).
    6. Yu, Ke & Luo, Shan, 2024. "Rank-based sequential feature selection for high-dimensional accelerated failure time models with main and interaction effects," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    7. Toshio Honda & Chien-Tong Lin, 2023. "Forward variable selection for ultra-high dimensional quantile regression models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(3), pages 393-424, June.
    8. Akira Shinkyu, 2023. "Forward Selection for Feature Screening and Structure Identification in Varying Coefficient Models," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 485-511, February.
    9. Eun Ryung Lee & Seyoung Park & Sang Kyu Lee & Hyokyoung G. Hong, 2023. "Quantile forward regression for high-dimensional survival data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(4), pages 769-806, October.
    10. Baolong Ying & Qijing Yan & Zehua Chen & Jinchao Du, 2024. "A sequential feature selection approach to change point detection in mean-shift change point models," Statistical Papers, Springer, vol. 65(6), pages 3893-3915, August.
    11. Liu, Jianyu & Yu, Guan & Liu, Yufeng, 2019. "Graph-based sparse linear discriminant analysis for high-dimensional classification," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 250-269.
    12. Yawei He & Zehua Chen, 2016. "The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 68(1), pages 155-180, February.
    13. Peili Li & Min Liu & Zhou Yu, 2023. "A global two-stage algorithm for non-convex penalized high-dimensional linear regression problems," Computational Statistics, Springer, vol. 38(2), pages 871-898, June.
    14. Hong, Hyokyoung G. & Zheng, Qi & Li, Yi, 2019. "Forward regression for Cox models with high-dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 268-290.
    15. Yuyang Liu & Pengfei Pi & Shan Luo, 2023. "A semi-parametric approach to feature selection in high-dimensional linear regression models," Computational Statistics, Springer, vol. 38(2), pages 979-1000, June.
    16. Li, Peili & Jiao, Yuling & Lu, Xiliang & Kang, Lican, 2022. "A data-driven line search rule for support recovery in high-dimensional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    17. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for $$\ell _0$$ ℓ 0 regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    2. Canhong Wen & Xueqin Wang & Shaoli Wang, 2015. "Laplace Error Penalty-based Variable Selection in High Dimension," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(3), pages 685-700, September.
    3. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    4. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    5. Dong, Yuexiao & Yu, Zhou & Zhu, Liping, 2020. "Model-free variable selection for conditional mean in regression," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    6. Zhou Yu & Yuexiao Dong & Li-Xing Zhu, 2016. "Trace Pursuit: A General Framework for Model-Free Variable Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 813-821, April.
    7. Xiangyu Wang & Chenlei Leng, 2016. "High dimensional ordinary least squares projection for screening variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(3), pages 589-611, June.
    8. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    9. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    10. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    11. Zhao, Bangxin & Liu, Xin & He, Wenqing & Yi, Grace Y., 2021. "Dynamic tilted current correlation for high dimensional variable screening," Journal of Multivariate Analysis, Elsevier, vol. 182(C).
    12. Haofeng Wang & Hongxia Jin & Xuejun Jiang & Jingzhi Li, 2022. "Model Selection for High Dimensional Nonparametric Additive Models via Ridge Estimation," Mathematics, MDPI, vol. 10(23), pages 1-22, December.
    13. Cui, Wenquan & Cheng, Haoyang & Sun, Jiajing, 2018. "An RKHS-based approach to double-penalized regression in high-dimensional partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 201-210.
    14. Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
    15. Xiang Zhang & Yichao Wu & Lan Wang & Runze Li, 2016. "Variable selection for support vector machines in moderately high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 53-76, January.
    16. He, Kevin & Kang, Jian & Hong, Hyokyoung G. & Zhu, Ji & Li, Yanming & Lin, Huazhen & Xu, Han & Li, Yi, 2019. "Covariance-insured screening," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 100-114.
    17. Hong, Hyokyoung G. & Zheng, Qi & Li, Yi, 2019. "Forward regression for Cox models with high-dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 268-290.
    18. Randy C. S. Lai & Jan Hannig & Thomas C. M. Lee, 2015. "Generalized Fiducial Inference for Ultrahigh-Dimensional Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 760-772, June.
    19. He, Xin & Mao, Xiaojun & Wang, Zhonglei, 2024. "Nonparametric augmented probability weighting with sparsity," Computational Statistics & Data Analysis, Elsevier, vol. 191(C).
    20. Akira Shinkyu, 2023. "Forward Selection for Feature Screening and Structure Identification in Varying Coefficient Models," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 485-511, February.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:taf:jnlasa:v:109:y:2014:i:507:p:1229-1240. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Longhurst (email available below). General contact details of provider: http://www.tandfonline.com/UASA20.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.