IDEAS home Printed from https://ideas.repec.org/a/bla/biomet/v79y2023i2p951-963.html
   My bibliography  Save this article

A general framework of nonparametric feature selection in high‐dimensional data

Author

Listed:
  • Hang Yu
  • Yuanjia Wang
  • Donglin Zeng

Abstract

Nonparametric feature selection for high‐dimensional data is an important and challenging problem in the fields of statistics and machine learning. Most of the existing methods for feature selection focus on parametric or additive models which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel, which depends on a set of parameters that determines the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction and kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical property of the kernel feature space and prove the oracle selection property and Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and applications to two real studies.

Suggested Citation

  • Hang Yu & Yuanjia Wang & Donglin Zeng, 2023. "A general framework of nonparametric feature selection in high‐dimensional data," Biometrics, The International Biometric Society, vol. 79(2), pages 951-963, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:951-963
    DOI: 10.1111/biom.13664
    as

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13664
    Download Restriction: no

    File URL: https://libkey.io/10.1111/biom.13664?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Yichao Wu & Leonard A. Stefanski, 2015. "Automatic structure recovery for additive models," Biometrika, Biometrika Trust, vol. 102(2), pages 381-395.
    2. L. A. Stefanski & Yichao Wu & Kyle White, 2014. "Variable Selection in Nonparametric Classification Via Measurement Error Model Selection Likelihoods," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 574-589, June.
    3. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    4. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    5. Fan, Jianqing & Feng, Yang & Song, Rui, 2011. "Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 544-557.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yoshida, Takuma, 2018. "Semiparametric method for model structure discovery in additive regression models," Econometrics and Statistics, Elsevier, vol. 5(C), pages 124-136.
    2. Lin, Hongmei & Lian, Heng & Liang, Hua, 2019. "Rank reduction for high-dimensional generalized additive models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 672-684.
    3. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    4. Doksum, Kjell A. & Jiang, Jiancheng & Sun, Bo & Wang, Shuzhen, 2017. "Nearest neighbor estimates of regression," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 64-74.
    5. Kuang-Yao Lee & Bing Li & Hongyu Zhao, 2016. "Variable selection via additive conditional independence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(5), pages 1037-1055, November.
    6. Umberto Amato & Anestis Antoniadis & Italia De Feis, 2016. "Additive model selection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(4), pages 519-564, November.
    7. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    8. Randy C. S. Lai & Jan Hannig & Thomas C. M. Lee, 2015. "Generalized Fiducial Inference for Ultrahigh-Dimensional Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 760-772, June.
    9. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    10. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Conditional screening for ultrahigh-dimensional survival data in case-cohort studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 632-661, October.
    11. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    12. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    13. Dong, Yuexiao & Yu, Zhou & Zhu, Liping, 2020. "Model-free variable selection for conditional mean in regression," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    14. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Feature screening for case‐cohort studies with failure time outcome," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 349-370, March.
    15. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    16. Jiujing Wu & Hengjian Cui, 2024. "Model-free feature screening based on Hellinger distance for ultrahigh dimensional data," Statistical Papers, Springer, vol. 65(9), pages 5903-5930, December.
    17. Ma, Xuejun & Zhang, Jingxiao, 2016. "Robust model-free feature screening via quantile correlation," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 472-480.
    18. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    19. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    20. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:951-963. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: http://www.blackwellpublishing.com/journal.asp?ref=0006-341X .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.