A general framework of nonparametric feature selection in high‐dimensional data

Author

Listed:
  • Hang Yu
  • Yuanjia Wang
  • Donglin Zeng

Abstract

Nonparametric feature selection for high‐dimensional data is an important and challenging problem in statistics and machine learning. Most existing feature selection methods focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework for nonparametric feature selection in both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel, which depends on a set of parameters that determine the importance of the features. Computationally, we minimize the penalized empirical risk to estimate the prediction function and the kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove the oracle selection property and Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and applications to two real studies.
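
The abstract does not give the kernel or the penalty in closed form, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical weighted Gaussian product kernel K(x, z) = prod_j exp(-lam_j (x_j - z_j)^2) with nonnegative feature weights lam_j, squared-error loss, a ridge penalty on the prediction function, and an L1 penalty on lam. It alternates a kernel ridge solve (convex in alpha for fixed lam) with a single proximal gradient step on lam, which stands in for the convex subproblems mentioned in the abstract. All function names (weighted_product_kernel, fit_alpha, lam_step) and parameters (ridge, l1, step) are illustrative assumptions.

```python
import numpy as np

def weighted_product_kernel(X, Z, lam):
    """Hypothetical tensor product kernel with per-feature weights lam_j >= 0.

    K(x, z) = prod_j exp(-lam_j * (x_j - z_j)^2).
    A feature with lam_j = 0 contributes a factor of 1, so it drops out.
    """
    diff2 = (X[:, None, :] - Z[None, :, :]) ** 2   # shape (n, m, p)
    return np.exp(-(diff2 * lam).sum(axis=2))       # product of exps = exp of sum

def fit_alpha(K, y, ridge):
    """For fixed lam, minimize (1/n)||y - K a||^2 + ridge * a^T K a.

    The minimizer solves the linear system (K + n * ridge * I) a = y.
    """
    n = K.shape[0]
    return np.linalg.solve(K + n * ridge * np.eye(n), y)

def lam_step(X, y, alpha, lam, step, l1, ridge=0.0):
    """One proximal gradient step on lam for fixed alpha (illustrative only)."""
    n = X.shape[0]
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    K = np.exp(-(diff2 * lam).sum(axis=2))
    r = y - K @ alpha
    # dK/dlam_j = -diff2[..., j] * K, hence the signs below.
    grad_fit = (2.0 / n) * np.einsum("i,ikj,ik,k->j", r, diff2, K, alpha)
    grad_pen = -ridge * np.einsum("i,ikj,ik,k->j", alpha, diff2, K, alpha)
    # Prox of the L1 penalty under the constraint lam >= 0: shift, then clip at 0.
    return np.maximum(lam - step * (grad_fit + grad_pen + l1), 0.0)

def alternate(X, y, n_iter=20, ridge=0.1, l1=0.01, step=0.1):
    """Alternate the two updates, starting from equal feature weights."""
    lam = np.ones(X.shape[1])
    for _ in range(n_iter):
        K = weighted_product_kernel(X, X, lam)
        alpha = fit_alpha(K, y, ridge)
        lam = lam_step(X, y, alpha, lam, step, l1, ridge)
    return alpha, lam
```

In this sketch, features whose weight lam_j is driven to exactly zero no longer affect the kernel and can be read as unselected. The step size and penalty levels are untuned placeholders, and the sketch carries none of the oracle or Fisher consistency guarantees stated for the proposed method.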

Suggested Citation

  • Hang Yu & Yuanjia Wang & Donglin Zeng, 2023. "A general framework of nonparametric feature selection in high‐dimensional data," Biometrics, The International Biometric Society, vol. 79(2), pages 951-963, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:951-963
    DOI: 10.1111/biom.13664

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13664
    Download Restriction: no


    References listed on IDEAS

    1. Yichao Wu & Leonard A. Stefanski, 2015. "Automatic structure recovery for additive models," Biometrika, Biometrika Trust, vol. 102(2), pages 381-395.
    2. L. A. Stefanski & Yichao Wu & Kyle White, 2014. "Variable Selection in Nonparametric Classification Via Measurement Error Model Selection Likelihoods," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 574-589, June.
    3. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    4. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yoshida, Takuma, 2018. "Semiparametric method for model structure discovery in additive regression models," Econometrics and Statistics, Elsevier, vol. 5(C), pages 124-136.
    2. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    3. Lin, Hongmei & Lian, Heng & Liang, Hua, 2019. "Rank reduction for high-dimensional generalized additive models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 672-684.
    4. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    5. Nardi, Y. & Rinaldo, A., 2011. "Autoregressive process modeling via the Lasso procedure," Journal of Multivariate Analysis, Elsevier, vol. 102(3), pages 528-549, March.
    6. Doksum, Kjell A. & Jiang, Jiancheng & Sun, Bo & Wang, Shuzhen, 2017. "Nearest neighbor estimates of regression," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 64-74.
    7. Kyle R. White & Leonard A. Stefanski & Yichao Wu, 2017. "Variable Selection in Kernel Regression Using Measurement Error Selection Likelihoods," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(520), pages 1587-1597, October.
    8. Li Liu & Hao Wang & Yanyan Liu & Jian Huang, 2021. "Model pursuit and variable selection in the additive accelerated failure time model," Statistical Papers, Springer, vol. 62(6), pages 2627-2659, December.
    9. Jiang, Liewen & Bondell, Howard D. & Wang, Huixia Judy, 2014. "Interquantile shrinkage and variable selection in quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 69(C), pages 208-219.
    10. Bhatnagar, Sahir R. & Lu, Tianyuan & Lovato, Amanda & Olds, David L. & Kobor, Michael S. & Meaney, Michael J. & O'Donnell, Kieran & Yang, Archer Y. & Greenwood, Celia M.T., 2023. "A sparse additive model for high-dimensional interactions with an exposure variable," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    11. Yi Liu & Veronika Ročková & Yuexi Wang, 2021. "Variable selection with ABC Bayesian forests," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 453-481, July.
    12. Takuma Yoshida, 2019. "Two stage smoothing in additive models with missing covariates," Statistical Papers, Springer, vol. 60(6), pages 1803-1826, December.
    13. Kuang-Yao Lee & Bing Li & Hongyu Zhao, 2016. "Variable selection via additive conditional independence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(5), pages 1037-1055, November.
    14. Zhu, Ying, 2015. "Sparse Linear Models and l1−Regularized 2SLS with High-Dimensional Endogenous Regressors and Instruments," MPRA Paper 81217, University Library of Munich, Germany.
    15. Umberto Amato & Anestis Antoniadis & Italia De Feis, 2016. "Additive model selection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(4), pages 519-564, November.
    16. He, Yong & Zhang, Xinsheng & Zhang, Liwen, 2018. "Variable selection for high dimensional Gaussian copula regression model: An adaptive hypothesis testing procedure," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 132-150.
    17. Morteza Amini & Mahdi Roozbeh, 2019. "Improving the prediction performance of the LASSO by subtracting the additive structural noises," Computational Statistics, Springer, vol. 34(1), pages 415-432, March.
    18. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    19. Shaogao Lv & Xin He & Junhui Wang, 2017. "A unified penalized method for sparse additive quantile models: an RKHS approach," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(4), pages 897-923, August.
    20. Hyung Park & Thaddeus Tarpey & Eva Petkova & R. Todd Ogden, 2024. "A high-dimensional single-index regression for interactions between treatment and covariates," Statistical Papers, Springer, vol. 65(7), pages 4025-4056, September.
