Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i17p3706-d1227466.html

Generalized Penalized Constrained Regression: Sharp Guarantees in High Dimensions with Noisy Features

Author

Listed:
  • Ayed M. Alrashdi

    (Department of Electrical Engineering, College of Engineering, University of Ha’il, Ha’il 81441, Saudi Arabia)

  • Meshari Alazmi

    (Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha’il, Ha’il 81411, Saudi Arabia)

  • Masad A. Alrasheedi

    (Department of Management Information Systems, College of Business Administration, Taibah University, Madinah 42353, Saudi Arabia)

Abstract

The generalized penalized constrained regression (G-PCR) is a penalized model for high-dimensional linear inverse problems with structured features. This paper presents a sharp error performance analysis of the G-PCR in the over-parameterized high-dimensional setting. The analysis is carried out under the assumption of a noisy or erroneous Gaussian features matrix. To assess the performance of the G-PCR, the study employs multiple metrics, such as prediction risk, cosine similarity, and the probabilities of misdetection and false alarm. These metrics offer valuable insights into the accuracy and reliability of the G-PCR model under different circumstances. Furthermore, the derived results are specialized and applied to well-known instances of G-PCR, including ℓ1-norm penalized regression for sparse signal recovery and ℓ2-norm (ridge) penalization. These instances are widely used in regression analysis for feature selection and model regularization. To validate the obtained results, the paper provides numerical simulations conducted on both real-world and synthetic datasets. Extensive simulations demonstrate that the results are universal and robust with respect to the assumed Gaussian distribution of the features matrix. We empirically investigate the so-called double descent phenomenon and show how optimal selection of the hyper-parameters of the G-PCR can help mitigate it. The derived expressions and insights from this study can be used to optimally select the hyper-parameters of the G-PCR. By leveraging these findings, one can make well-informed decisions regarding the configuration and fine-tuning of the G-PCR model, taking into account the specific problem at hand as well as the presence of noisy features in the high-dimensional setting.
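The ℓ2 (ridge) instance of G-PCR mentioned in the abstract admits a closed-form solution, which makes it convenient for a quick illustration of the noisy-features setting and of the cosine-similarity metric. The following minimal sketch assumes an illustrative model y = (A + E)x + z, where A is a Gaussian features matrix, E is feature noise, and z is observation noise; all dimensions, noise levels, and the penalty weight are arbitrary choices for demonstration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions: n observations, p features (over-parameterized when p > n).
n, p = 80, 120

# Ground-truth sparse signal (illustrative).
x_true = np.zeros(p)
x_true[:10] = 1.0

# Gaussian features A, feature noise E ("noisy features"), observation noise z.
# The noise scales 0.1 and 0.05 are illustrative assumptions.
A = rng.standard_normal((n, p)) / np.sqrt(n)
E = 0.1 * rng.standard_normal((n, p)) / np.sqrt(n)
z = 0.05 * rng.standard_normal(n)
y = (A + E) @ x_true + z

# Ridge (l2-penalized) instance: closed-form solution of
#   x_hat = argmin_x ||y - A x||^2 + lam * ||x||^2.
lam = 0.5
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

# Cosine similarity between the estimate and the truth, one of the
# performance metrics considered in the paper.
cos_sim = x_ridge @ x_true / (np.linalg.norm(x_ridge) * np.linalg.norm(x_true))
print(round(float(cos_sim), 3))
```

The ℓ1 (sparse recovery) instance has no closed form, but the same experiment can be repeated with any lasso solver (e.g., proximal gradient) in place of the ridge solution.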

Suggested Citation

  • Ayed M. Alrashdi & Meshari Alazmi & Masad A. Alrasheedi, 2023. "Generalized Penalized Constrained Regression: Sharp Guarantees in High Dimensions with Noisy Features," Mathematics, MDPI, vol. 11(17), pages 1-27, August.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:17:p:3706-:d:1227466

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/17/3706/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/17/3706/
    Download Restriction: no

    References listed on IDEAS

    1. A. Belloni & V. Chernozhukov & L. Wang, 2011. "Square-root lasso: pivotal recovery of sparse signals via conic programming," Biometrika, Biometrika Trust, vol. 98(4), pages 791-806.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    3. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    4. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    5. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    6. Saulius Jokubaitis & Remigijus Leipus, 2022. "Asymptotic Normality in Linear Regression with Approximately Sparse Structure," Mathematics, MDPI, vol. 10(10), pages 1-28, May.
    7. Alexandre Belloni & Victor Chernozhukov & Ivan Fernandez-Val & Christian Hansen, 2013. "Program evaluation with high-dimensional data," CeMMAP working papers CWP77/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Domenico Giannone & Michele Lenza & Giorgio E. Primiceri, 2021. "Economic Predictions With Big Data: The Illusion of Sparsity," Econometrica, Econometric Society, vol. 89(5), pages 2409-2437, September.
    9. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey, 2016. "Double machine learning for treatment and causal parameters," CeMMAP working papers 49/16, Institute for Fiscal Studies.
    11. Aurélien Ouattara & Matthieu Bulté & Wan-Ju Lin & Philipp Scholl & Benedikt Veit & Christos Ziakas & Florian Felice & Julien Virlogeux & George Dikos, 2021. "Scalable Econometrics on Big Data -- The Logistic Regression on Spark," Papers 2106.10341, arXiv.org.
    12. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2011. "Estimation of treatment effects with high-dimensional controls," CeMMAP working papers CWP42/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    13. Patrick Bajari & Denis Nekipelov & Stephen P. Ryan & Miaoyu Yang, 2015. "Demand Estimation with Machine Learning and Model Combination," NBER Working Papers 20955, National Bureau of Economic Research, Inc.
    14. Fan, Jianqing & Feng, Yang & Xia, Lucy, 2020. "A projection-based conditional dependence measure with applications to high-dimensional undirected graphical models," Journal of Econometrics, Elsevier, vol. 218(1), pages 119-139.
    15. Victor Chernozhukov & Christian Hansen & Martin Spindler, 2015. "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics, Annual Reviews, vol. 7(1), pages 649-688, August.
    16. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    17. Alexandre Belloni & Victor Chernozhukov & Lie Wang, 2013. "Pivotal estimation via square-root lasso in nonparametric regression," CeMMAP working papers CWP62/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    18. Anindya Bhadra & Jyotishka Datta & Nicholas G. Polson & Brandon T. Willard, 2020. "Global-Local Mixtures: A Unifying Framework," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(2), pages 426-447, August.
    19. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.
    20. Luke Mosley & Idris A. Eckley & Alex Gibberd, 2022. "Sparse temporal disaggregation," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 2203-2233, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:11:y:2023:i:17:p:3706-:d:1227466. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.