
Convex and Nonconvex Risk-Based Linear Regression at Scale

Author

Listed:
  • Can Wu

    (School of Mathematical Sciences, South China Normal University, Guangzhou, 510631, China; Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong)

  • Ying Cui

    (Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455)

  • Donghui Li

    (School of Mathematical Sciences, South China Normal University, Guangzhou, 510631, China)

  • Defeng Sun

    (Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong)

Abstract

The value at risk (VaR) and the conditional value at risk (CVaR) are two popular risk measures for hedging against data uncertainty. In this paper, we provide a computational toolbox for solving high-dimensional sparse linear regression problems under either the VaR or the CVaR measure, the former being nonconvex and the latter convex. Unlike empirical risk (neutral) minimization models, in which the overall loss decomposes across data points, these risk-sensitive models have nonseparable objective functions, so typical first-order algorithms do not scale easily. We address this scaling issue by adopting a semismooth Newton-based proximal augmented Lagrangian method for the convex CVaR linear regression problem. The matrix structures of the Newton systems are carefully exploited to reduce the computational cost per iteration. The method is further embedded as a subroutine in a majorization–minimization algorithm to tackle the nonconvex VaR-based regression problem. We also discuss an adaptive sieving strategy that iteratively guesses and adjusts the effective problem dimension, which is particularly useful when a solution path over a sequence of tuning parameters is needed. Extensive numerical experiments on both synthetic and real data demonstrate the effectiveness of the proposed methods. In particular, they are about 53 times faster than the commercial package Gurobi for CVaR-based sparse linear regression with 4,265,669 features and 16,087 observations.
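
To illustrate why the risk-sensitive objectives described in the abstract are nonseparable, the following is a minimal sketch, not the authors' implementation. It assumes absolute residuals as the per-observation loss and parameterizes the risk level by a hypothetical integer `k` (the number of worst observations); the function names `var_k`, `cvar_k`, and `cvar_k_variational` are likewise illustrative. The empirical VaR is taken as the k-th largest loss and the empirical CVaR as the average of the k largest losses, and the last function checks this against the standard Rockafellar-Uryasev variational form evaluated at the VaR.

```python
import numpy as np

def var_k(losses, k):
    """Empirical VaR (assumed form): the k-th largest loss."""
    return np.sort(losses)[-k]

def cvar_k(losses, k):
    """Empirical CVaR (assumed form): the average of the k largest losses."""
    return np.sort(losses)[-k:].mean()

def cvar_k_variational(losses, k):
    """Rockafellar-Uryasev form t + sum(max(loss - t, 0)) / k at t = VaR."""
    t = var_k(losses, k)
    return t + np.maximum(losses - t, 0.0).sum() / k

# Small deterministic check: the two worst losses are 4 and 10.
toy = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(var_k(toy, 2))               # 4.0
print(cvar_k(toy, 2))              # 7.0
print(cvar_k_variational(toy, 2))  # 7.0

# Synthetic regression data (all values here are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(200)

# Which observations enter the CVaR depends on the whole residual vector,
# so the objective cannot be written as a sum of independent per-sample terms.
beta = np.zeros(5)
losses = np.abs(y - X @ beta)
print(cvar_k(losses, 20), cvar_k_variational(losses, 20))
```

Because the set of the k largest residuals changes with the coefficient vector, the objective couples all observations, which is the scaling difficulty the abstract refers to. Under these assumptions, the CVaR objective is convex in the coefficients, whereas the VaR objective is not.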

Suggested Citation

  • Can Wu & Ying Cui & Donghui Li & Defeng Sun, 2023. "Convex and Nonconvex Risk-Based Linear Regression at Scale," INFORMS Journal on Computing, INFORMS, vol. 35(4), pages 797-816, July.
  • Handle: RePEc:inm:orijoc:v:35:y:2023:i:4:p:797-816
    DOI: 10.1287/ijoc.2023.1282

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2023.1282
    Download Restriction: no


    References listed on IDEAS

    1. R. T. Rockafellar, 1976. "Augmented Lagrangians and Applications of the Proximal Point Algorithm in Convex Programming," Mathematics of Operations Research, INFORMS, vol. 1(2), pages 97-116, May.
    2. Wang, Hansheng & Li, Guodong & Jiang, Guohua, 2007. "Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso," Journal of Business & Economic Statistics, American Statistical Association, vol. 25, pages 347-355, July.
    3. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    4. Robert Tibshirani & Jacob Bien & Jerome Friedman & Trevor Hastie & Noah Simon & Jonathan Taylor & Ryan J. Tibshirani, 2012. "Strong rules for discarding predictors in lasso‐type problems," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(2), pages 245-266, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Huangyue & Kong, Lingchen & Shang, Pan & Pan, Shanshan, 2020. "Safe feature screening rules for the regularized Huber regression," Applied Mathematics and Computation, Elsevier, vol. 386(C).
    2. Guang Cheng & Hao Zhang & Zuofeng Shang, 2015. "Sparse and efficient estimation for partial spline models with increasing dimension," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(1), pages 93-127, February.
    3. Hu Yang & Ning Li & Jing Yang, 2020. "A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates," Statistical Papers, Springer, vol. 61(5), pages 1911-1937, October.
    4. Junlong Zhao & Chao Liu & Lu Niu & Chenlei Leng, 2019. "Multiple influential point detection in high dimensional regression spaces," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 385-408, April.
    5. Gabriel E Hoffman & Benjamin A Logsdon & Jason G Mezey, 2013. "PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    6. Kean Ming Tan & Lan Wang & Wen‐Xin Zhou, 2022. "High‐dimensional quantile regression: Convolution smoothing and concave regularization," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 205-233, February.
    7. Weiyan Mu & Shifeng Xiong, 2014. "Some notes on robust sure independence screening," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(10), pages 2092-2102, October.
    8. Zeng, Yaohui & Yang, Tianbao & Breheny, Patrick, 2021. "Hybrid safe–strong rules for efficient optimization in lasso-type problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    9. Aneiros, Germán & Novo, Silvia & Vieu, Philippe, 2022. "Variable selection in functional regression models: A review," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    10. N. Neykov & P. Filzmoser & P. Neytchev, 2014. "Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator," Statistical Papers, Springer, vol. 55(1), pages 187-207, February.
    11. Qiang Li & Liming Wang, 2020. "Robust change point detection method via adaptive LAD-LASSO," Statistical Papers, Springer, vol. 61(1), pages 109-121, February.
    12. Muhammad Amin & Lixin Song & Milton Abdul Thorlie & Xiaoguang Wang, 2015. "SCAD-penalized quantile regression for high-dimensional data analysis and variable selection," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 69(3), pages 212-235, August.
    13. Yi Chu & Lu Lin, 2020. "Conditional SIRS for nonparametric and semiparametric models by marginal empirical likelihood," Statistical Papers, Springer, vol. 61(4), pages 1589-1606, August.
    14. Guo, Yi & Berman, Mark & Gao, Junbin, 2014. "Group subset selection for linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 75(C), pages 39-52.
    15. Nahapetyan Yervand, 2019. "The benefits of the Velvet Revolution in Armenia: Estimation of the short-term economic gains using deep neural networks," Central European Economic Journal, Sciendo, vol. 6(53), pages 286-303, January.
    16. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    17. Yang, Xuzhi & Wang, Tengyao, 2024. "Multiple-output composite quantile regression through an optimal transport lens," LSE Research Online Documents on Economics 125589, London School of Economics and Political Science, LSE Library.
    18. Jean-Pierre Crouzeix & Abdelhak Hassouni & Eladio Ocaña, 2023. "A Short Note on the Twice Differentiability of the Marginal Function of a Convex Function," Journal of Optimization Theory and Applications, Springer, vol. 198(2), pages 857-867, August.
    19. Sauvenier, Mathieu & Van Bellegem, Sébastien, 2023. "Direction Identification and Minimax Estimation by Generalized Eigenvalue Problem in High Dimensional Sparse Regression," LIDAM Discussion Papers CORE 2023005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    20. Zhu Wang, 2022. "MM for penalized estimation," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 54-75, March.
