
Robust high-dimensional regression for data with anomalous responses

Author

Listed:
  • Mingyang Ren (University of Chinese Academy of Sciences; Chinese Academy of Sciences)
  • Sanguo Zhang (University of Chinese Academy of Sciences; Chinese Academy of Sciences)
  • Qingzhao Zhang (Xiamen University)

Abstract

Accurate response variables are crucial for training regression models. In some settings, including the high-dimensional case, response observations tend to be inaccurate, and directly fitting a conventional model then yields biased estimators. To analyze high-dimensional data with anomalous responses, we adopt the γ-divergence for variable selection and estimation. The proposed method is robust to anomalous responses and does not require modeling the proportion of abnormal data. It is implemented with an efficient coordinate descent algorithm. In the setting where the dimensionality p can grow exponentially fast with the sample size n, we rigorously establish variable selection consistency and estimation bounds. Numerical simulations and an application to real data demonstrate the performance of the proposed method.
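
The abstract does not spell out the estimator, so the following is only a minimal sketch of one common way to fit a γ-divergence linear regression, not the authors' algorithm. It assumes Gaussian errors with a known scale `sigma`; under that assumption the fit can be approached by repeatedly reweighting observations with weights proportional to exp(-γ r_i² / (2σ²)), where r_i is the current residual, and solving a weighted lasso by coordinate descent. The function names, the lasso penalty, the fixed `sigma`, and all default values are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def gamma_weights(residuals, sigma, gamma):
    """Normalized weights exp(-gamma * r^2 / (2 sigma^2)); large residuals get ~0."""
    w = np.exp(-gamma * residuals ** 2 / (2.0 * sigma ** 2))
    return w / w.sum()


def gamma_lasso(X, y, gamma=0.5, lam=0.05, sigma=1.0,
                beta0=None, n_outer=30, n_inner=50):
    """Hypothetical sketch: reweighting steps around a weighted lasso.

    Each outer step fixes the weights at the current residuals, then runs
    coordinate descent on
        0.5 * sum_i w_i * (y_i - x_i' beta)^2 + lam * ||beta||_1.
    sigma is treated as known here, which is an assumption of this sketch.
    """
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)
    for _ in range(n_outer):
        w = gamma_weights(y - X @ beta, sigma, gamma)
        r = y - X @ beta  # full residual, kept up to date below
        for _ in range(n_inner):
            for j in range(p):
                xj = X[:, j]
                r_partial = r + xj * beta[j]  # residual without feature j
                num = np.sum(w * xj * r_partial)
                den = np.sum(w * xj ** 2)
                new_bj = soft_threshold(num, lam) / den
                r = r_partial - xj * new_bj
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    beta_true = np.zeros(10)
    beta_true[:2] = [2.0, -1.5]
    y = X @ beta_true + rng.normal(size=200)
    y[:20] += 10.0  # contaminate 10% of the responses
    print(gamma_lasso(X, y))
```

Observations with large residuals receive weights close to zero, which illustrates why the proportion of anomalous responses does not have to be modeled. Because the underlying objective is non-convex, the result of such a scheme depends on the starting value `beta0`.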

Suggested Citation

  • Mingyang Ren & Sanguo Zhang & Qingzhao Zhang, 2021. "Robust high-dimensional regression for data with anomalous responses," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(4), pages 703-736, August.
  • Handle: RePEc:spr:aistmt:v:73:y:2021:i:4:d:10.1007_s10463-020-00764-1
    DOI: 10.1007/s10463-020-00764-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10463-020-00764-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Cameron, A. Colin & Trivedi, Pravin K., 2013. "Regression Analysis of Count Data," Cambridge Books, Cambridge University Press, number 9781107667273.
    2. She, Yiyuan & Owen, Art B., 2011. "Outlier Detection Using Nonconvex Penalized Regression," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 626-639.
    3. Fujisawa, Hironori & Eguchi, Shinto, 2008. "Robust parameter estimation with a small bias against heavy contamination," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 2053-2081, October.
    4. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    5. Hung Hung & Zhi-Yu Jou & Su-Yun Huang, 2018. "Robust mislabel logistic regression without modeling mislabel probabilities," Biometrics, The International Biometric Society, vol. 74(1), pages 145-154, March.
    6. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.
    7. Kenichi Hayashi, 2012. "A boosting method with asymmetric mislabeling probabilities which depend on covariates," Computational Statistics, Springer, vol. 27(2), pages 203-218, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Okhrin, Ostap & Ristig, Alexander & Sheen, Jeffrey R. & Trück, Stefan, 2015. "Conditional systemic risk with penalized copula," SFB 649 Discussion Papers 2015-038, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    2. Peng, Heng & Lu, Ying, 2012. "Model selection in linear mixed effect models," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 109-129.
    3. Shuang Zhang & Xingdong Feng, 2022. "Distributed identification of heterogeneous treatment effects," Computational Statistics, Springer, vol. 37(1), pages 57-89, March.
    4. Jun Zhu & Hsin‐Cheng Huang & Perla E. Reyes, 2010. "On selection of spatial linear models for lattice data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(3), pages 389-402, June.
    5. Ye, Mao & Lu, Zhao-Hua & Li, Yimei & Song, Xinyuan, 2019. "Finite mixture of varying coefficient model: Estimation and component selection," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 452-474.
    6. Tang, Linjun & Zhou, Zhangong & Wu, Changchun, 2012. "Weighted composite quantile estimation and variable selection method for censored regression model," Statistics & Probability Letters, Elsevier, vol. 82(3), pages 653-663.
    7. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    8. Cai, Tingting & Li, Jianbo & Zhou, Qin & Yin, Songlou & Zhang, Riquan, 2024. "Subgroup detection based on partially linear additive individualized model with missing data in response," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    9. Xia Chen & Liyue Mao, 2020. "Penalized empirical likelihood for partially linear errors-in-variables models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(4), pages 597-623, December.
    10. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    11. Fan, Guo-Liang & Liang, Han-Ying & Shen, Yu, 2016. "Penalized empirical likelihood for high-dimensional partially linear varying coefficient model with measurement errors," Journal of Multivariate Analysis, Elsevier, vol. 147(C), pages 183-201.
    12. Xiao Ni & Daowen Zhang & Hao Helen Zhang, 2010. "Variable Selection for Semiparametric Mixed Models in Longitudinal Studies," Biometrics, The International Biometric Society, vol. 66(1), pages 79-88, March.
    13. Tizheng Li & Xiaojuan Kang, 2022. "Variable selection of higher-order partially linear spatial autoregressive model with a diverging number of parameters," Statistical Papers, Springer, vol. 63(1), pages 243-285, February.
    14. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    15. Feng, Sanying & Lian, Heng & Xue, Liugen, 2016. "A new nested Cholesky decomposition and estimation for the covariance matrix of bivariate longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 102(C), pages 98-109.
    16. Yunxiao Chen & Xiaoou Li & Jingchen Liu & Zhiliang Ying, 2017. "Regularized Latent Class Analysis with Application in Cognitive Diagnosis," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 660-692, September.
    17. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    18. Xinyang Wang & Dehui Wang & Kai Yang, 2021. "Integer-valued time series model order shrinkage and selection via penalized quasi-likelihood approach," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(5), pages 713-750, July.
    19. Zhixuan Fu & Shuangge Ma & Haiqun Lin & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized Variable Selection for Multi-center Competing Risks Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(2), pages 379-405, December.
    20. Feng, Zhenghui & Zhu, Lixing, 2012. "An alternating determination–optimization approach for an additive multi-index model," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1981-1993.
