
Model-free feature screening via distance correlation for ultrahigh dimensional survival data

Author

Listed:
  • Jing Zhang (Zhongnan University of Economics and Law)
  • Yanyan Liu (Wuhan University)
  • Hengjian Cui (Capital Normal University)

Abstract

With the explosion of ultrahigh dimensional data in various fields, many sure independent screening methods have been proposed to reduce the dimensionality of data from a large scale to a relatively moderate scale. For censored survival data, existing screening methods mainly adopt the Kaplan–Meier estimator to handle censoring, which may not perform well under heavy censoring. In this article, we propose a novel sure independent screening procedure based on distance correlation after standardizing marginal variables for ultrahigh dimensional survival data. It is a model-free approach and does not involve the Kaplan–Meier estimator, so its performance is much more robust than that of the existing methods. Furthermore, our proposed method enjoys other advantages: it avoids the complication of specifying an actual model from a large number of covariates; it enjoys the sure screening property and ranking consistency under some mild regularity conditions; and it does not require any complicated numerical optimization, so the corresponding calculation is very simple and fast. Extensive numerical studies demonstrate that the proposed method performs favorably compared with the existing methods. As an illustration, we apply the proposed method to a gene expression data set.
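
To make the idea of ranking covariates by distance correlation concrete, here is a minimal Python sketch of generic marginal screening with the sample distance correlation. It is an illustration only and not the authors' procedure: it ignores censoring entirely and does not reproduce the standardization of marginal variables described in the abstract. The function names `distance_correlation` and `screen_features` and the cutoff `d` are assumptions introduced for this example.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered matrix of pairwise absolute distances for a 1-D sample."""
    d = np.abs(v[:, None] - v[None, :])
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    a = _centered_dist(np.asarray(x, dtype=float))
    b = _centered_dist(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()     # squared sample distance variance of x
    dvar_y = (b * b).mean()     # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:              # a constant sample carries no dependence signal
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_features(X, y, d):
    """Rank the columns of X by distance correlation with y and keep the top d.

    Hypothetical helper: y is treated as a fully observed response,
    so no censoring adjustment is made here.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array([distance_correlation(X[:, j], y)
                       for j in range(X.shape[1])])
    order = np.argsort(-scores)  # indices sorted by decreasing score
    return order[:d], scores
```

Calling `screen_features(X, y, d)` on an n-by-p covariate matrix returns the indices of the `d` highest-scoring columns together with all p scores. Each score costs O(n²) time and memory, and no numerical optimization is involved, which matches the computational simplicity the abstract emphasizes.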

Suggested Citation

  • Jing Zhang & Yanyan Liu & Hengjian Cui, 2021. "Model-free feature screening via distance correlation for ultrahigh dimensional survival data," Statistical Papers, Springer, vol. 62(6), pages 2711-2738, December.
  • Handle: RePEc:spr:stpapr:v:62:y:2021:i:6:d:10.1007_s00362-020-01210-3
    DOI: 10.1007/s00362-020-01210-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-020-01210-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations

    Cited by:

    1. Zhang Qingyang, 2023. "A nonparametric test for comparing survival functions based on restricted distance correlation," Dependence Modeling, De Gruyter, vol. 11(1), pages 1-15.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Conditional screening for ultrahigh-dimensional survival data in case-cohort studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 632-661, October.
    2. Zhong, Wei & Wang, Jiping & Chen, Xiaolin, 2021. "Censored mean variance sure independence screening for ultrahigh dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    3. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Feature screening for case‐cohort studies with failure time outcome," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 349-370, March.
    4. Pan, Yingli, 2022. "Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    5. Jing Zhang & Guosheng Yin & Yanyan Liu & Yuanshan Wu, 2018. "Censored cumulative residual independent screening for ultrahigh-dimensional survival data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 24(2), pages 273-292, April.
    6. Guo, Chaohui & Lv, Jing & Wu, Jibo, 2021. "Composite quantile regression for ultra-high dimensional semiparametric model averaging," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    7. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    8. Liu, Yanyan & Zhang, Jing & Zhao, Xingqiu, 2018. "A new nonparametric screening method for ultrahigh-dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 74-85.
    9. Qu, Lianqiang & Wang, Xiaoyu & Sun, Liuquan, 2022. "Variable screening for varying coefficient models with ultrahigh-dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    10. Hyokyoung G. Hong & Xuerong Chen & David C. Christiani & Yi Li, 2018. "Integrated powered density: Screening ultrahigh dimensional covariates with survival outcomes," Biometrics, The International Biometric Society, vol. 74(2), pages 421-429, June.
    11. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    12. Chen, Xiaolin & Chen, Xiaojing & Wang, Hong, 2018. "Robust feature screening for ultra-high dimensional right censored data via distance correlation," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 118-138.
    13. Zhang, Jing & Liu, Yanyan & Wu, Yuanshan, 2017. "Correlation rank screening for ultrahigh-dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 121-132.
    14. Zhang, Shen & Zhao, Peixin & Li, Gaorong & Xu, Wangli, 2019. "Nonparametric independence screening for ultra-high dimensional generalized varying coefficient models with longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 37-52.
    15. Jing Pan & Yuan Yu & Yong Zhou, 2018. "Nonparametric independence feature screening for ultrahigh-dimensional survival data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 81(7), pages 821-847, October.
    16. Xiaochao Xia & Hao Ming, 2022. "A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation," Mathematics, MDPI, vol. 10(24), pages 1-32, December.
    17. He, Yong & Zhang, Liang & Ji, Jiadong & Zhang, Xinsheng, 2019. "Robust feature screening for elliptical copula regression model," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 568-582.
    18. Yang, Guangren & Zhang, Ling & Li, Runze & Huang, Yuan, 2019. "Feature screening in ultrahigh-dimensional varying-coefficient Cox model," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 284-297.
    19. Xiaolin Chen & Xiaojing Chen & Yi Liu, 2019. "A note on quantile feature screening via distance correlation," Statistical Papers, Springer, vol. 60(5), pages 1741-1762, October.
    20. Yang, Baoying & Yin, Xiangrong & Zhang, Nan, 2019. "Sufficient variable selection using independence measures for continuous response," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 480-493.
