
Stable variable selection for right censored data: comparison of methods

Author

Listed:
  • Besse, Philippe
  • Leconte, Eve
  • Walschaerts, Marie

Abstract

Instability in model selection is a major concern with data sets containing a large number of covariates. This paper deals with variable selection methodology for high-dimensional problems where the response variable can be right censored. We focus on new stable variable selection methods based on the bootstrap for two methodologies commonly used in survival analysis: the Cox proportional hazards model and survival trees. For the Cox model, we investigate bootstrapping applied to two variable selection techniques: the stepwise algorithm based on the AIC criterion and the L1-penalization of the Lasso. For survival trees, we review two methodologies: bootstrap node-level stabilization and random survival forests. We apply these approaches to two real data sets, a classical breast cancer data set and an original infertility data set. We compare the methods on two criteria: the prediction error rate based on the Harrell concordance index and the relevance of the interpretation of the corresponding selected models, focusing on the original infertility data set. The aim is to find a compromise between good prediction performance and ease of interpretation for clinicians. Results suggest that, when the number of individuals is small, bootstrapping adapted to L1-penalization in the Cox model or bootstrap node-level stabilization in survival trees offers a good alternative to the random survival forest methodology, which is known to give the smallest prediction error rate but is difficult for non-statisticians to interpret. From a clinical perspective, the complementarity between the methods based on the Cox model and those based on survival trees would make it possible to build reliable models that are easy for the clinician to interpret.
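
The abstract evaluates methods by a prediction error rate based on the Harrell concordance index. As a rough illustration only, and not the authors' code, the sketch below computes the index for right-censored data in plain Python. The function name, the convention that a higher score means higher predicted risk, the skipping of tied observed times, and the reading of the error rate as 1 - C are all assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if the time is censored
    risk_scores -- predicted risk; higher is assumed to mean earlier event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher predicted risk failed first
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = harrell_c_index(times, events, risk)
error_rate = 1.0 - c_index  # assumed definition of the prediction error rate
```

A value of C near 1 means predicted risks order the observed event times well, while 0.5 corresponds to random ordering. Production analyses would normally rely on an established survival analysis library rather than this quadratic-time loop.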

Suggested Citation

  • Besse, Philippe & Leconte, Eve & Walschaerts, Marie, 2012. "Stable variable selection for right censored data: comparison of methods," TSE Working Papers 12-486, Toulouse School of Economics (TSE).
  • Handle: RePEc:tse:wpaper:28126

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1203.4928v1.pdf
    File Function: Full text
    Download Restriction: no

    References listed on IDEAS

    1. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    2. Ciampi, Antonio & Thiffault, Johanne & Nakache, Jean-Pierre & Asselain, Bernard, 1986. "Stratification by stepwise regression, correspondence analysis and recursive partition: a comparison of three methods of analysis for survival data with covariates," Computational Statistics & Data Analysis, Elsevier, vol. 4(3), pages 185-204, October.

    Citations



    Cited by:

    1. Mozhgan Safe & Hossein Mahjub & Javad Faradmal, 2017. "A Comparative Study for Modelling the Survival of Breast Cancer Patients in the West of Iran," Global Journal of Health Science, Canadian Center of Science and Education, vol. 9(2), pages 215-215, February.
    2. Khan Md Hasinur Rahaman & Bhadra Anamika & Howlader Tamanna, 2019. "Stability selection for lasso, ridge and elastic net implemented with AFT models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(5), pages 1-14, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    2. Gautier Marti & Frank Nielsen & Philippe Donnat & Sébastien Andler, 2016. "On clustering financial time series: a need for distances between dependent random variables," Papers 1603.07822, arXiv.org.
    3. Yan Zhou & John McArdle, 2015. "Rationale and Applications of Survival Tree and Survival Ensemble Methods," Psychometrika, Springer;The Psychometric Society, vol. 80(3), pages 811-833, September.
    4. Yu, Dengdeng & Zhang, Li & Mizera, Ivan & Jiang, Bei & Kong, Linglong, 2019. "Sparse wavelet estimation in quantile regression with multiple functional predictors," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 12-29.
    5. Sohrabi, Narges & Movaghari, Hadi, 2020. "Reliable factors of Capital structure: Stability selection approach," The Quarterly Review of Economics and Finance, Elsevier, vol. 77(C), pages 296-310.
    6. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    7. Chun, Hyonho & Lee, Myung Hee & Fleet, James C. & Oh, Ji Hwan, 2016. "Graphical models via joint quantile regression with component selection," Journal of Multivariate Analysis, Elsevier, vol. 152(C), pages 162-171.
    8. Guo, Peiyang & Lam, Jacqueline C.K. & Li, Victor O.K., 2019. "Drivers of domestic electricity users’ price responsiveness: A novel machine learning approach," Applied Energy, Elsevier, vol. 235(C), pages 900-913.
    9. Liang, Lixing & Zhuang, Yipeng & Yu, Philip L.H., 2024. "Variable selection for high-dimensional incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    10. Solari, Aldo & Djordjilović, Vera, 2022. "Multi split conformal prediction," Statistics & Probability Letters, Elsevier, vol. 184(C).
    11. Raheem, S.M. Enayetur & Ahmed, S. Ejaz & Doksum, Kjell A., 2012. "Absolute penalty and shrinkage estimation in partially linear models," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 874-891.
    12. Nazemi, Abdolreza & Fabozzi, Frank J., 2018. "Macroeconomic variable selection for creditor recovery rates," Journal of Banking & Finance, Elsevier, vol. 89(C), pages 14-25.
    13. Hua Jin & Ying Lu & Kaite Stone & Dennis M. Black, 2004. "Alternative Tree-Structured Survival Analysis Based on Variance of Survival Time," Medical Decision Making, vol. 24(6), pages 670-680, November.
    14. Zhang, Heping, 2004. "Recursive Partitioning and Tree-based Methods," Papers 2004,30, Humboldt University of Berlin, Center for Applied Statistics and Economics (CASE).
    15. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "$L_1$ splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    16. Latouche, Pierre & Mattei, Pierre-Alexandre & Bouveyron, Charles & Chiquet, Julien, 2016. "Combining a relaxed EM algorithm with Occam’s razor for Bayesian variable selection in high-dimensional regression," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 177-190.
    17. Tan, Kean Ming & Witten, Daniela & Shojaie, Ali, 2015. "The cluster graphical lasso for improved estimation of Gaussian graphical models," Computational Statistics & Data Analysis, Elsevier, vol. 85(C), pages 23-36.
    18. A. S. Foulkes & V. De Gruttola, 2002. "Characterizing the Relationship Between HIV-1 Genotype and Phenotype: Prediction-Based Classification," Biometrics, The International Biometric Society, vol. 58(1), pages 145-156, March.
    19. Yang, Yihe & Zhou, Jie & Pan, Jianxin, 2021. "Estimation and optimal structure selection of high-dimensional Toeplitz covariance matrix," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    20. Du, Lilun & Lan, Wei & Luo, Ronghua & Zhong, Pingshou, 2018. "Factor-adjusted multiple testing of correlations," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 34-47.
