Printed from https://ideas.repec.org/a/spr/compst/v31y2016i2d10.1007_s00180-015-0642-2.html

Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost

Author

Listed:
  • Riccardo De Bin

    (University of Munich)

Abstract

Despite the limitations imposed by the proportional hazards assumption, the Cox model is probably the most popular statistical tool used to analyze survival data, thanks to its flexibility and ease of interpretation. For this reason, novel statistical/machine learning techniques are usually adapted to fit its requirements, including boosting. Boosting is an iterative technique originally developed in the machine learning community to handle classification problems, and later extended to the statistical field, where it is used in many situations, including regression and survival analysis. The popularity of boosting has been further driven by the availability of user-friendly software such as the R packages mboost and CoxBoost, both of which allow the implementation of boosting in conjunction with the Cox model. Despite the common underlying boosting principles, these two packages use different techniques: the former is an adaptation of model-based boosting, while the latter adapts likelihood-based boosting. Here we contrast these two boosting techniques as implemented in the R packages from an analytic point of view; we further examine solutions adopted within these packages to treat mandatory variables, i.e. variables that—for several reasons—must be included in the model. We explore the possibility of extending solutions currently only implemented in one package to the other. A simulation study and a real data example are added for illustration.
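To make the model-based side of this comparison concrete, the following is a minimal sketch of component-wise gradient boosting for the Cox partial likelihood. It is not the code of mboost or CoxBoost. The function names, the no-intercept least-squares base learner, the step length nu, and the toy data are all assumptions made for illustration, and the sketch assumes centred covariates and no tied event times.

```python
import math


def cox_negative_gradient(times, events, f):
    """Gradient of the log partial likelihood with respect to each f_i.

    Equivalently, the negative gradient of the negative log partial
    likelihood. Assumes no tied event times.
    """
    n = len(times)
    exp_f = [math.exp(v) for v in f]
    u = [float(e) for e in events]
    for j in range(n):
        if events[j]:
            # Risk set: everyone still under observation at time t_j.
            risk_set = [k for k in range(n) if times[k] >= times[j]]
            denom = sum(exp_f[k] for k in risk_set)
            for k in risk_set:
                u[k] -= exp_f[k] / denom
    return u


def componentwise_cox_boost(X, times, events, steps=50, nu=0.1):
    """Hypothetical component-wise boosting for a linear Cox model.

    At each step, every covariate is regressed (least squares, no
    intercept) on the current gradient vector; only the best-fitting
    covariate has its coefficient updated, shrunk by nu.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        f = [sum(b * x for b, x in zip(beta, row)) for row in X]
        u = cox_negative_gradient(times, events, f)
        best_j, best_coef, best_rss = None, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue  # constant-zero column carries no information
            coef = sum(v * w for v, w in zip(xj, u)) / sxx
            rss = sum((w - coef * v) ** 2 for v, w in zip(xj, u))
            if rss < best_rss:
                best_j, best_coef, best_rss = j, coef, rss
        if best_j is None:
            break
        beta[best_j] += nu * best_coef
    return beta


# Toy data (invented): column 0 orders the event times, column 1 is noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
times = [5.0, 4.0, 3.0, 2.0, 1.0]
events = [1, 1, 1, 1, 1]

beta = componentwise_cox_boost(X, times, events)
print(beta)
```

A likelihood-based variant would, broadly, replace the least-squares fit to the gradient with a penalised partial-likelihood update for each candidate covariate. Mandatory variables could be handled by updating their coefficients at every step rather than letting them compete for selection. Both remarks are simplifications and do not describe either package's exact procedure.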

Suggested Citation

  • Riccardo De Bin, 2016. "Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost," Computational Statistics, Springer, vol. 31(2), pages 513-531, June.
  • Handle: RePEc:spr:compst:v:31:y:2016:i:2:d:10.1007_s00180-015-0642-2
    DOI: 10.1007/s00180-015-0642-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-015-0642-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00180-015-0642-2?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Benjamin Hofner & Andreas Mayr & Nikolay Robinzonov & Matthias Schmid, 2014. "Model-based boosting in R: a hands-on tutorial using the R package mboost," Computational Statistics, Springer, vol. 29(1), pages 3-35, February.
    2. Benjamin Hofner & Torsten Hothorn & Thomas Kneib, 2013. "Variable selection and model choice in structured survival models," Computational Statistics, Springer, vol. 28(3), pages 1079-1101, June.
    3. Gerhard Tutz & Harald Binder, 2006. "Generalized Additive Modeling with Implicit Variable Selection by Likelihood-Based Boosting," Biometrics, The International Biometric Society, vol. 62(4), pages 961-971, December.
    4. Tutz, Gerhard & Binder, Harald, 2007. "Boosting ridge regression," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6044-6059, August.

    Citations

    Cited by:

    1. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    2. Battauz, Michela & Vidoni, Paolo, 2022. "A likelihood-based boosting algorithm for factor analysis models with binary data," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    3. Heidi Seibold & Christoph Bernau & Anne-Laure Boulesteix & Riccardo De Bin, 2018. "On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models," Computational Statistics, Springer, vol. 33(3), pages 1195-1215, September.
    4. Yanis Tazi & Juan E. Arango-Ossa & Yangyu Zhou & Elsa Bernard & Ian Thomas & Amanda Gilkes & Sylvie Freeman & Yoann Pradat & Sean J. Johnson & Robert Hills & Richard Dillon & Max F. Levine & Daniel Le, 2022. "Unified classification and risk-stratification in Acute Myeloid Leukemia," Nature Communications, Nature, vol. 13(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    2. Stefanie Hieke & Axel Benner & Richard F Schlenk & Martin Schumacher & Lars Bullinger & Harald Binder, 2016. "Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-18, May.
    3. Faisal Zahid & Gerhard Tutz, 2013. "Multinomial logit models with implicit variable selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(4), pages 393-416, December.
    4. Sariyar Murat & Schumacher Martin & Binder Harald, 2014. "A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 343-357, June.
    5. Hainaut, Donatien & Trufin, Julien & Denuit, Michel, 2021. "Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link," LIDAM Discussion Papers ISBA 2021012, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    6. Heidi Seibold & Christoph Bernau & Anne-Laure Boulesteix & Riccardo De Bin, 2018. "On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models," Computational Statistics, Springer, vol. 33(3), pages 1195-1215, September.
    7. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    8. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    9. Wang Zhu & Wang C.Y., 2010. "Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    10. Philip Kostov, 2010. "Do Buyers’ Characteristics and Personal Relationships Affect Agricultural Land Prices?," Land Economics, University of Wisconsin Press, vol. 86(1), pages 48-65.
    11. Juan Torres Munguía, 2024. "Identifying Gender-Specific Risk Factors for Income Poverty across Poverty Levels in Urban Mexico: A Model-Based Boosting Approach," Social Sciences, MDPI, vol. 13(3), pages 1-21, March.
    12. Yousuf, Kashif & Ng, Serena, 2021. "Boosting high dimensional predictive regressions with time varying parameters," Journal of Econometrics, Elsevier, vol. 224(1), pages 60-87.
    13. Philipp F. M. Baumann & Enzo Rossi & Alexander Volkmann, 2020. "What Drives Inflation and How: Evidence from Additive Mixed Models Selected by cAIC," Papers 2006.06274, arXiv.org, revised Aug 2022.
    14. Osamu Komori, 2011. "A boosting method for maximization of the area under the ROC curve," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(5), pages 961-979, October.
    15. Belitz, Christiane & Lang, Stefan, 2008. "Simultaneous selection of variables and smoothing parameters in structured additive regression models," Computational Statistics & Data Analysis, Elsevier, vol. 53(1), pages 61-81, September.
    16. Gerhard Tutz & Gunther Schauberger, 2015. "A Penalty Approach to Differential Item Functioning in Rasch Models," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 21-43, March.
    17. Heikki Kauppi, 2019. "Recession Prediction with Optimal Use of Leading Indicators," Discussion Papers 125, Aboa Centre for Economics.
    18. Ngandu Balekelayi & Solomon Tesfamariam, 2020. "Geoadditive Quantile Regression Model for Sewer Pipes Deterioration Using Boosting Optimization Algorithm," Sustainability, MDPI, vol. 12(20), pages 1-24, October.
    19. Battauz, Michela & Vidoni, Paolo, 2022. "A likelihood-based boosting algorithm for factor analysis models with binary data," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    20. Lahiri, Kajal & Yang, Cheng, 2022. "Boosting tax revenues with mixed-frequency data in the aftermath of COVID-19: The case of New York," International Journal of Forecasting, Elsevier, vol. 38(2), pages 545-566.
