Tree-based multivariate regression and density estimation with right-censored data

Authors

Molinaro, Annette M.
Dudoit, Sandrine
van der Laan, Mark J.

Abstract

We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) First, define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator of this risk. (2) Next, construct candidate estimators based on the loss function for the observed data. (3) Then, apply cross-validation to estimate risk based on the observed data loss function and to select an optimal estimator among the candidates. A number of common estimation procedures follow this approach in the full data situation but depart from it when faced with the obstacle of evaluating the loss function for censored observations. Here, we argue that one can, and should, also adhere to this estimation road map in censored data situations. Tree-based methods, where the candidate estimators in Step 2 are generated by recursive binary partitioning of a suitably defined covariate space, provide a striking example of the chasm between estimation procedures for full data and censored data (e.g., CART regression trees for uncensored data versus their adaptations to censored data). Common approaches for regression trees bypass the risk estimation problem for censored outcomes by altering the node splitting and tree pruning criteria in ways that are specific to right-censored data. This article describes an application of our unified methodology to tree-based estimation with censored data.
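The three-step road map can be illustrated with a small sketch. It assumes inverse probability of censoring weighting (IPCW) as the Step 1 mapping: each uncensored observation's full data loss is divided by a Kaplan-Meier estimate of G(T-), the probability of still being uncensored just before its observed time, and censored observations get weight 0. With the true G, and censoring independent of the outcome and covariates (the assumption behind a marginal Kaplan-Meier estimate of G), this has the same expected value as the full data loss. The abstract asks for a mapping that yields an efficient risk estimator; plain IPCW is simpler and generally not efficient, so treat it as a stand-in. The function names and the two candidates (a weighted mean and a single weighted split, i.e., a depth-one tree) are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]


def censoring_survival_left(times: Sequence[float], events: Sequence[int]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of G(t-) = P(C >= t), with censorings
    (events[i] == 0) playing the role of failures."""
    pairs = sorted(zip(times, events))
    n, surv, steps, i = len(pairs), 1.0, [], 0
    while i < n:
        t, at_risk, censored, j = pairs[i][0], n - i, 0, i
        while j < n and pairs[j][0] == t:  # group tied times
            censored += 1 - pairs[j][1]
            j += 1
        if censored:
            surv *= 1.0 - censored / at_risk
            steps.append((t, surv))
        i = j

    def g_left(t: float) -> float:
        value = 1.0
        for s, v in steps:  # product over censoring times strictly before t
            if s >= t:
                break
            value = v
        return value

    return g_left


def ipcw_weights(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Step 1 (assumed IPCW mapping): weight Delta_i / G(T_i-); censored rows get 0."""
    g_left = censoring_survival_left(times, events)
    return [1.0 / g_left(t) if d else 0.0 for t, d in zip(times, events)]


def fit_mean(x, y, w) -> Predictor:
    """Step 2, candidate A: constant minimizing the weighted squared-error risk."""
    total = sum(w)
    m = sum(wi * yi for wi, yi in zip(w, y)) / total if total else 0.0
    return lambda _xi: m


def make_split(c: float, left_value: float, right_value: float) -> Predictor:
    return lambda xi: left_value if xi <= c else right_value


def fit_stump(x, y, w) -> Predictor:
    """Step 2, candidate B: best single weighted split 'x <= c' (depth-one tree)."""
    best_sse, best = float("inf"), fit_mean(x, y, w)
    for c in sorted(set(x))[:-1]:
        left = [i for i in range(len(x)) if x[i] <= c]
        right = [i for i in range(len(x)) if x[i] > c]
        wl, wr = sum(w[i] for i in left), sum(w[i] for i in right)
        if wl == 0 or wr == 0:
            continue  # no uncensored weight on one side of the split
        ml = sum(w[i] * y[i] for i in left) / wl
        mr = sum(w[i] * y[i] for i in right) / wr
        sse = sum(w[i] * (y[i] - ml) ** 2 for i in left) + sum(w[i] * (y[i] - mr) ** 2 for i in right)
        if sse < best_sse:
            best_sse, best = sse, make_split(c, ml, mr)
    return best


def cv_ipcw_risk(fit: Callable[..., Predictor], x, y, w, folds: int = 5, seed: int = 0) -> float:
    """Step 3: V-fold cross-validated IPCW squared-error risk of one candidate.
    For simplicity G (hence w) is estimated once on the full sample."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for v in range(folds):
        valid = set(idx[v::folds])
        train = [i for i in idx if i not in valid]
        pred = fit([x[i] for i in train], [y[i] for i in train], [w[i] for i in train])
        total += sum(w[i] * (y[i] - pred(x[i])) ** 2 for i in valid)
    return total / len(x)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.random() for _ in range(200)]
    t = [(1.0 if xi <= 0.5 else 3.0) + rng.gauss(0, 0.1) for xi in x]  # full data outcome
    c = [rng.expovariate(0.1) for _ in x]  # hypothetical censoring times
    y = [min(a, b) for a, b in zip(t, c)]
    d = [int(a <= b) for a, b in zip(t, c)]
    w = ipcw_weights(y, d)
    risks = {name: cv_ipcw_risk(f, x, y, w) for name, f in [("mean", fit_mean), ("stump", fit_stump)]}
    print(risks, "selected:", min(risks, key=risks.get))
```

The candidate with the smallest cross-validated IPCW risk is selected; in a full tree procedure the candidates would instead be subtrees of different sizes grown and pruned with the same weighted loss.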
The approach encompasses univariate outcome prediction, multivariate outcome prediction, and density estimation, simply by defining a suitable loss function for each of these problems. The proposed method for tree-based estimation with censoring is evaluated using a simulation study and the analysis of comparative genomic hybridization (CGH) copy number and survival data from breast cancer patients.
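The claim that only the loss function changes between the three problems can be sketched as follows, again assuming an IPCW mapping to the observed data. The specific full data losses used here (squared error for univariate prediction, a quadratic form with an assumed diagonal weight matrix for multivariate prediction, and the negative log-density with an assumed normal density for density estimation) are illustrative choices, and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def squared_error(z: float, pred: float) -> float:
    """Univariate prediction: L(Z, psi) = (Z - psi(W))^2."""
    return (z - pred) ** 2


def multivariate_squared_error(z: Sequence[float], pred: Sequence[float],
                               m: Optional[Sequence[float]] = None) -> float:
    """Multivariate prediction: (Z - psi(W))^T M (Z - psi(W)), with M assumed
    diagonal (entries m, identity by default) to keep the sketch short."""
    m = m if m is not None else [1.0] * len(z)
    return sum(mj * (zj - pj) ** 2 for mj, zj, pj in zip(m, z, pred))


def normal_neg_log_density(z: float, mean: float, sd: float) -> float:
    """Density estimation: L(Z, f) = -log f(Z), with f assumed normal here."""
    return 0.5 * math.log(2.0 * math.pi * sd ** 2) + (z - mean) ** 2 / (2.0 * sd ** 2)


def ipcw_loss(full_data_loss: float, delta: int, g_left_at_t: float) -> float:
    """Observed data loss: Delta * L / G(T-); a censored row contributes 0."""
    return full_data_loss / g_left_at_t if delta else 0.0


def ipcw_empirical_risk(full_losses, deltas, g_lefts) -> float:
    """Average observed data loss; the same code serves all three problems."""
    terms = [ipcw_loss(l, d, g) for l, d, g in zip(full_losses, deltas, g_lefts)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical toy values: outcomes, censoring indicators, G(T-) estimates.
    z, deltas, g = [0.2, 1.1, 0.7], [1, 0, 1], [1.0, 0.9, 0.8]
    pred = 0.5
    print(ipcw_empirical_risk([squared_error(zi, pred) for zi in z], deltas, g))
    print(ipcw_empirical_risk([normal_neg_log_density(zi, pred, 1.0) for zi in z], deltas, g))
```

Swapping `squared_error` for `normal_neg_log_density` turns a prediction risk into a density estimation risk without touching the censoring adjustment, which is the sense in which the loss function alone drives the procedure.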

Suggested Citation

Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.

Handle: RePEc:eee:jmvana:v:90:y:2004:i:1:p:154-177

References listed on IDEAS

Mark J. van der Laan & Sandrine Dudoit & Sunduz Keles, 2004. "Asymptotic Optimality of Likelihood-Based Cross-Validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-25, March.
Sandra Sinisi & Mark van der Laan, 2004. "Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms in Estimation," U.C. Berkeley Division of Biostatistics Working Paper Series 1142, Berkeley Electronic Press.
Sunduz Keles & Mark J. van der Laan & Sandrine Dudoit & Biao Xing & Michael B. Eisen, 2003. "Supervised Detection of Regulatory Motifs in DNA Sequences," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 2(1), pages 1-40, August.
Sandrine Dudoit & Mark van der Laan & Sunduz Keles & Annette Molinaro & Sandra Sinisi & Siew Leng Teng, 2004. "Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding," U.C. Berkeley Division of Biostatistics Working Paper Series 1136, Berkeley Electronic Press.
Leo Breiman & Jerome H. Friedman, 1997. "Predicting Multivariate Responses in Multiple Linear Regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(1), pages 3-54.

Citations

Cited by:

Mark J. van der Laan & Sandrine Dudoit & Aad W. van der Vaart, 2006. "The cross-validated adaptive epsilon-net estimator," Statistics & Risk Modeling, De Gruyter, vol. 24(3), pages 373-395, December.
Yan Zhou & John McArdle, 2015. "Rationale and Applications of Survival Tree and Survival Ensemble Methods," Psychometrika, Springer; The Psychometric Society, vol. 80(3), pages 811-833, September.
Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01141228, HAL.
Yifei Sun & Sy Han Chiou & Mei‐Cheng Wang, 2020. "ROC‐guided survival trees and ensembles," Biometrics, The International Biometric Society, vol. 76(4), pages 1177-1189, December.
Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
- Susan Athey & Julie Tibshirani & Stefan Wager, 2017. "Generalized Random Forests," Research Papers 3575, Stanford University, Graduate School of Business.
Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2015. "Tree-based censored regression with applications to insurance," Working Papers hal-01141228, HAL.
- Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01364437, HAL.
Alina Schenk & Moritz Berger & Matthias Schmid, 2024. "Pseudo-value regression trees," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 30(2), pages 439-471, April.
Mark van der Laan & Sandrine Dudoit & Aad van der Vaart, 2004. "The Cross-Validated Adaptive Epsilon-Net Estimator," U.C. Berkeley Division of Biostatistics Working Paper Series 1141, Berkeley Electronic Press.
Pablo Gonzalez Ginestet & Ales Kotalik & David M. Vock & Julian Wolfson & Erin E. Gabriel, 2021. "Stacked inverse probability of censoring weighted bagging: A case study in the InfCareHIV Register," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 51-65, January.
Sandra E. Sinisi & Romain Neugebauer & Mark J. van der Laan, 2006. "Cross-Validated Bagged Prediction of Survival," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-26, May.
Karen Lostritto & Robert L. Strawderman & Annette M. Molinaro, 2012. "A Partitioning Deletion/Substitution/Addition Algorithm for Creating Survival Risk Groups," Biometrics, The International Biometric Society, vol. 68(4), pages 1146-1156, December.
Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
Alexander Hanbo Li & Jelena Bradic, 2019. "Censored Quantile Regression Forests," Papers 1902.03327, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Paul Hewson & Keming Yu, 2008. "Quantile regression for binary performance indicators," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 24(5), pages 401-418, September.
Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer; German Classification Society - Gesellschaft für Klassifikation (GfKl); Japanese Classification Society (JCS); Classification and Data Analysis Group of the Italian Statistical Society (CLADAG); International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
Stephen Jewson & Jeremy Penzer, 2006. "Estimating Trends in Weather Series: Consequences for Pricing Derivatives," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 10(3), pages 1-17, September.
Karsten Luebke & Irina Czogiel & Claus Weihs, 2004. "Latent Factor Prediction Pursuit for Rank Deficient Regressors," Technical Reports 2004,75, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
Bruce Desmarais, 2012. "Lessons in disguise: multivariate predictive mistakes in collective choice models," Public Choice, Springer, vol. 151(3), pages 719-737, June.
Sandrine Dudoit & Mark van der Laan & Sunduz Keles & Annette Molinaro & Sandra Sinisi & Siew Leng Teng, 2004. "Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding," U.C. Berkeley Division of Biostatistics Working Paper Series 1136, Berkeley Electronic Press.
Ori M. Stitelman & Mark J. van der Laan, 2010. "Collaborative Targeted Maximum Likelihood for Time to Event Data," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-46, June.
Yihe Wang & Sihai Dave Zhao, 2021. "A nonparametric empirical Bayes approach to large-scale multivariate regression," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
Adel Javanmard & Jingwei Ji & Renyuan Xu, 2024. "Multi-Task Dynamic Pricing in Credit Market with Contextual Information," Papers 2410.14839, arXiv.org, revised Oct 2024.
Seokhyun Chung & Raed Al Kontar & Zhenke Wu, 2022. "Weakly Supervised Multi-output Regression via Correlated Gaussian Processes," INFORMS Journal on Data Science, INFORMS, vol. 1(2), pages 115-137, October.
Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2015. "Tree-based censored regression with applications to insurance," Working Papers hal-01141228, HAL.
- Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01364437, HAL.
Arafat Tayeb & Aurélie Labbe & Alexandre Bureau & Chantal Mérette, 2011. "Solving genetic heterogeneity in extended families by identifying sub-types of complex diseases," Computational Statistics, Springer, vol. 26(3), pages 539-560, September.
Qiang Sun & Hongtu Zhu & Yufeng Liu & Joseph G. Ibrahim, 2015. "SPReM: Sparse Projection Regression Model For High-Dimensional Linear Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 289-302, March.
Joyce de Souza Zanirato Maia & Ana Paula Arantes Bueno & João Ricardo Sato, 2021. "Assessing the educational performance of different Brazilian school cycles using data science methods," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-14, March.
Mark J. van der Laan & Sandrine Dudoit & Aad W. van der Vaart, 2006. "The cross-validated adaptive epsilon-net estimator," Statistics & Risk Modeling, De Gruyter, vol. 24(3), pages 373-395, December.
Yingyao Hu & Susanne Schennach & Ji-Liang Shiu, 2022. "Identification of nonparametric monotonic regression models with continuous nonclassical measurement errors," Journal of Econometrics, Elsevier, vol. 226(2), pages 269-294.
Myoungshic Jhun & Inkyung Choi, 2009. "Bootstrapping least distance estimator in the multivariate regression model," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4221-4227, October.
Zafar Mahmood & Salahuddin Khan, 2009. "On the Use of K-Fold Cross-Validation to Choose Cutoff Values and Assess the Performance of Predictive Models in Stepwise Regression," The International Journal of Biostatistics, De Gruyter, vol. 5(1), pages 1-21, July.
Thaddeus J. Haight & Yue Wang & Mark J. van der Laan & Ira B. Tager, 2010. "A cross-validation deletion-substitution-addition model selection algorithm: Application to marginal structural models," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3080-3094, December.
Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01141228, HAL.

More about this item

Keywords

CART; Censored data; Comparative genomic hybridization; Cross-validation; Density estimation; Loss function; Microarray; Model selection; Multivariate outcome; Prediction; Regression tree; Risk estimation; Survival analysis
