
Fifty Years of Classification and Regression Trees

Author

Listed:
  • Wei-Yin Loh

Abstract

Fifty years have passed since the publication of the first regression tree algorithm. New techniques have added capabilities that far surpass those of the early methods. Modern classification trees can partition the data with linear splits on subsets of variables and fit nearest neighbor, kernel density, and other models in the partitions. Regression trees can fit almost every kind of traditional statistical model, including least-squares, quantile, logistic, Poisson, and proportional hazards models, as well as models for longitudinal and multiresponse data. Greater availability and affordability of software (much of which is free) have played a significant role in helping the techniques gain acceptance and popularity in the broader scientific community. This article surveys the developments and briefly reviews the key ideas behind some of the major algorithms.

Suggested Citation

  • Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
  • Handle: RePEc:bla:istatr:v:82:y:2014:i:3:p:329-348

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1111/insr.12016
    Download Restriction: Access to full text is restricted to subscribers.


