Printed from https://ideas.repec.org/a/bla/istatr/v82y2014i3p329-348.html

Fifty Years of Classification and Regression Trees

Author

Listed:
  • Wei-Yin Loh

Abstract

Fifty years have passed since the publication of the first regression tree algorithm. New techniques have added capabilities that far surpass those of the early methods. Modern classification trees can partition the data with linear splits on subsets of variables and fit nearest neighbor, kernel density, and other models in the partitions. Regression trees can fit almost every kind of traditional statistical model, including least-squares, quantile, logistic, Poisson, and proportional hazards models, as well as models for longitudinal and multiresponse data. Greater availability and affordability of software (much of which is free) have played a significant role in helping the techniques gain acceptance and popularity in the broader scientific community. This article surveys the developments and briefly reviews the key ideas behind some of the major algorithms.
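
As a rough illustration of the "partition the data, then fit a model in each partition" idea mentioned in the abstract, the sketch below searches for a single split on one numeric variable and fits a constant (the mean) on each side, choosing the split that minimizes the total sum of squared errors. This is only a minimal sketch of the simplest least-squares case, not the method of any specific algorithm reviewed in the article; the function names `sse` and `best_split` are hypothetical and chosen for this example.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the split `x <= threshold` that minimizes the total SSE.

    Candidate thresholds are midpoints between consecutive distinct
    sorted x values. Returns (threshold, total_sse); the threshold is
    None when no candidate split reduces the SSE of the unsplit data.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_sse = None, sse(ys)
    for i in range(1, len(pairs)):
        # Tied x values cannot be separated by a threshold.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse


# Illustrative data (made up for this example): two clearly separated groups.
threshold, error = best_split([1, 2, 3, 10, 11, 12],
                              [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(threshold, error)
```

Applying such a search recursively to each resulting partition would grow a tree; replacing the constant fit with another model (for example a linear or Poisson regression) would, under the same assumptions, give the more general node models the abstract lists.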

Suggested Citation

  • Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
  • Handle: RePEc:bla:istatr:v:82:y:2014:i:3:p:329-348

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1111/insr.12016
    Download Restriction: Access to full text is restricted to subscribers.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Strobl, Carolin & Boulesteix, Anne-Laure & Augustin, Thomas, 2007. "Unbiased split selection for classification trees based on the Gini Index," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 483-501, September.
    2. Yu-Shan Shih & Kuang-Hsun Liu, 2019. "Regression trees for detecting preference patterns from rank data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 683-702, September.
    3. Antonio D’Ambrosio & Willem J. Heiser, 2016. "A Recursive Partitioning Method for the Prediction of Preference Rankings Based Upon Kemeny Distances," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 774-794, September.
    4. Alessandra De Rose & Alessandro Pallara, 1997. "Survival Trees: An Alternative Non-Parametric Multivariate Technique for Life History Analysis," European Journal of Population, Springer;European Association for Population Studies, vol. 13(3), pages 223-241, September.
    5. Gerhard Tutz & Moritz Berger, 2016. "Item-focussed Trees for the Identification of Items in Differential Item Functioning," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 727-750, September.
    6. Fan, Juanjuan & Nunn, Martha E. & Su, Xiaogang, 2009. "Multivariate exponential survival trees and their application to tooth prognosis," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1110-1121, February.
    7. Schmid, Lena & Gerharz, Alexander & Groll, Andreas & Pauly, Markus, 2023. "Tree-based ensembles for multi-output regression: Comparing multivariate approaches with separate univariate ones," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    8. Hsiao, Wei-Cheng & Shih, Yu-Shan, 2007. "Splitting variable selection for multivariate regression trees," Statistics & Probability Letters, Elsevier, vol. 77(3), pages 265-271, February.
    9. Tomàs Aluja-Banet & Eduard Nafria, 2003. "Stability and scalability in decision trees," Computational Statistics, Springer, vol. 18(3), pages 505-520, September.
    10. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2015. "Tree-based censored regression with applications to insurance," Working Papers hal-01141228, HAL.
    11. Yan Zhou & John McArdle, 2015. "Rationale and Applications of Survival Tree and Survival Ensemble Methods," Psychometrika, Springer;The Psychometric Society, vol. 80(3), pages 811-833, September.
    12. Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
    13. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01141228, HAL.
    14. Keon Lee, Seong, 2005. "On generalized multivariate decision tree by using GEE," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 1105-1119, June.
    15. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    16. Christophe Dutang & Quentin Guibert, 2021. "An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests," Post-Print hal-03448250, HAL.
    17. Nan-Ting Liu & Feng-Chang Lin & Yu-Shan Shih, 2020. "Count regression trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 5-27, March.
    18. Shu-Fu Kuo & Yu-Shan Shih, 2012. "Variable selection for functional density trees," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(7), pages 1387-1395, December.
    19. Dine, Abdessamad & Larocque, Denis & Bellavance, François, 2009. "Multivariate trees for mixed outcomes," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3795-3804, September.
    20. Archer, Kellie J. & Kimes, Ryan V., 2008. "Empirical characterization of random forest variable importance measures," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2249-2260, January.
