
Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography

Author

Listed:
  • Rokach, Lior

Abstract

Ensemble methodology, which builds a classification model by integrating multiple classifiers, can be used to improve prediction performance. Researchers from various disciplines, such as statistics, pattern recognition, and machine learning, have extensively explored the use of ensemble methodology. This paper presents an updated survey of ensemble methods in classification tasks and introduces a new taxonomy for characterizing them. The taxonomy, presented from the algorithm designer's point of view, is based on five dimensions: inducer, combiner, diversity, size, and members' dependency. We also propose several selection criteria, presented from the practitioner's point of view, for choosing the most suitable ensemble method.
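The five dimensions can be made concrete with a small example. The sketch below is illustrative only, not code from the paper: a minimal bagging-style ensemble in Python/NumPy in which the base inducer is supplied as a factory, diversity comes from bootstrap resampling, members are trained independently, the ensemble size is a parameter, and the combiner is an unweighted majority vote. All names here are hypothetical.

import numpy as np

class MajorityVoteEnsemble:
    """Bagging-style ensemble: independently trained members, majority-vote combiner."""

    def __init__(self, make_inducer, size=25, seed=0):
        self.make_inducer = make_inducer   # dimension 1: the base inducer (as a factory)
        self.size = size                   # dimension 4: number of ensemble members
        self.rng = np.random.default_rng(seed)
        self.members = []

    def fit(self, X, y):
        # X and y are assumed to be NumPy arrays
        n = len(X)
        for _ in range(self.size):
            # dimension 3: diversity via bootstrap resampling of the training set
            # dimension 5: members are independent -- no member depends on another
            idx = self.rng.integers(0, n, size=n)
            self.members.append(self.make_inducer().fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # dimension 2: the combiner -- an unweighted majority vote
        # (assumes class labels are non-negative integers, as np.bincount requires)
        votes = np.stack([m.predict(X) for m in self.members])
        return np.array([np.bincount(col).argmax() for col in votes.T])

With scikit-learn installed, MajorityVoteEnsemble(lambda: DecisionTreeClassifier(max_depth=3)).fit(X, y) would train 25 independent trees and combine their votes. Dependent-member schemes such as boosting (see, e.g., the stochastic gradient boosting and LogitBoost papers in the references below) instead fit each member to the errors of its predecessors, which changes the "members' dependency" dimension of the taxonomy.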

Suggested Citation

  • Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:4046-4072

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00263-1
    Download Restriction: Full text for ScienceDirect subscribers only.

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Croux, Christophe & Joossens, Kristel & Lemmens, Aurelie, 2007. "Trimmed bagging," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 362-368, September.
    2. Archer, Kellie J. & Kimes, Ryan V., 2008. "Empirical characterization of random forest variable importance measures," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2249-2260, January.
    3. Drucker, Harris, 2002. "Effect of pruning and early stopping on performance of a boosting ensemble," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 393-406, February.
    4. Buttrey, Samuel E. & Karo, Ciril, 2002. "Using k-nearest-neighbor classification in the leaves of a tree," Computational Statistics & Data Analysis, Elsevier, vol. 40(1), pages 27-37, July.
    5. Sexton, Joseph & Laake, Petter, 2008. "LogitBoost with errors-in-variables," Computational Statistics & Data Analysis, Elsevier, vol. 52(5), pages 2549-2559, January.
    6. Kim, Yuwon & Koo, Ja-Yong, 2005. "Inverse boosting for monotone regression functions," Computational Statistics & Data Analysis, Elsevier, vol. 49(3), pages 757-770, June.
    7. Ahn, Hongshik & Moon, Hojin & Fazzari, Melissa J. & Lim, Noha & Chen, James J. & Kodell, Ralph L., 2007. "Classification by ensembles from random partitions of high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6166-6179, August.
    8. Moskovitch, Robert & Elovici, Yuval & Rokach, Lior, 2008. "Detection of unknown computer worms based on behavioral classification of the host," Computational Statistics & Data Analysis, Elsevier, vol. 52(9), pages 4544-4566, May.
    9. Menahem, Eitan & Shabtai, Asaf & Rokach, Lior & Elovici, Yuval, 2009. "Improving malware detection by applying multi-inducer ensemble," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1483-1494, February.
    10. Zhang, Chun-Xia & Zhang, Jiang-She, 2008. "A local boosting algorithm for solving classification problems," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 1928-1941, January.
    11. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    12. Rokach, Lior, 2009. "Collective-agreement-based pruning of ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1015-1026, February.
    13. Ridgeway, Greg, 2002. "Looking for lumps: boosting and bagging for density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 379-392, February.
    14. Yuval Elovici & Bracha Shapira & Paul B. Kantor, 2006. "A decision theoretic approach to combining information filters: An analytical and empirical evaluation," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(3), pages 306-320, February.
    15. Merler, Stefano & Caprile, Bruno & Furlanello, Cesare, 2007. "Parallelizing AdaBoost by weights dynamics," Computational Statistics & Data Analysis, Elsevier, vol. 51(5), pages 2487-2498, February.
    16. Adem, Jan & Gochet, Willy, 2004. "Aggregating classifiers with mathematical programming," Computational Statistics & Data Analysis, Elsevier, vol. 47(4), pages 791-807, November.
    17. Hothorn, Torsten & Lausen, Berthold, 2005. "Bundling classifiers by bagging trees," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 1068-1078, June.
    18. Tsao, C. Andy & Chang, Yuan-chin Ivan, 2007. "A stochastic approximation view of boosting," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 325-334, September.
    19. Christmann, Andreas & Steinwart, Ingo & Hubert, Mia, 2007. "Robust learning from bites for data mining," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 347-361, September.
    20. Denison, D. G. T. & Adams, N. M. & Holmes, C. C. & Hand, D. J., 2002. "Bayesian partition modelling," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 475-485, February.
    21. Gey, Servane & Poggi, Jean-Michel, 2006. "Boosting and instability for regression trees," Computational Statistics & Data Analysis, Elsevier, vol. 50(2), pages 533-550, January.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Marie-Hélène Roy & Denis Larocque, 2012. "Robustness of random forests for regression," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 993-1006, December.
    2. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "L1 splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    3. Xudong Hu & Hongbo Mei & Han Zhang & Yuanyuan Li & Mengdi Li, 2021. "Performance evaluation of ensemble learning techniques for landslide susceptibility mapping at the Jinping county, Southwest China," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 105(2), pages 1663-1689, January.
    4. Tsai, Chih-Fong & Sue, Kuen-Liang & Hu, Ya-Han & Chiu, Andy, 2021. "Combining feature selection, instance selection, and ensemble classification techniques for improved financial distress prediction," Journal of Business Research, Elsevier, vol. 130(C), pages 200-209.
    5. Sergio Davalos & Fei Leng & Ehsan H. Feroz & Zhiyan Cao, 2014. "Designing An If–Then Rules‐Based Ensemble Of Heterogeneous Bankruptcy Classifiers: A Genetic Algorithm Approach," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 21(3), pages 129-153, July.
    6. Chun-Xia Zhang & Jiang-She Zhang & Sang-Woon Kim, 2016. "PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection," Computational Statistics, Springer, vol. 31(4), pages 1237-1262, December.
    7. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1, March.
    8. Kesriklioğlu, Esma & Oktay, Erkan & Karaaslan, Abdulkerim, 2023. "Predicting total household energy expenditures using ensemble learning methods," Energy, Elsevier, vol. 276(C).
    9. Barrow, Devon K. & Crone, Sven F., 2016. "A comparison of AdaBoost algorithms for time series forecast combination," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1103-1119.
    10. Chun-Xia Zhang & Guan-Wei Wang & Jiang-She Zhang, 2012. "An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 829-850, September.
    11. Mojirsheibani, Majid & Kong, Jiajie, 2016. "An asymptotically optimal kernel combined classifier," Statistics & Probability Letters, Elsevier, vol. 119(C), pages 91-100.
    12. Adler, Werner & Brenning, Alexander & Potapov, Sergej & Schmid, Matthias & Lausen, Berthold, 2011. "Ensemble classification of paired data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1933-1941, May.
    13. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Chun-Xia & Zhang, Jiang-She & Zhang, Gai-Ying, 2009. "Using Boosting to prune Double-Bagging ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1218-1231, February.
    2. Tsao, C. Andy & Chang, Yuan-chin Ivan, 2007. "A stochastic approximation view of boosting," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 325-334, September.
    3. De Bock, Koen W. & Coussement, Kristof & Van den Poel, Dirk, 2010. "Ensemble classification based on generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1535-1546, June.
    4. Adler, Werner & Lausen, Berthold, 2009. "Bootstrap estimated true and false positive rates and ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 718-729, January.
    5. Ollech, Daniel & Webel, Karsten, 2020. "A random forest-based approach to identifying the most informative seasonality tests," Discussion Papers 55/2020, Deutsche Bundesbank.
    6. Stefan Lessmann & Stefan Voß, 2010. "Customer-Centric Decision Support," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 2(2), pages 79-93, April.
    7. Martinez, Waldyn & Gray, J. Brian, 2016. "Noise peeling methods to improve boosting algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 483-497.
    8. Rokach, Lior, 2009. "Collective-agreement-based pruning of ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1015-1026, February.
    9. Chung, Dongjun & Kim, Hyunjoong, 2015. "Accurate ensemble pruning with PL-bagging," Computational Statistics & Data Analysis, Elsevier, vol. 83(C), pages 1-13.
    10. Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
    11. Croux, Christophe & Joossens, Kristel & Lemmens, Aurelie, 2007. "Trimmed bagging," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 362-368, September.
    12. Chun-Xia Zhang & Guan-Wei Wang & Jiang-She Zhang, 2012. "An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 829-850, September.
    13. Bissan Ghaddar & Ignacio Gómez-Casares & Julio González-Díaz & Brais González-Rodríguez & Beatriz Pateiro-López & Sofía Rodríguez-Ballesteros, 2023. "Learning for Spatial Branching: An Algorithm Selection Approach," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1024-1043, September.
    14. Nahushananda Chakravarthy H G & Karthik M Seenappa & Sujay Raghavendra Naganna & Dayananda Pruthviraja, 2023. "Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash," Sustainability, MDPI, vol. 15(18), pages 1-22, September.
    15. Wen, Shaoting & Buyukada, Musa & Evrendilek, Fatih & Liu, Jingyong, 2020. "Uncertainty and sensitivity analyses of co-combustion/pyrolysis of textile dyeing sludge and incense sticks: Regression and machine-learning models," Renewable Energy, Elsevier, vol. 151(C), pages 463-474.
    16. Spiliotis, Evangelos & Makridakis, Spyros & Kaltsounis, Anastasios & Assimakopoulos, Vassilios, 2021. "Product sales probabilistic forecasting: An empirical evaluation using the M5 competition data," International Journal of Production Economics, Elsevier, vol. 240(C).
    17. Kusiak, Andrew & Zheng, Haiyang & Song, Zhe, 2009. "On-line monitoring of power curves," Renewable Energy, Elsevier, vol. 34(6), pages 1487-1493.
    18. Lamperti, Francesco & Roventini, Andrea & Sani, Amir, 2018. "Agent-based model calibration using machine learning surrogates," Journal of Economic Dynamics and Control, Elsevier, vol. 90(C), pages 366-389.
    19. Zhu, Siying & Zhu, Feng, 2019. "Cycling comfort evaluation with instrumented probe bicycle," Transportation Research Part A: Policy and Practice, Elsevier, vol. 129(C), pages 217-231.
    20. Dursun Delen & Hamed M. Zolbanin & Durand Crosby & David Wright, 2021. "To imprison or not to imprison: an analytics model for drug courts," Annals of Operations Research, Springer, vol. 303(1), pages 101-124, August.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.