
An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes

Authors

  • Chun-Xia Zhang
  • Guan-Wei Wang
  • Jiang-She Zhang

Abstract

DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) is a classifier combination technique that constructs a set of diverse base classifiers using additional, artificially generated training instances. The predictions of the base classifiers are then combined by the mean combination rule. To gain more insight into its effectiveness and advantages, this paper uses a large experiment to carry out a bias-variance analysis of DECORATE and some other widely used ensemble methods (such as bagging, AdaBoost, and random forest) at different training sample sizes. The experimental results yield the following conclusions. For small training sets, DECORATE has a dominant advantage over its rivals, and its success is attributed to its achieving a larger bias reduction than the other algorithms. As the amount of training data increases, AdaBoost benefits most: its bias reduction gradually becomes significant, while its variance reduction is moderate. Thus, AdaBoost performs best with large training samples. Random forest is consistently second best for both small and large training sets, and it is seen to mainly decrease variance while maintaining low bias. Bagging appears to be intermediate, since it primarily reduces variance.
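
As a rough illustration of two ideas named in the abstract, the sketch below shows a mean combination rule and a per-example bias-variance estimate under 0-1 loss. The abstract does not say which decomposition the paper uses; the Kohavi-Wolpert form here is an assumption chosen for illustration, and the function names `mean_combine` and `kw_bias_variance` are hypothetical, not taken from the paper.

```python
from collections import Counter


def mean_combine(prob_vectors):
    """Mean combination rule: average the class-probability vectors
    produced by the base classifiers and pick the class with the
    largest average. Returns (class_index, averaged_vector)."""
    n_classes = len(prob_vectors[0])
    n_models = len(prob_vectors)
    mean = [sum(p[k] for p in prob_vectors) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: mean[k]), mean


def kw_bias_variance(predictions, true_label):
    """Kohavi-Wolpert style bias^2 and variance for ONE test point
    under 0-1 loss (assumed decomposition, noise-free true label).
    `predictions` holds the labels predicted for this point by models
    trained on different training sets of the same size."""
    counts = Counter(predictions)
    n = len(predictions)
    labels = set(counts) | {true_label}
    # Empirical probability that a trained model predicts each label.
    p = {y: counts.get(y, 0) / n for y in labels}
    bias2 = 0.5 * sum(((1.0 if y == true_label else 0.0) - p[y]) ** 2 for y in labels)
    variance = 0.5 * (1.0 - sum(v * v for v in p.values()))
    return bias2, variance


# Three hypothetical base classifiers voting on a two-class problem.
label, mean = mean_combine([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])

# Four hypothetical training-set replicates predicting one test point.
bias2, variance = kw_bias_variance([1, 1, 0, 1], true_label=1)
```

Under this decomposition, bias squared plus variance equals the point's misclassification rate across replicates, so averaging both terms over a test set and repeating at several training sample sizes gives the kind of comparison the abstract describes.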

Suggested Citation

  • Chun-Xia Zhang & Guan-Wei Wang & Jiang-She Zhang, 2012. "An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 829-850, September.
  • Handle: RePEc:taf:japsta:v:39:y:2012:i:4:p:829-850
    DOI: 10.1080/02664763.2011.620949

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/02664763.2011.620949
    Download Restriction: Access to full text is restricted to subscribers.



    Citations


    Cited by:

    1. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Chun-Xia & Zhang, Jiang-She & Zhang, Gai-Ying, 2009. "Using Boosting to prune Double-Bagging ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1218-1231, February.
    2. Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
    3. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.
    4. Chun-Xia Zhang & Jiang-She Zhang & Sang-Woon Kim, 2016. "PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection," Computational Statistics, Springer, vol. 31(4), pages 1237-1262, December.
    5. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1-15, March.
    6. Bernd Bischl & Julia Schiffner & Claus Weihs, 2013. "Benchmarking local classification methods," Computational Statistics, Springer, vol. 28(6), pages 2599-2619, December.
    7. Barrow, Devon K. & Crone, Sven F., 2016. "A comparison of AdaBoost algorithms for time series forecast combination," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1103-1119.
    8. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "$L_1$ splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    9. Ivan Chang, Yuan-Chin & Huang, Yufen & Huang, Yu-Pai, 2010. "Early stopping in L2Boosting," Computational Statistics & Data Analysis, Elsevier, vol. 54(10), pages 2203-2213, October.
    10. Sergio Davalos & Fei Leng & Ehsan H. Feroz & Zhiyan Cao, 2014. "Designing An If–Then Rules‐Based Ensemble Of Heterogeneous Bankruptcy Classifiers: A Genetic Algorithm Approach," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 21(3), pages 129-153, July.
    11. Jasdeep S. Banga & B. Wade Brorsen, 2019. "Profitability of alternative methods of combining the signals from technical trading systems," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 26(1), pages 32-45, January.
    12. Marie-Hélène Roy & Denis Larocque, 2012. "Robustness of random forests for regression," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 993-1006, December.
    13. Martinez, Waldyn & Gray, J. Brian, 2016. "Noise peeling methods to improve boosting algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 483-497.
    14. Tsai, Chih-Fong & Sue, Kuen-Liang & Hu, Ya-Han & Chiu, Andy, 2021. "Combining feature selection, instance selection, and ensemble classification techniques for improved financial distress prediction," Journal of Business Research, Elsevier, vol. 130(C), pages 200-209.
    15. Kesriklioğlu, Esma & Oktay, Erkan & Karaaslan, Abdulkerim, 2023. "Predicting total household energy expenditures using ensemble learning methods," Energy, Elsevier, vol. 276(C).
    16. Adler, Werner & Brenning, Alexander & Potapov, Sergej & Schmid, Matthias & Lausen, Berthold, 2011. "Ensemble classification of paired data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1933-1941, May.
    17. Mojirsheibani, Majid & Kong, Jiajie, 2016. "An asymptotically optimal kernel combined classifier," Statistics & Probability Letters, Elsevier, vol. 119(C), pages 91-100.
    18. Zhang, Mingzhu & He, Changzheng & Gu, Xin & Liatsis, Panos & Zhu, Bing, 2013. "D-GMDH: A novel inductive modelling approach in the forecasting of the industrial economy," Economic Modelling, Elsevier, vol. 30(C), pages 514-520.
    19. Xudong Hu & Hongbo Mei & Han Zhang & Yuanyuan Li & Mengdi Li, 2021. "Performance evaluation of ensemble learning techniques for landslide susceptibility mapping at the Jinping county, Southwest China," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer; International Society for the Prevention and Mitigation of Natural Hazards, vol. 105(2), pages 1663-1689, January.
