Printed from https://ideas.repec.org/a/eee/csdana/v53y2009i4p1208-1217.html

A similarity measure to assess the stability of classification trees

Author

Listed:
  • Briand, Bénédicte
  • Ducharme, Gilles R.
  • Parache, Vanessa
  • Mercat-Rommens, Catherine

Abstract

Classification trees (CART) are known to be unstable: a small perturbation in the input variables, or a fresh sample, can lead to a very different classification tree. Some approaches try to correct this instability, but their benefits can, at present, be appreciated only qualitatively. A similarity measure between two classification trees is introduced to quantify their closeness. Its usefulness is illustrated with synthetic data on the impact of radioactivity deposit through the environment. In this context, a modified node-level stabilizing technique, referred to as the NLS-REP method, is introduced and shown to be more stable than the classical CART method.
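
The instability described in the abstract can be illustrated with a minimal sketch. It is not the similarity measure introduced in the paper, nor the NLS-REP method, since neither is defined on this page. As a stand-in, it fits depth-one trees (stumps) on two bootstrap resamples of the same synthetic data and reports a simple prediction-agreement rate. All function names, the data-generating process, and the agreement rate itself are assumptions made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Majority class of a list of 0/1 labels (ties go to class 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows):
    """Fit a depth-one tree on rows of (features, label).

    Returns (feature_index, threshold, left_label, right_label),
    chosen to minimise the weighted Gini impurity of the two children.
    """
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:
        # No feature has two distinct values: fall back to a constant prediction.
        label = majority([y for _, y in rows])
        return (0, float("inf"), label, label)
    return best[1:]


def predict(stump, x):
    j, t, left_label, right_label = stump
    return left_label if x[j] <= t else right_label


def agreement(stump_a, stump_b, points):
    """Fraction of points on which the two stumps predict the same class."""
    same = sum(predict(stump_a, x) == predict(stump_b, x) for x in points)
    return same / len(points)


def demo(seed=0, n=200):
    """Fit stumps on two bootstrap resamples and compare their predictions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0 = rng.random()
        x1 = x0 + rng.gauss(0.0, 0.05)  # second feature nearly duplicates the first
        y = int(x0 + rng.gauss(0.0, 0.1) > 0.5)
        data.append(((x0, x1), y))
    stump_a = fit_stump(rng.choices(data, k=n))  # bootstrap resample 1
    stump_b = fit_stump(rng.choices(data, k=n))  # bootstrap resample 2
    points = [x for x, _ in data]
    return stump_a, stump_b, agreement(stump_a, stump_b, points)


if __name__ == "__main__":
    a, b, rate = demo()
    print("stump A:", a)
    print("stump B:", b)
    print("agreement:", rate)
```

Because the two features are nearly interchangeable, the resamples may select different split variables or thresholds while still agreeing on most predictions. An agreement rate of this kind compares predictions only and says nothing about differences in tree structure.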

Suggested Citation

  • Briand, Bénédicte & Ducharme, Gilles R. & Parache, Vanessa & Mercat-Rommens, Catherine, 2009. "A similarity measure to assess the stability of classification trees," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1208-1217, February.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:4:p:1208-1217

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00497-0
    Download Restriction: Full text for ScienceDirect subscribers only.

    Citations

    Cited by:

    1. Karolis Matikonis & Matthew Gobey, 2024. "Small Business Property Tax Reductions and Firm Productivity," Small Business Economics, Springer, vol. 62(1), pages 307-324, January.
    2. Aniek Sies & Iven Mechelen, 2020. "C443: a Methodology to See a Forest for the Trees," Journal of Classification, Springer;The Classification Society, vol. 37(3), pages 730-753, October.
    3. Piccarreta, Raffaella, 2010. "Binary trees for dissimilarity data," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1516-1524, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lamperti, Francesco & Roventini, Andrea & Sani, Amir, 2018. "Agent-based model calibration using machine learning surrogates," Journal of Economic Dynamics and Control, Elsevier, vol. 90(C), pages 366-389.
    2. Mohamed Zine & Fouzi Harrou & Mohammed Terbeche & Mohammed Bellahcene & Abdelkader Dairi & Ying Sun, 2023. "E-Learning Readiness Assessment Using Machine Learning Methods," Sustainability, MDPI, vol. 15(11), pages 1-22, June.
    3. Yigit Aydede & Jan Ditzen, 2022. "Identifying the regional drivers of influenza-like illness in Nova Scotia with dominance analysis," Papers 2212.06684, arXiv.org.
    4. De Bock, Koen W. & Coussement, Kristof & Van den Poel, Dirk, 2010. "Ensemble classification based on generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1535-1546, June.
    5. Ollech, Daniel & Webel, Karsten, 2020. "A random forest-based approach to identifying the most informative seasonality tests," Discussion Papers 55/2020, Deutsche Bundesbank.
    6. Ilias Thomas & Alex M. Dickens & Jussi P. Posti & Endre Czeiter & Daniel Duberg & Tim Sinioja & Matilda Kråkström & Isabel R. A. Retel Helmrich & Kevin K. W. Wang & Andrew I. R. Maas & Ewout W. Steyer, 2022. "Serum metabolome associated with severity of acute traumatic brain injury," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Lu, Xuefei & Baraldi, Piero & Zio, Enrico, 2020. "A data-driven framework for identifying important components in complex systems," Reliability Engineering and System Safety, Elsevier, vol. 204(C).
    8. Mahyar Jahaninasab & Ehsan Taheran & S. Alireza Zarabadi & Mohammadreza Aghaei & Ali Rajabpour, 2023. "A Novel Approach for Reducing Feature Space Dimensionality and Developing a Universal Machine Learning Model for Coated Tubes in Cross-Flow Heat Exchangers," Energies, MDPI, vol. 16(13), pages 1-13, July.
    9. Junqi Wang & Rundong Liu & Linfeng Zhang & Hussain Syed ASAD & Erlin Meng, 2019. "Triggering Optimal Control of Air Conditioning Systems by Event-Driven Mechanism: Comparing Direct and Indirect Approaches," Energies, MDPI, vol. 12(20), pages 1-20, October.
    10. Liu, Yehong & Yin, Guosheng, 2020. "The Delaunay triangulation learner and its ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    11. Ha, Tran Vinh & Asada, Takumi & Arimura, Mikiharu, 2019. "Determination of the influence factors on household vehicle ownership patterns in Phnom Penh using statistical and machine learning methods," Journal of Transport Geography, Elsevier, vol. 78(C), pages 70-86.
    12. Jia Geng & Mingsheng Yuan & Shen Xu & Tingting Bai & Yang Xiao & Xiaopeng Li & Dong Xu, 2022. "Urban Expansion Was the Main Driving Force for the Decline in Ecosystem Services in Hainan Island during 1980–2015," IJERPH, MDPI, vol. 19(23), pages 1-18, November.
    13. Ingrida Vaiciulyte & Zivile Kalsyte & Leonidas Sakalauskas & Darius Plikynas, 2017. "Assessment of market reaction on the share performance on the basis of its visualization in 2D space," Journal of Business Economics and Management, Taylor & Francis Journals, vol. 18(2), pages 309-318, March.
    14. Danielle Baghernejad, 2017. "Class Based Variable Importance for Medical Decision Making," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 1(5), pages 1328-1335, October.
    15. Hapfelmeier, A. & Ulm, K., 2014. "Variable selection by Random Forests using data with missing values," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 129-139.
    16. Benjamin David, 2017. "Model economic phenomena with CART and Random Forest algorithms," Working Papers hal-04141619, HAL.
    17. Chandler Gabriel & Stevens Guy, 2012. "An Exploratory Study of Minor League Baseball Statistics," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 8(4), pages 1-28, November.
    18. Mohammad Abdullah & Mohammad Ashraful Ferdous Chowdhury & Ajim Uddin & Syed Moudud‐Ul‐Huq, 2023. "Forecasting nonperforming loans using machine learning," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(7), pages 1664-1689, November.
    19. Benjamin David, 2017. "Model economic phenomena with CART and Random Forest algorithms," EconomiX Working Papers 2017-46, University of Paris Nanterre, EconomiX.
