
Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families

Authors

  • Kweku-Muata Osei-Bryson (Virginia Commonwealth University)
  • Kendall Giles (Virginia Commonwealth University)

Abstract

Decision tree (DT) induction is among the more popular data mining techniques. An important component of DT induction algorithms is the splitting method, and the most commonly used methods are based on the Conditional Entropy (CE) family. However, it is well known that no single splitting method gives the best performance on all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting method, others are very sensitive to it. For example, some of the CAMI family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets that have nominal predictor attributes, and are competitive with the GR method for datasets where all predictor attributes are numeric. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family should be included in data mining toolsets.
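
The abstract contrasts splitting criteria built on conditional entropy (such as Gain Ratio) with criteria built on class-attribute mutual information. As a minimal sketch for a single nominal attribute, the Python below computes the conditional entropy H(C|A), the information gain H(C) − H(C|A) (which equals the mutual information I(C;A)), and the C4.5-style gain ratio, which divides the gain by the split information H(A). The `mi_over_joint` score, mutual information divided by the joint entropy H(C,A), is a hypothetical normalization included only for illustration; it is not taken from the paper, whose exact CAMI-family definitions are not given on this page.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    # p * log2(1/p) summed over the observed value frequencies
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def conditional_entropy(classes, attr):
    """H(C | A): class entropy within each attribute value, weighted by that value's frequency."""
    n = len(classes)
    groups = {}
    for c, a in zip(classes, attr):
        groups.setdefault(a, []).append(c)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def split_scores(classes, attr):
    """Score one candidate split on a nominal attribute with several entropy-based criteria."""
    h_c_given_a = conditional_entropy(classes, attr)
    mi = entropy(classes) - h_c_given_a          # information gain, equal to I(C; A)
    split_info = entropy(attr)                   # H(A), the gain-ratio denominator
    h_joint = entropy(list(zip(classes, attr)))  # H(C, A)
    return {
        "conditional_entropy": h_c_given_a,      # lower is better
        "info_gain": mi,                         # higher is better
        "gain_ratio": mi / split_info if split_info > 0 else 0.0,
        # Hypothetical MI normalization for illustration only, not the paper's CAMI definition
        "mi_over_joint": mi / h_joint if h_joint > 0 else 0.0,
    }
```

For a perfectly predictive attribute, `split_scores(["yes", "yes", "no", "no"], ["a", "a", "b", "b"])` gives a conditional entropy of 0 and an information gain, gain ratio, and `mi_over_joint` of 1.0; an attribute independent of the class, such as `["a", "b", "a", "b"]`, scores 0 on the last three. Numeric attributes would first need to be discretized or binarized at a threshold before being scored this way.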

Suggested Citation

  • Kweku-Muata Osei-Bryson & Kendall Giles, 2006. "Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families," Information Systems Frontiers, Springer, vol. 8(3), pages 195-209, July.
  • Handle: RePEc:spr:infosf:v:8:y:2006:i:3:d:10.1007_s10796-006-8779-8
    DOI: 10.1007/s10796-006-8779-8

    Full text from publisher (access restricted): http://link.springer.com/10.1007/s10796-006-8779-8

    Citations

    Cited by:

    1. Shaoyan Zhang & Christos Tjortjis & Xiaojun Zeng & Hong Qiao & Iain Buchan & John Keane, 2009. "Comparing data mining methods with logistic regression in childhood obesity prediction," Information Systems Frontiers, Springer, vol. 11(4), pages 449-460, September.
    2. Francis Kofi Andoh-Baidoo & Kweku-Muata Osei-Bryson & Kwasi Amoako-Gyampah, 2012. "Effects of firm and IT characteristics on the value of e-commerce initiatives: An inductive theoretical framework," Information Systems Frontiers, Springer, vol. 14(2), pages 237-259, April.
    3. Chulhwan Chris Bang, 2015. "Information systems frontiers: Keyword analysis and classification," Information Systems Frontiers, Springer, vol. 17(1), pages 217-237, February.
    4. Gunjan Mansingh & Lila Rao & Kweku-Muata Osei-Bryson & Annette Mills, 2015. "Profiling internet banking users: A knowledge discovery in data mining process model based approach," Information Systems Frontiers, Springer, vol. 17(1), pages 193-215, February.
