Author
Listed:
- Kuen-Liang Sue
(Department of Information Management, National Central University, Taoyuan, Taiwan)
- Chih-Fong Tsai
(Department of Information Management, National Central University, Taoyuan, Taiwan)
- Tzu-Ming Yan
(Department of Information Management, National Central University, Taoyuan, Taiwan)
Abstract
Data discretisation focuses on converting continuous attribute values to discrete ones which are closer to a knowledge-level representation that is easier to understand, use, and explain than continuous values. On the other hand, instance selection aims at filtering out noisy or unrepresentative data samples from a given training dataset before constructing a learning model. In practice, some domain datasets may require processing with both discretisation and instance selection at the same time. In such cases, the order in which discretisation and instance selection are combined will result in differences in the processed datasets. For example, discretisation can be performed first based on the original dataset, after which the instance selection algorithm is used to evaluate the discrete type of data for selection, whereas the alternative is to perform instance selection first based on the continuous type of data, then using the discretiser to transfer the attribute type of values of a reduced dataset. However, this issue has not been investigated before. The aim of this paper is to compare the performance of a classifier trained and tested over datasets processed by these combination orders. Specifically, the minimum description length principle (MDLP) and ChiMerge are used for discretisation, and IB3, DROP3 and GA for instance selection. The experimental results obtained using ten different domain datasets show that executing instance selection first and discretisation second perform the best, which can be used as the guideline for the datasets that require performing both steps. In particular, combining DROP3 and MDLP can provide classification accuracy of 0.85 and AUC of 0.8, which can be regarded as the representative baseline for future related researches.
Suggested Citation
Kuen-Liang Sue & Chih-Fong Tsai & Tzu-Ming Yan, 2024.
"On Combining Instance Selection and Discretisation: A Comparative Study of Two Combination Orders,"
Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 23(05), pages 1-16, October.
Handle:
RePEc:wsi:jikmxx:v:23:y:2024:i:05:n:s0219649224500813
DOI: 10.1142/S0219649224500813
Download full text from publisher
As the access to this document is restricted, you may want to search for a different version of it.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:wsi:jikmxx:v:23:y:2024:i:05:n:s0219649224500813. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Tai Tone Lim (email available below). General contact details of provider: http://www.worldscinet.com/jikm/jikm.shtml .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.