Authors
- Abdelkrim OUHAB
(EEDIS Laboratory, Djillali Liabes University, Sidi Bel Abbes, Algeria)
- Mimoun MALKI
(LabRI-SBA Laboratory, Ecole Supérieure en Informatique de Sidi Bel Abbes, Sidi Bel Abbes, Algeria)
- Djamel BERRABAH
(EEDIS Laboratory, Djillali Liabes University, Sidi Bel Abbes, Algeria)
- Faouzi BOUFARES
(LIPN Laboratory, Paris 13 University, Villetaneuse, France)
Abstract
Entity resolution (ER) is an important step in data integration and in many data mining projects; its goal is to identify records that refer to the same real-world entity. Most existing ER frameworks focus on datasets in Latin-script languages and do not support the Arabic language. In this article, the authors present an unsupervised ER framework that supports both English and Arabic datasets. Rather than relying on expert-written matching rules or manually labeled training examples, the proposed framework automatically generates its own training set. The generated training set is then used to train a classifier, and the learned classification model is used to perform ER. The framework was implemented and tested on three Arabic datasets and four English datasets. Experimental results show that it is competitive with supervised approaches and outperforms recently proposed unsupervised approaches in terms of F-measure.
Suggested Citation
Abdelkrim OUHAB & Mimoun MALKI & Djamel BERRABAH & Faouzi BOUFARES, 2017.
"An Unsupervised Entity Resolution Framework for English and Arabic Datasets,"
International Journal of Strategic Information Technology and Applications (IJSITA), IGI Global, vol. 8(4), pages 16-29, October.
Handle:
RePEc:igg:jsita0:v:8:y:2017:i:4:p:16-29