Author
Listed:
- M. A. Balafar
- R. Hazratgholizadeh
- M. R. F. Derakhshi
Abstract
Constrained clustering is intended to improve accuracy and personalization based on the constraints expressed by an Oracle. In this paper, a new constrained clustering algorithm is proposed and some of the informative data pairs are selected during an iterative process. Then, they are presented to the Oracle and their relation is answered with “Must-link (ML) or Cannot-link (CL).” In each iteration, first, the support vector machine (SVM) is utilized based on the label produced by the current clustering. According to the distance of each document from the hyperplane, the distance matrix is created. Also, based on cosine similarity of word2vector of each document, the similarity matrix is created. Two types of probability (similarity and degree of similarity) are calculated and they are smoothed for belonging to neighborhoods. Neighborhoods form the samples that are labeled by Oracle, to be in the same cluster. Finally, at the end of each iteration, the data with a greater level of uncertainty (in term of probability) is selected for questioning the oracle. In order to evaluate, the proposed method is compared with famous state-of-the-art methods based on two criteria and over a standard dataset. The result demonstrates an increased accuracy and stability of the obtained result with fewer questions.
Suggested Citation
M. A. Balafar & R. Hazratgholizadeh & M. R. F. Derakhshi, 2020.
"Active Learning for Constrained Document Clustering with Uncertainty Region,"
Complexity, Hindawi, vol. 2020, pages 1-16, May.
Handle:
RePEc:hin:complx:3207306
DOI: 10.1155/2020/3207306
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hin:complx:3207306. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Mohamed Abdelhakeem (email available below). General contact details of provider: https://www.hindawi.com .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.