IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v9y2021i19p2390-d643269.html
   My bibliography  Save this article

Improved Constrained k -Means Algorithm for Clustering with Domain Knowledge

Author

Listed:
  • Peihuang Huang

    (College of Mathematics and Data Science, Minjiang University, Fuzhou 350116, China
    These authors contributed equally to this work.)

  • Pei Yao

    (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
    These authors contributed equally to this work.)

  • Zhendong Hao

    (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
    These authors contributed equally to this work.)

  • Huihong Peng

    (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
    These authors contributed equally to this work.)

  • Longkun Guo

    (School of Computer Science, Qilu University of Technology, Jinan 250353, China
    These authors contributed equally to this work.)

Abstract

Witnessing the tremendous development of machine learning technology, emerging machine learning applications impose challenges of using domain knowledge to improve the accuracy of clustering provided that clustering suffers a compromising accuracy rate despite its advantage of fast procession. In this paper, we model domain knowledge (i.e., background knowledge or side information), respecting some applications as must-link and cannot-link sets, for the sake of collaborating with k -means for better accuracy. We first propose an algorithm for constrained k -means, considering only must-links. The key idea is to consider a set of data points constrained by the must-links as a single data point with a weight equal to the weight sum of the constrained points. Then, for clustering the data points set with cannot-link, we employ minimum-weight matching to assign the data points to the existing clusters. At last, we carried out a numerical simulation to evaluate the proposed algorithms against the UCI datasets, demonstrating that our method outperforms the previous algorithms for constrained k -means as well as the traditional k -means regarding the clustering accuracy rate although with a slightly compromised practical runtime.

Suggested Citation

  • Peihuang Huang & Pei Yao & Zhendong Hao & Huihong Peng & Longkun Guo, 2021. "Improved Constrained k -Means Algorithm for Clustering with Domain Knowledge," Mathematics, MDPI, vol. 9(19), pages 1-14, September.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:19:p:2390-:d:643269
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/19/2390/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/19/2390/
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Min Li, 0. "The bi-criteria seeding algorithms for two variants of k-means problem," Journal of Combinatorial Optimization, Springer, vol. 0, pages 1-12.
    2. Volodymyr Melnykov & Xuwen Zhu, 2019. "An extension of the K-means algorithm to clustering skewed data," Computational Statistics, Springer, vol. 34(1), pages 373-394, March.
    3. Min Li & Dachuan Xu & Jun Yue & Dongmei Zhang & Peng Zhang, 2020. "The seeding algorithm for k-means problem with penalties," Journal of Combinatorial Optimization, Springer, vol. 39(1), pages 15-32, January.
    4. Min Li & Dachuan Xu & Jun Yue & Dongmei Zhang, 2020. "The Parallel Seeding Algorithm for k-Means Problem with Penalties," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 37(04), pages 1-18, August.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaoyun Tian & Dachuan Xu & Donglei Du & Ling Gai, 2022. "The spherical k-means++ algorithm via local search scheme," Journal of Combinatorial Optimization, Springer, vol. 44(4), pages 2375-2394, November.
    2. Min Li, 0. "The bi-criteria seeding algorithms for two variants of k-means problem," Journal of Combinatorial Optimization, Springer, vol. 0, pages 1-12.
    3. Min Li, 2022. "The bi-criteria seeding algorithms for two variants of k-means problem," Journal of Combinatorial Optimization, Springer, vol. 44(3), pages 1693-1704, October.
    4. İsmail Güzel & Atabey Kaygun, 2022. "A new non-archimedean metric on persistent homology," Computational Statistics, Springer, vol. 37(4), pages 1963-1983, September.
    5. Sai Ji & Dachuan Xu & Longkun Guo & Min Li & Dongmei Zhang, 2022. "The seeding algorithm for spherical k-means clustering with penalties," Journal of Combinatorial Optimization, Springer, vol. 44(3), pages 1977-1994, October.
    6. Sai Ji & Dachuan Xu & Longkun Guo & Min Li & Dongmei Zhang, 0. "The seeding algorithm for spherical k-means clustering with penalties," Journal of Combinatorial Optimization, Springer, vol. 0, pages 1-18.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:9:y:2021:i:19:p:2390-:d:643269. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.