Efficient Algorithm for the k-Means Problem with Must-Link and Cannot-Link Constraints

Chaoqi Jia, Longkun Guo, Kewen Liao, Zhigang Lu

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)
1 Downloads (Pure)

Abstract

Constrained clustering, such as k-means with instance-level Must-Link (ML) and Cannot-Link (CL) auxiliary information as the constraints, has been extensively studied recently, due to its broad applications in data science and AI. Despite some heuristic approaches, there has not been any algorithm providing a non-trivial approximation ratio to the constrained k-means problem. To address this issue, we propose an algorithm with a provable approximation ratio of Ok when only ML constraints are considered. We also empirically evaluate the performance of our algorithm on real-world datasets having artificial ML and disjoint CL constraints. The experimental results show that our algorithm outperforms the existing greedy-based heuristic methods in clustering accuracy.

Original languageEnglish
Pages (from-to)1050-1062
Number of pages13
JournalTsinghua Science and Technology
Volume28
Issue number6
DOIs
Publication statusPublished - 1 Dec 2023
Externally publishedYes

Bibliographical note

Publisher Copyright:
© 1996-2012 Tsinghua University Press.

Fingerprint

Dive into the research topics of 'Efficient Algorithm for the k-Means Problem with Must-Link and Cannot-Link Constraints'. Together they form a unique fingerprint.

Cite this