TY - JOUR
T1 - Near-optimal algorithms for instance-level constrained k-center clustering
AU - Guo, Longkun
AU - Jia, Chaoqi
AU - Liao, Kewen
AU - Lu, Zhigang
AU - Xue, Minhui
PY - 2025
Y1 - 2025
N2 - Many practical applications impose a new challenge of utilizing instance-level background knowledge (e.g., subsets of similar or dissimilar data points) within their input data to improve clustering results. In this work, we build on the widely adopted k-center clustering, modeling its input instance-level background knowledge as must-link (ML) and cannot-link (CL) constraint sets, and formulate the constrained k-center problem. Given the long-standing challenge of developing efficient algorithms for constrained clustering problems, we first derive an efficient approximation algorithm for constrained k-center at the best possible approximation ratio of 2 with linear programming (LP)-rounding technology. Recognizing the limitations of LP-rounding algorithms including high runtime complexity and challenges in parallelization, we subsequently develop a greedy algorithm that does not rely on the LP and can be efficiently parallelized. This algorithm also achieves the same approximation ratio 2 but with lower runtime complexity. Lastly, we empirically evaluate our approximation algorithm against baselines on various real datasets, validating our theoretical findings and demonstrating significant advantages of our algorithm in terms of clustering cost, quality, and runtime complexity.
AB - Many practical applications impose a new challenge of utilizing instance-level background knowledge (e.g., subsets of similar or dissimilar data points) within their input data to improve clustering results. In this work, we build on the widely adopted k-center clustering, modeling its input instance-level background knowledge as must-link (ML) and cannot-link (CL) constraint sets, and formulate the constrained k-center problem. Given the long-standing challenge of developing efficient algorithms for constrained clustering problems, we first derive an efficient approximation algorithm for constrained k-center at the best possible approximation ratio of 2 with linear programming (LP)-rounding technology. Recognizing the limitations of LP-rounding algorithms including high runtime complexity and challenges in parallelization, we subsequently develop a greedy algorithm that does not rely on the LP and can be efficiently parallelized. This algorithm also achieves the same approximation ratio 2 but with lower runtime complexity. Lastly, we empirically evaluate our approximation algorithm against baselines on various real datasets, validating our theoretical findings and demonstrating significant advantages of our algorithm in terms of clustering cost, quality, and runtime complexity.
KW - Approximation algorithm
KW - constrained clustering
KW - greedy algorithm
KW - k-center
KW - linear programming (LP)-rounding
UR - http://www.scopus.com/inward/record.url?scp=105008563510&partnerID=8YFLogxK
UR - https://go.openathens.net/redirector/westernsydney.edu.au?url=https://doi.org/10.1109/TNNLS.2025.3574268
U2 - 10.1109/TNNLS.2025.3574268
DO - 10.1109/TNNLS.2025.3574268
M3 - Article
AN - SCOPUS:105008563510
SN - 2162-237X
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -