Fast and robust general purpose clustering algorithms

Vladimir Estivill-Castro, Jianhua Yang

Research output: Contribution to journal, Article, peer-review

50 Citations (Scopus)

Abstract

General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-MEANS has several disadvantages derived from its statistical simplicity. We propose an algorithm that remains very efficient, generally applicable and multidimensional, but is more robust to noise and outliers. We achieve this by using medians rather than means as estimators for the centers of clusters. Comparison with k-MEANS, EXPECTATION MAXIMIZATION and GIBBS sampling demonstrates the advantages of our algorithm.
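
The robustness claim in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it only compares two center estimators on one toy cluster, and it uses a coordinate-wise median as a simple stand-in for whatever median estimator the paper defines. The helper name `center` and the sample points are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

def center(points, estimator):
    """Apply a one-dimensional estimator to each coordinate of the points."""
    # zip(*points) regroups the data by coordinate: all x values, then all y values.
    return tuple(estimator(coordinate) for coordinate in zip(*points))

# Four points near (1.5, 1.5) plus one distant outlier (hypothetical data).
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (100.0, 100.0)]

mean_center = center(cluster, mean)      # pulled far toward the outlier
median_center = center(cluster, median)  # stays among the four nearby points

print("mean center:  ", mean_center)
print("median center:", median_center)
```

In this toy case the single outlier drags the mean well away from the four nearby points, while the coordinate-wise median stays inside their range, which is the kind of behavior the abstract describes as robustness to noise and outliers.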

Original language: English
Pages (from-to): 127-150
Number of pages: 24
Journal: Data Mining and Knowledge Discovery
Volume: 8
Issue number: 2
Publication status: Published - Mar 2004

Keywords

  • 1-median problem
  • Clustering
  • Combinatorial optimization
  • Expectation maximization
  • k-Means
  • Medoids
