TY - JOUR
T1 - Fast and robust general purpose clustering algorithms
AU - Estivill-Castro, Vladimir
AU - Yang, Jianhua
PY - 2000
Y1 - 2000
AB - General purpose and highly applicable clustering methods are required for knowledge discovery. k-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-MEANS has several disadvantages derived from its statistical simplicity. We propose algorithms that remain very efficient, generally applicable, multidimensional but are more robust to noise and outliers. We achieve this by using medians rather than means as estimators of centers of clusters. Comparison with k-MEANS, EM and GIBBS sampling demonstrates the advantages of our algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84867815153&partnerID=8YFLogxK
U2 - 10.1007/3-540-44533-1_24
DO - 10.1007/3-540-44533-1_24
M3 - Article
AN - SCOPUS:84867815153
SN - 0302-9743
VL - 1886
SP - 208
EP - 218
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
ER -