TY - JOUR
T1 - Elephant search algorithm applied to data clustering
AU - Deb, Suash
AU - Tian, Zhonghuan
AU - Fong, Simon
AU - Wong, Raymond
AU - Millham, Richard
AU - Wong, Kelvin K. L.
PY - 2018
Y1 - 2018
AB - Data clustering is one of the most popular branches of machine learning and data analysis. Partitioning-based clustering algorithms, such as K-means, are prone to producing a set of clusters that is far from optimal because their outcome depends on random initialization: the clustering process starts from random partitions and then improves them progressively, so different initial partitions can lead to different final clusters. Exhaustively trying all possible candidate clusterings to find the best result is computationally expensive. Meta-heuristic algorithms aim to search for the global optimum in high-dimensional problems, and they have been successfully applied to data clustering to seek near-optimal solutions in terms of the quality of the resulting clusters. In this paper, a new meta-heuristic search method named the elephant search algorithm (ESA) is integrated into K-means, forming a new data clustering algorithm named C-ESA. C-ESA has two advantages: (i) evolutionary operations and (ii) a balance between local intensification and global exploration. The results of C-ESA are compared with those of classical clustering algorithms, including K-means, DBSCAN, and GMM-EM, and computer simulations show that C-ESA outperforms these algorithms in clustering accuracy. C-ESA is also applied to time series clustering and compared with the classical algorithms K-means and Fuzzy C-means and the classical meta-heuristic algorithm PSO, again outperforming them in clustering accuracy. C-ESA remains comparable to the state-of-the-art time series clustering algorithm K-shape.
KW - cluster analysis
KW - data
KW - heuristic algorithms
UR - http://handle.westernsydney.edu.au:8081/1959.7/uws:45982
U2 - 10.1007/s00500-018-3076-2
DO - 10.1007/s00500-018-3076-2
M3 - Article
SN - 1432-7643
VL - 22
SP - 6035
EP - 6046
JO - Soft Computing
JF - Soft Computing
IS - 18
ER -