TY - JOUR
T1 - Categorizing visitors dynamically by fast and robust clustering of access logs
AU - Estivill-Castro, Vladimir
AU - Yang, Jianhua
PY - 2001
Y1 - 2001
N2 - Clustering plays a central role in segmenting markets. Identifying categories of visitors to a Web site is very useful for improving Web applications. However, the large volume of data involved in mining visitation paths demands efficient clustering algorithms that are also resistant to noise and outliers. Moreover, measuring dissimilarity between visitation paths involves sophisticated evaluation and results in attribute vectors of large dimension. We present a randomized, iterative algorithm (a la Expectation Maximization or k-means) that is based on discrete medoids. We prove that our algorithm converges and that it has subquadratic complexity. We compare it to the implementation of the fastest version of matrix-based clustering for visitor paths and show that our algorithm dramatically outperforms matrix-based methods.
AB - Clustering plays a central role in segmenting markets. Identifying categories of visitors to a Web site is very useful for improving Web applications. However, the large volume of data involved in mining visitation paths demands efficient clustering algorithms that are also resistant to noise and outliers. Moreover, measuring dissimilarity between visitation paths involves sophisticated evaluation and results in attribute vectors of large dimension. We present a randomized, iterative algorithm (a la Expectation Maximization or k-means) that is based on discrete medoids. We prove that our algorithm converges and that it has subquadratic complexity. We compare it to the implementation of the fastest version of matrix-based clustering for visitor paths and show that our algorithm dramatically outperforms matrix-based methods.
UR - http://www.scopus.com/inward/record.url?scp=26344435844&partnerID=8YFLogxK
U2 - 10.1007/3-540-45490-x_64
DO - 10.1007/3-540-45490-x_64
M3 - Article
AN - SCOPUS:26344435844
SN - 0302-9743
VL - 2198
SP - 498
EP - 507
JO - Agents for Games and Simulations II
JF - Agents for Games and Simulations II
ER -