Clustering web visitors by fast, robust and convergent algorithms

Vladimir Estivill-Castro, Jianhua Yang

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Identifying categories of visitors to a Web site is very useful for improving Web designs and Web applications. However, the large volume of data involved in mining access logs and visitation paths, together with the uncertainty in fully identifying the visitor, demands efficient clustering algorithms that are also resistant to noise and outliers. In addition, visitation paths are discrete, and measuring dissimilarity between visitation paths involves sophisticated evaluation and results in attribute vectors of large dimension. We provide randomized, iterative clustering algorithms for generic dissimilarity between paths. Our algorithms are robust because they use medians rather than means as estimators of location, and the resulting representative of a cluster is an actual path in the data set. We demonstrate mathematically that our algorithms converge and have subquadratic complexity. We also show experimentally that they are resistant to noise by recovering clusters from synthetic data generated by a mixture of distributions of paths in a graph. The non-crisp method we propose generalizes approaches that allow a data item to have a degree of membership in a cluster.
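
To make concrete the idea of a cluster representative that is an actual path in the data set, the following is a minimal sketch, not the authors' algorithm. It assumes a simple alternation between assigning paths to their nearest representative and re-picking each representative as the member with the smallest total dissimilarity to its cluster. The names `jaccard_dissimilarity`, `medoid`, and `cluster_paths` are hypothetical, and the Jaccard measure merely stands in for the generic path dissimilarity mentioned in the abstract. The exhaustive medoid search here is quadratic in cluster size, so it does not reproduce the subquadratic complexity the paper claims.

```python
import random
from typing import Callable, List, Sequence, Tuple

Path = Tuple[str, ...]  # a visitation path as a sequence of page identifiers


def jaccard_dissimilarity(a: Path, b: Path) -> float:
    """Stand-in dissimilarity (an assumption): 1 - |shared pages| / |all pages|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def medoid(cluster: Sequence[Path], d: Callable[[Path, Path], float]) -> Path:
    """Return the member of `cluster` with the smallest total dissimilarity to the rest."""
    return min(cluster, key=lambda c: sum(d(c, p) for p in cluster))


def cluster_paths(
    paths: Sequence[Path],
    k: int,
    d: Callable[[Path, Path], float],
    seed: int = 0,
    max_iter: int = 100,
) -> Tuple[List[Path], List[List[Path]]]:
    """Alternate nearest-representative assignment and medoid update."""
    rng = random.Random(seed)
    reps = rng.sample(list(paths), k)  # random initial representatives
    groups: List[List[Path]] = [[] for _ in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in paths:
            nearest = min(range(k), key=lambda i: d(p, reps[i]))
            groups[nearest].append(p)
        # Keep the old representative if a cluster happens to be empty.
        new_reps = [medoid(g, d) if g else reps[i] for i, g in enumerate(groups)]
        if new_reps == reps:  # representatives stopped changing
            break
        reps = new_reps
    return reps, groups
```

Because every representative is drawn from the input, a call such as `cluster_paths(paths, 2, jaccard_dissimilarity)` returns representatives that can be read directly as typical visitation paths, which is the property the abstract highlights.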

Original language: English
Pages (from-to): 497-520
Number of pages: 24
Journal: International Journal of Foundations of Computer Science
Volume: 13
Issue number: 4
Publication status: Published - 2002
Externally published: Yes

Keywords

  • Clustering
  • Dissimilarity
  • Visitation paths
  • Web-User Mining
