Weighted kernel model for text categorization

Lei Zhang, Debbie Zhang, Simeon J. Simoff, John Debenham

Research output: Contribution to journal › Conference article › peer-review

5 Citations (Scopus)

Abstract

The traditional bag-of-words model and the more recent word-sequence kernel are two well-known techniques in text categorization. The bag-of-words representation ignores word order, which can reduce accuracy for some types of documents. The word-sequence kernel takes word order into account, but does not capture all of the word-frequency information. A weighted kernel model that combines the two was proposed by the authors [1]. This paper focuses on optimizing the weighting parameters, which are functions of word frequency. Experiments on the Reuters database show that the new weighted kernel achieves better classification accuracy.
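To make the idea of combining the two representations concrete, the following is a minimal sketch, not the authors' formulation. It makes two loud assumptions: the weight `alpha` is a fixed scalar chosen by hand (the paper instead uses weighting parameters that are functions of word frequency), and `seq_kernel` is a simplified stand-in that counts shared contiguous word n-grams rather than the gap-weighted word-sequence kernel. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def bow_kernel(a, b):
    """Bag-of-words kernel: dot product of word-count vectors (word order ignored)."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())


def seq_kernel(a, b, n=2):
    """Order-sensitive stand-in (assumption): dot product of contiguous word n-gram counts."""
    ga = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    gb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(ga[g] * gb[g] for g in ga.keys() & gb.keys())


def normalized(kernel, a, b):
    """Scale a kernel so that identical documents score 1.0."""
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0


def weighted_kernel(a, b, alpha=0.5):
    """Convex combination of the two kernels; alpha is a hand-picked scalar (assumption)."""
    return alpha * normalized(bow_kernel, a, b) + (1 - alpha) * normalized(seq_kernel, a, b)


doc1 = "the dog bit the man".split()
doc2 = "the man bit the dog".split()

print(normalized(bow_kernel, doc1, doc2))   # same words, so bag-of-words sees no difference
print(normalized(seq_kernel, doc1, doc2))   # word order differs, so the score drops
print(weighted_kernel(doc1, doc2, alpha=0.5))
```

The two example documents contain exactly the same words in a different order, so the bag-of-words score cannot tell them apart while the order-sensitive score can. Because a convex combination of positive semi-definite kernels is itself positive semi-definite, a combined kernel of this form can be passed to any kernel-based classifier.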

Original language: English
Pages (from-to): 111-114
Number of pages: 4
Journal: Conferences in Research and Practice in Information Technology Series
Volume: 61
Publication status: Published - 2006
Externally published: Yes
Event: 5th Australasian Data Mining Conference, AusDM 2006 - Sydney, NSW, Australia
Duration: 29 Nov 2006 → 30 Nov 2006

Keywords

  • Bag-of-words Kernel
  • Text categorization
  • Weighted kernel model
  • Word-sequence Kernel
