Efficient storage and retrieval of probabilistic latent semantic information for information retrieval

Laurence A. F. Park, Kotagiri Ramamohanarao

    Research output: Contribution to journal › Article

    20 Citations (Scopus)

    Abstract

    Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationships from a document set. The probabilistic latent semantic index (PLSI) has been used to store PLSA information, but the PLSI uses excessive storage space relative to a simple term frequency index, which causes lengthy query times. To overcome the storage and speed problems of PLSI, we introduce the probabilistic latent semantic thesaurus (PLST), an efficient and effective method of storing the PLSA information. We show that, through methods such as document thresholding and term pruning, we are able to maintain the high-precision results found using PLSA while using a very small fraction (0.15%) of the storage space of PLSI.
    Original language: English
    Pages (from-to): 141-155
    Number of pages: 15
    Journal: VLDB Journal
    Volume: 18
    Issue number: 1
    DOIs
    Publication status: Published - 2009

    Keywords

    • information retrieval
    • search engines
