Confidence intervals for information retrieval evaluation

    Research output: Chapter in Book / Conference Paper; Conference Paper; peer-review

    2 Citations (Scopus)

    Abstract

    Information retrieval results are currently limited to the publication in which they exist. Significance tests are used to remove the dependence of the evaluation on the query sample, but the findings cannot be transferred to other systems not involved in the test. Confidence intervals for the population parameters provide query-independent results and give insight into how each system is expected to behave when queried. Confidence intervals also allow the reader to compare results across articles because they provide the possible location of a system's population parameter. Unfortunately, we can only construct confidence intervals for population parameters if we have knowledge of the evaluation score distribution for each system. In this article, we investigate the distribution of Average Precision of a set of systems and examine whether we can construct confidence intervals for the population mean Average Precision with a given level of confidence. We found that by standardising the scores, the system score distribution and system score sample mean distribution were approximately Normal for all systems, allowing us to construct accurate confidence intervals for the population mean Average Precision.
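To make the idea in the abstract concrete, here is a minimal sketch of a confidence interval for a system's population mean Average Precision. It assumes only what the abstract reports, namely that the sample mean is approximately Normal, and it uses the textbook Normal-approximation interval (sample mean plus or minus a critical value times the standard error). It is not the authors' exact procedure: the score standardisation step they describe is not specified in the abstract and is omitted here. The function name and the per-query scores are hypothetical, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ap_confidence_interval(ap_scores, confidence=0.95):
    """Normal-approximation confidence interval for the population mean AP.

    Assumes the sample mean of the per-query scores is approximately Normal.
    """
    n = len(ap_scores)
    if n < 2:
        raise ValueError("need at least two per-query scores")
    sample_mean = mean(ap_scores)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(ap_scores) / sqrt(n)
    # Two-sided critical value of the standard Normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical per-query Average Precision scores for one system.
hypothetical_ap = [0.21, 0.35, 0.10, 0.48, 0.27, 0.33, 0.19, 0.41]
low, high = mean_ap_confidence_interval(hypothetical_ap)
print(f"95% CI for mean AP: [{low:.3f}, {high:.3f}]")
```

With a query sample this small, a Student t critical value would usually be preferred over the Normal one; the Normal value is used here only to keep the sketch within the standard library.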
    Original language: English
    Title of host publication: ADCS 2010: Proceedings of the Fifteenth Australasian Document Computing Symposium, Melbourne, 10 December 2010
    Publisher: RMIT University
    Pages: 97-104
    Number of pages: 8
    ISBN (Print): 9781921426803
    Publication status: Published - 2010
    Event: Australasian Document Computing Symposium
    Duration: 5 Dec 2013 → …

    Conference

    Conference: Australasian Document Computing Symposium
    Period: 5/12/13 → …
