By the power of Grayskull : small sample statistical power in information retrieval evaluation

Laurence A. F. Park, Glenn Stone

Research output: Chapter in Book/Conference Paper › Conference Paper › peer-review

Abstract

Information Retrieval evaluation is typically performed using a sample of queries, and a statistical hypothesis test is used to make inferences about the systems' accuracy on the population of queries. Research has shown that, when evaluating with a large sample of queries, the t test is one of a set of tests that provide the greatest statistical power while maintaining acceptable type I error rates. In this article, we investigate how using a small query sample, for which the conditions of the Central Limit Theorem may not be satisfied, affects the control of the type I error rate and the type II error rate of a given set of hypothesis tests. We found that all tests performed similarly in the unpaired case. We also found that the bootstrap test provided greater power in the paired case, but did not maintain the desired type I error rate at the smallest sample size (5 queries).
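The quantities discussed in the abstract can be estimated by simulation: the type I error rate is the fraction of query samples on which a test rejects when the true mean difference between two systems is zero, and power (1 minus the type II error rate) is the rejection fraction when a real difference exists. The sketch below does this for the paired case only. It is a minimal illustration under stated assumptions, not the authors' experimental method: per-query score differences are assumed to be normally distributed with standard deviation 0.1 (real IR metrics are bounded and often skewed), the bootstrap is a simple shift-to-null resampling of the mean difference (which may differ from the variant used in the paper), the t critical values are hardcoded for the sample sizes used, and all function names (`paired_t_test_rejects`, `paired_bootstrap_rejects`, `rejection_rate`) are hypothetical.

```python
import math
import random
import statistics

# Hypothetical lookup: two-sided 5% critical values of Student's t
# (the 0.975 quantile), keyed by degrees of freedom. Hardcoded so the
# sketch needs only the standard library.
T_CRIT_975 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093}


def paired_t_test_rejects(diffs, rng):
    """Two-sided paired t test at alpha = 0.05 on per-query differences.

    `rng` is unused; it is accepted so both tests share one signature.
    """
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate case: identical differences; reject only if nonzero.
        return mean != 0.0
    t = mean / (sd / math.sqrt(n))
    return abs(t) > T_CRIT_975[n - 1]


def paired_bootstrap_rejects(diffs, rng, b=1000, alpha=0.05):
    """Two-sided paired bootstrap test of a zero mean difference.

    Shifts the differences to mean zero so resampling reflects H0, then
    estimates the p-value as the fraction of resampled means whose
    magnitude is at least the observed mean's magnitude.
    """
    n = len(diffs)
    observed = statistics.fmean(diffs)
    shifted = [d - observed for d in diffs]
    extreme = 0
    for _ in range(b):
        resample = rng.choices(shifted, k=n)
        if abs(statistics.fmean(resample)) >= abs(observed):
            extreme += 1
    return extreme / b < alpha


def rejection_rate(test, n, effect, sd, trials, rng):
    """Fraction of simulated query samples on which `test` rejects H0.

    With effect == 0 this estimates the type I error rate; with
    effect != 0 it estimates power (1 - type II error rate).
    Assumes normally distributed per-query score differences.
    """
    rejections = 0
    for _ in range(trials):
        diffs = [rng.gauss(effect, sd) for _ in range(n)]
        if test(diffs, rng):
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    rng = random.Random(42)
    tests = {
        "paired t": paired_t_test_rejects,
        "bootstrap": lambda d, r: paired_bootstrap_rejects(d, r, b=400),
    }
    for n in (5, 10, 20):
        for name, test in tests.items():
            type1 = rejection_rate(test, n, effect=0.0, sd=0.1, trials=200, rng=rng)
            power = rejection_rate(test, n, effect=0.05, sd=0.1, trials=200, rng=rng)
            print(f"n={n:2d}  {name:9s}  type I={type1:.3f}  power={power:.3f}")
```

Because this unstudentized bootstrap judges the observed mean against resampled means, whose spread underestimates the sampling variability and lacks the heavier tails of the t distribution when there are only a handful of queries, it tends to reject more often than the nominal 5% at n = 5. That is one mechanism by which a bootstrap test can gain power at the cost of type I error control; the printed rates are illustrative only and are not a reproduction of the paper's results.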
Original language: English
Title of host publication: Proceedings of the 19th Australasian Document Computing Symposium, Melbourne, Australia, November 27-28, 2014
Publisher: ACM
Pages: 101-104
Number of pages: 4
ISBN (Print): 9781450330008
DOIs
Publication status: Published - 2014
Event: Australasian Document Computing Symposium -
Duration: 27 Nov 2014 → …

Conference

Conference: Australasian Document Computing Symposium
Period: 27/11/14 → …

Keywords

  • information storage and retrieval systems
  • statistical hypothesis testing
  • statistical power analysis
  • t-test (statistics)
