Abstract
Information Retrieval evaluation is typically performed using a sample of queries, and a statistical hypothesis test is used to make inferences about the systems' accuracy on the population of queries. Research has shown that the t test is one of a set of tests that provide the greatest statistical power while maintaining acceptable type I error rates when evaluating with a large sample of queries. In this article, we investigate how using a small query sample, for which the hypothesis tests may not satisfy Central Limit Theorem conditions, affects the control of the type I error rate and the change in type II error rate of a given set of hypothesis tests. We found that all tests performed similarly for unpaired tests. We also found that the bootstrap test provided greater power for the paired test, but violated the desired type I error rate for the smallest sample size (5 queries).
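The kind of check the abstract describes can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a shift-method paired bootstrap on the mean per-query score difference and draws per-query scores for two equally effective systems from a uniform distribution. The function names, the uniform score model, and the resample counts are all hypothetical choices made for illustration.

```python
import random
import statistics


def paired_bootstrap_p_value(scores_a, scores_b, n_resamples=2000, rng=None):
    """Two-sided paired bootstrap p-value for the mean per-query difference.

    Assumption: shift-method bootstrap, i.e. the observed differences are
    centred on zero so that resampling happens under the null hypothesis.
    """
    rng = rng or random.Random(0)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = statistics.fmean(diffs)
    # Centre the differences so the null (mean difference = 0) holds.
    centred = [d - observed for d in diffs]
    n = len(diffs)
    extreme = 0
    for _ in range(n_resamples):
        resample = rng.choices(centred, k=n)  # sample queries with replacement
        if abs(statistics.fmean(resample)) >= abs(observed):
            extreme += 1
    return extreme / n_resamples


def estimate_type_i_error(n_queries=5, n_trials=500, alpha=0.05, seed=1):
    """Fraction of trials in which the null is rejected although it is true.

    Assumption: both systems' per-query scores are i.i.d. uniform on [0, 1),
    so any rejection is a type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        a = [rng.random() for _ in range(n_queries)]
        b = [rng.random() for _ in range(n_queries)]
        if paired_bootstrap_p_value(a, b, n_resamples=500, rng=rng) < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    for n in (5, 10, 25):
        print(n, estimate_type_i_error(n_queries=n))
```

Comparing the estimated rejection rate with the nominal `alpha` for each sample size indicates whether the test controls the type I error rate under these assumed conditions. Other score distributions or bootstrap variants may behave differently.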
Original language | English |
---|---|
Title of host publication | Proceedings of the 19th Australasian Document Computing Symposium, Melbourne, Australia, November 27-28, 2014 |
Publisher | ACM |
Pages | 101-104 |
Number of pages | 4 |
ISBN (Print) | 9781450330008 |
DOIs | |
Publication status | Published - 2014 |
Event | Australasian Document Computing Symposium - Duration: 27 Nov 2014 → … |
Conference
Conference | Australasian Document Computing Symposium |
---|---|
Period | 27/11/14 → … |
Keywords
- information storage and retrieval systems
- statistical hypothesis testing
- statistical power analysis
- t-test (statistics)