Abstract
This chapter presents details of the Annotation Task of the Big Australian Speech Corpus (Big ASC) project, in which AusTalk, a large audio-visual corpus of Australian English, was collected. We describe the scope of the task and its implementation and give an overview of the results so far. When complete, AusTalk will consist of 3 hours of audio-visual recordings from each of 1000 speakers of Australian English, across a wide range of tasks including scripted (read) speech, spontaneous speech and dialogue. The read speech of 100 participants has now been manually annotated, but a key challenge of the project was producing transcriptions for the unscripted (spontaneous) speech data. We report on several avenues that have been explored for automating this task, and we describe the annotation challenges, the processes that were adopted and the limitations of automated transcription.
Original language | English |
---|---|
Title of host publication | Handbook of Linguistic Annotation |
Editors | Nancy Ide, James Pustejovsky |
Place of Publication | Netherlands |
Publisher | Springer |
Pages | 1287-1301 |
Number of pages | 15 |
ISBN (Electronic) | 9789402408812 |
ISBN (Print) | 9789402408799 |
Publication status | Published - 2017 |
Keywords
- English language
- corpora (linguistics)
- linguistic analysis (linguistics)