Abstract
Visual enhancement of speech intelligibility, although clearly established, still resists a clear description. We attempt to contribute to solving that problem by proposing a simple account based on phonetically motivated visual cues. This work extends a previous study quantifying the visual advantage in sentence intelligibility across three conditions with varying degrees of available visual information: auditory-only, auditory-visual with the oral region masked, and auditory-visual. We explore the role of lexical as well as visual factors, the latter derived from viseme groupings. While lexical factors play a non-discriminative role across modality conditions, a measure of viseme confusability appears to capture part of the performance results. A simple characterisation of the phonetic content of sentences, in terms of visual information occurring exclusively inside the mask region, was found to be the strongest predictor for the auditory-visual masked condition only, demonstrating a direct link between localised visual information and auditory-visual speech processing performance.
Original language | English |
---|---|
Title of host publication | Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing, FAAVSP 2015, 11–13 September 2015, Vienna, Austria |
Publisher | International Speech Communication Association |
Pages | 132-136 |
Number of pages | 5 |
Publication status | Published - 2015 |
Event | Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing - Duration: 11 Sept 2015 → … |
Conference
Conference | Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing |
---|---|
Period | 11/09/15 → … |
Keywords
- auditory perception
- noise
- speech perception
- visual perception