Abstract
Optical information from the facial movements of a talker contributes to speech perception not only when the acoustic information is degraded (Sumby and Pollack 1954) or when the listener is hearing-impaired, but also when the acoustic information is clearly audible. This is most clearly shown in the classic McGurk effect, or fusion illusion, in which dubbing the auditory speech syllable /ba/ onto the lip movements for /ga/ results in the emergent perception of 'da' or 'tha'. This occurs whether or not the observer is aware of the conflicting sources of information (McGurk and MacDonald 1976; MacDonald and McGurk 1978). The beauty of this effect lies not in the fact that it produces an illusion, but in that it unequivocally shows that visual information is used in speech perception even when auditory information is clear and undegraded. Speech perception is thus a multisensory event and, as such, an exemplar of humans' and other animals' ubiquitous propensity for multisensory perception. Given that speech perception is an auditory-visual phenomenon, two intriguing questions arise: By what process does auditory-visual speech perception occur? And what is its developmental course, that is, how does auditory-visual speech perception change as a function of age and type of experience? The chapter addresses these two questions, with due consideration of the research methods appropriate for their resolution. It concludes with a discussion of the implications of this work for automatic speech recognition.
| Original language | English |
| --- | --- |
| Title of host publication | Audiovisual Speech Processing |
| Editors | Gérard Bailly, Pascal Perrier, Eric Vatikiotis-Bateson |
| Place of Publication | U.K. |
| Publisher | Cambridge University Press |
| Pages | 62-75 |
| Number of pages | 14 |
| ISBN (Print) | 9781139379816 |
| Publication status | Published - 2012 |
Keywords
- auditory perception
- speech perception
- visual perception