Vocalisations with a better view: hyperarticulation augments the auditory-visual advantage for the detection of speech in noise

  • Nicole C. Lees

Western Sydney University thesis: Doctoral thesis

Abstract

Recent studies have shown that there is a visual influence early in speech processing: visual speech enhances the ability to detect auditory speech in noise. However, identifying exactly how visual speech interacts with auditory processing at such an early stage has been challenging, because this so-called AV speech detection advantage is both highly related to a specific lower-order, signal-based, optic-acoustic relationship between second formant amplitude and the area of the mouth (F2/Mouth-area), and mediated by higher-order, information-based factors. Previous investigations have either maximised or minimised information-based factors, or minimised signal-based factors, in order to tease out the relative importance of these sources of the advantage, but without success. Maximising signal-based factors had not previously been explored. This thesis explored that avenue by manipulating speaking style: hyperarticulated speech was used to maximise signal-based factors, and hypoarticulated speech to minimise them, in order to examine whether the AV speech detection advantage is modified by these means, and to provide a clearer idea of the primary source of visual influence in the AV detection advantage.

Two sets of six studies were conducted. In the first set, three recorded speech styles (hyperarticulated, normal, and hypoarticulated) were extensively analysed on physical (optic and acoustic) and perceptual (visual and auditory) dimensions ahead of stimulus selection for the second set of studies. The analyses indicated that the three styles comprise distinct categories on the Hyper-Hypo continuum of articulatory effort (Lindblom, 1990). Most relevantly, hyperarticulated speech was, both optically and visually, more informative than normal speech with regard to signal-based movement factors, and hypoarticulated speech less informative. However, the F2/Mouth-area correlation was similarly strong for all speaking styles, allowing the effect of signal-based visual informativeness on AV speech detection to be examined with the optic-acoustic association controlled.

In the second set of studies, six Detection Experiments incorporating the three speaking styles were designed to examine whether, and if so why, more visually informative (hyperarticulated) speech augmented, and less visually informative (hypoarticulated) speech attenuated, the AV detection advantage relative to normal speech, and to examine visual influence when auditory speech was absent. Detection Experiment 1 used a two-interval, two-alternative forced-choice (first or second interval; 2I2AFC) detection task, and indicated that hyperarticulation provided a greater AV detection advantage than normal and hypoarticulated speech, with less of an advantage for hypoarticulated than for normal speech. Detection Experiment 2 used a single-interval, yes-no detection task to assess responses in signal-absent conditions independently of signal-present conditions, as a means of addressing participants' reports in the 2I2AFC task that speech was heard when none was presented. Hyperarticulation again resulted in an AV detection advantage, and for all speaking styles there was a consistent response bias towards reporting speech as present in signal-absent conditions.
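As an illustrative aside (not part of the thesis itself), the lower-order F2/Mouth-area relationship referred to above amounts to a correlation between a frame-by-frame acoustic measure (F2 amplitude) and an optic measure (mouth area). A minimal sketch of such a computation, assuming hypothetical, pre-extracted time series sampled at a common frame rate; the thesis's own extraction pipeline is not specified here:

    import numpy as np

    def optic_acoustic_correlation(f2_amplitude, mouth_area):
        """Pearson correlation between an F2 amplitude envelope and a
        mouth-area trace sampled at the same frame rate (hypothetical
        stand-ins for the F2/Mouth-area measures)."""
        f2 = np.asarray(f2_amplitude, dtype=float)
        area = np.asarray(mouth_area, dtype=float)
        f2 = (f2 - f2.mean()) / f2.std()
        area = (area - area.mean()) / area.std()
        return float(np.mean(f2 * area))

    # Example with synthetic traces: a shared articulatory rhythm plus noise.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2, 200)                         # 2 s at 100 frames/s
    opening = 0.5 + 0.5 * np.sin(2 * np.pi * 4 * t)    # ~4 Hz syllable rate
    f2_amp = opening + 0.3 * rng.standard_normal(t.size)
    area = opening + 0.3 * rng.standard_normal(t.size)
    print(f"r = {optic_acoustic_correlation(f2_amp, area):.2f}")

On this kind of measure, a similarly strong correlation across speaking styles is what licenses the thesis's comparison of styles with the optic-acoustic association held constant.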
To examine whether the AV detection advantage for hyperarticulation was due to visual, auditory, or auditory-visual factors, Detection Experiments 3 and 4 used mismatched AV speaking-style combinations (AnormVhyper, AnormVhypo, AhyperVnorm, AhypoVnorm) that were onset-matched or time-aligned, respectively. The results indicated that higher rates of mouth movement can be sufficient for the detection advantage even when optic-acoustic associations are weak, but that where these associations are very low, even high rates of movement do little to augment detection in noise. Furthermore, in Detection Experiment 5, in which the visual stimuli consisted only of the mouth movements extracted from the three styles, there was no AV detection advantage; it seems that extra-oral information is required, perhaps to provide a frame of reference that improves the availability of mouth movement to the perceiver. Detection Experiment 6 used a new 2I-4AFC task with measures of false detections and response bias to identify whether visual influence in signal-absent conditions is due to response bias or to an illusion of hearing speech in noise (termed here the Speech in Noise, SiN, illusion). In the event, the SiN illusion occurred for both the hyperarticulated and the normal styles, the styles with reasonable amounts of movement change. For normal speech, responses in signal-absent conditions were due only to the illusion of hearing speech in noise, whereas for hypoarticulated speech such responses were due only to response bias. For hyperarticulated speech there was evidence of both types of visual influence in signal-absent conditions. It appears that there is more doubt about the presence of auditory speech for non-normal speaking styles.

An explanation of past and present results is offered within a new framework, the Dynamic Bimodal Accumulation Theory (DBAT), developed in this thesis to address the limitations of, and conflicts between, previous theoretical positions. DBAT suggests a bottom-up influence of visual speech on the processing of auditory speech; specifically, it proposes that the rate of change of visual movements guides auditory attention rhythms 'on-line' at corresponding rates, allowing selected samples of the auditory stream to be given prominence. Any patterns contained within these samples then emerge over the course of auditory integration processes. By this account, there are three important elements of visual speech necessary for enhanced detection of speech in noise. First and foremost, when speech is present, visual movement information must be available (as opposed to hypoarticulated and synthetic speech); then the rate of change and the optic-acoustic relatedness also have an impact (as in Detection Experiments 3 and 4). When speech is absent, visual information still has an influence, and the SiN illusion (Detection Experiment 6) can be explained as a perceptual modulation of a noise stimulus by visually driven rhythmic attention. In sum, hyperarticulation augments the AV speech detection advantage, and, whenever speech is perceived in noisy conditions, there is either a response bias to perceive speech, a SiN illusion, or both. DBAT provides a detailed description of these results, with wider-ranging explanatory power than previous theoretical accounts. Predictions are put forward for examination of the predictive power of DBAT in future studies.
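As a further illustrative aside, separating a genuine change in what is heard from a mere response bias, as in Detection Experiments 2 and 6, is conventionally done with signal detection measures: sensitivity (d') and criterion (c) computed from hit and false-alarm rates. A minimal sketch with hypothetical counts, not data from the thesis; the thesis's own scoring may differ:

    from statistics import NormalDist

    def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
        """Yes-no signal detection measures: sensitivity d' and criterion c.
        A log-linear correction keeps rates away from 0 and 1.
        All counts here are hypothetical."""
        z = NormalDist().inv_cdf
        hit_rate = (hits + 0.5) / (hits + misses + 1.0)
        fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
        d_prime = z(hit_rate) - z(fa_rate)
        criterion = -0.5 * (z(hit_rate) + z(fa_rate))
        return d_prime, criterion

    # A liberal criterion (negative c) mimics the consistent bias to report
    # speech as present in signal-absent conditions.
    d, c = dprime_and_criterion(hits=70, misses=30,
                                false_alarms=40, correct_rejections=60)
    print(f"d' = {d:.2f}, c = {c:.2f}")

On such measures, a pure response-bias account predicts a shifted criterion with unchanged sensitivity, whereas an illusion-like effect such as the SiN illusion also alters what observers report hearing when no signal is present.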
Date of Award: 2007
Original language: English

Keywords

  • speech perception
  • auditory perception
  • lipreading
  • visual perception
  • noise
