The influence of visual speech information on the intelligibility of English consonants produced by non-native speakers

Saya Kawase, Beverly Hannah, Yue Wang

    Research output: Contribution to journal › Article › peer-review

    Abstract

    This study examines how visual speech information affects native judgments of the intelligibility of speech sounds produced by non-native (L2) speakers. Native Canadian English perceivers judged three English phonemic contrasts (/b-v, θ-s, l-ɹ/) produced by native Japanese speakers, with native Canadian English speakers as controls. These stimuli were presented under audio-visual (AV, with speaker voice and face), audio-only (AO), and visual-only (VO) conditions. The results showed that, across conditions, the overall intelligibility of Japanese productions of the native (Japanese)-like phonemes (/b, s, l/) was significantly higher than that of the non-Japanese phonemes (/v, θ, ɹ/). In terms of visual effects, the more visually salient non-Japanese phonemes /v, θ/ were perceived as significantly more intelligible in the AV than in the AO condition, indicating enhanced intelligibility when visual speech information is available. However, the non-Japanese phoneme /ɹ/ was perceived as less intelligible in the AV than in the AO condition. Further analysis revealed that, unlike the native English speakers, the Japanese speakers produced /ɹ/ without visible lip-rounding, indicating that non-native speakers' incorrect articulatory configurations may decrease intelligibility. These results suggest that visual speech information may either positively or negatively affect L2 speech intelligibility.
    Original language: English
    Pages (from-to): 1352-1362
    Number of pages: 11
    Journal: Journal of the Acoustical Society of America
    Volume: 136
    Issue number: 3
    Publication status: Published - 2014

    Keywords

    • English language
    • Japanese speakers
    • speech
