Abstracting visual prosody across speakers and face areas

Research output: Chapter in Book / Conference Paper › Conference Paper

Abstract

Visual cues to speech prosody are available from a speaker's face; however, the form and location of such cues are likely to be inconsistent across speakers. Given this, the question arises whether such cues have enough in common to signal the same prosodic information across face areas and different speakers. To investigate this, the present study used visual-visual matching tasks requiring participants to view pairs of silent videos (one video displaying the upper half of the face, the other the lower half) and to select the pair produced with the same prosody (different recorded tokens were used). Participants completed both a same-speaker version of the task (upper and lower videos from the same speaker) and a cross-speaker version (upper and lower videos from different speakers). Compared to same-speaker matching, performance was lower for cross-speaker matching but still well above chance (i.e., 50%). These results support the idea that visual correlates of prosody are encoded by perceivers as abstract, non-speaker-specific cues that are transferable across repetitions, speakers and face areas.
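The above-chance comparison reported in the abstract can be checked with an exact one-sided binomial test against the 50% guessing level of a two-alternative matching task. A minimal sketch is below; the trial and success counts in the example are hypothetical, not figures from the study.

```python
from math import comb

def binom_p_above_chance(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided exact binomial p-value: the probability of observing
    at least `successes` correct matches out of `trials` if the
    participant were guessing at chance level `p`."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical example: 70 correct responses out of 100 matching trials
p_value = binom_p_above_chance(70, 100)
```

A small p-value here would indicate that matching accuracy of that size is very unlikely under pure guessing, which is the sense in which performance can be "much greater than chance".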
Original language: English
Title of host publication: Proceedings of International Conference on Auditory-Visual Speech Processing (AVSP 2010), Hakone, Kanagawa, Japan, 30 Sep. - 3 Oct. 2010
Publisher: AVSP
Pages: 38-43
Number of pages: 6
Publication status: Published - 2010
Event: International Conference on Auditory-Visual Speech Processing
Duration: 30 Sep 2010 → 3 Oct 2010

Conference

Conference: International Conference on Auditory-Visual Speech Processing
Period: 30/09/10 → 3/10/10

Keywords

  • visual perception
  • facial expression
