Abstract
Visual cues to speech prosody are available from a speaker's face; however, the form and location of such cues are likely to be inconsistent across speakers. Given this, the question arises whether such cues have enough in common to signal the same prosodic information across face areas and different speakers. To investigate this, the present study used visual-visual matching tasks in which participants viewed pairs of silent videos (one video showing the upper half of the face, the other the lower half) and selected the pair produced with the same prosody (different recorded tokens were used). Participants completed both a same-speaker version (upper and lower videos from the same speaker) and a cross-speaker version (upper and lower videos from different speakers) of the task. Performance was lower for cross-speaker matching than for same-speaker matching, but still well above chance (i.e., 50%). These results support the idea that visual correlates of prosody are encoded by perceivers as abstract, non-speaker-specific cues that transfer across repetitions, speakers and face areas.
Original language | English |
---|---|
Title of host publication | Proceedings of International Conference on Auditory-Visual Speech Processing (AVSP 2010), Hakone, Kanagawa, Japan, 30 Sep. - 3 Oct. 2010 |
Publisher | AVSP |
Pages | 38-43 |
Number of pages | 6 |
Publication status | Published - 2010 |
Event | International Conference on Auditory-Visual Speech Processing - Duration: 30 Sep 2010 → 3 Oct 2010 |
Conference
Conference | International Conference on Auditory-Visual Speech Processing |
---|---|
Period | 30/09/10 → 3/10/10 |
Keywords
- visual perception
- facial expression