Abstracting visual prosody across speakers and face areas

Erin Cvejic, Jeesun Kim, Chris Davis

    Research output: Chapter in Book / Conference Paper

    Abstract

    Visual cues to speech prosody are available from a speaker's face; however, the form and location of such cues are likely to be inconsistent across speakers. This raises the question of whether such cues have enough in common to signal the same prosodic information across face areas and across different speakers. To investigate this, the present study used visual-visual matching tasks in which participants viewed pairs of silent videos (one video showing the upper half of the face, the other showing the lower half) and selected the pair produced with the same prosody (different recorded tokens were used). Participants completed both a same-speaker version (upper and lower videos from the same speaker) and a cross-speaker version (upper and lower videos from different speakers) of the task. Performance was lower for cross-speaker matching than for same-speaker matching, but still well above chance (i.e., 50%). These results support the idea that visual correlates of prosody are encoded by perceivers as abstract, non-speaker-specific cues that are transferable across repetitions, speakers and face areas.
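
    The 50% chance level follows from the two-alternative structure of the matching task: a participant guessing at random selects the correct pair on half of the trials. As a minimal sketch (not taken from the paper), above-chance accuracy in such a task could be checked with a one-sided exact binomial test; the trial count and number of correct responses below are invented for illustration.

    ```python
    # Hypothetical sketch: exact one-sided binomial test of matching accuracy
    # against the 50% chance level of a two-alternative task.
    from math import comb

    def p_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
        """P(observing >= `correct` successes in `trials` trials under `chance`)."""
        return sum(
            comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
            for k in range(correct, trials + 1)
        )

    # Invented numbers for illustration only: 70 correct matches out of 96 trials.
    print(f"p = {p_above_chance(correct=70, trials=96):.2e}")
    ```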
    Original language: English
    Title of host publication: Proceedings of International Conference on Auditory-Visual Speech Processing (AVSP 2010), Hakone, Kanagawa, Japan, 30 Sep. - 3 Oct. 2010
    Publisher: AVSP
    Pages: 38-43
    Number of pages: 6
    Publication status: Published - 2010
    Event: International Conference on Auditory-Visual Speech Processing - Hakone, Kanagawa, Japan
    Duration: 30 Sep 2010 → 3 Oct 2010

    Conference

    Conference: International Conference on Auditory-Visual Speech Processing
    Period: 30/09/10 → 3/10/10

    Keywords

    • visual perception
    • facial expression
