Nov 20, 2016
Taipei, Taiwan
There is clear evidence that visual cues play an important role in automatic speech recognition, both when the audio is severely corrupted by noise, through audiovisual speech recognition (AVSR), and even when it is unavailable altogether, through automatic lip-reading (ALR).
This workshop aims to challenge researchers to deal with the large variation caused by camera-view changes in the context of ALR/AVSR. To this end, we have collected a multi-view audiovisual database, named 'OuluVS2', which includes 52 speakers uttering both discrete and continuous utterances, recorded simultaneously by 5 cameras from 5 different viewpoints. To assist participants, we have pre-processed most of the data to extract the regions of interest, that is, a rectangular area containing the talking mouth.