VOiCES Appearing At Interspeech 2018
Lab41 and SRI International will be attending Interspeech 2018, to be held in Hyderabad, India, September 2-6. We will present details of the VOiCES corpus and outline baseline model results obtained with it. Come visit our poster. The conference publication is available here.
Abstract: Recent advances in speech and signal processing research leverage deep learning frameworks that can model data complexity better than more traditional approaches. Publicly available audio corpora large enough for deep learning implementations consist mostly of isolated speech recorded with close-range microphones. A typical approach to better represent realistic scenarios is to convolve clean speech with noise and simulated room responses for model training. Despite these efforts, model performance degrades significantly when tested against uncurated data collected in the wild. This paper introduces the Voices Obscured in Complex Environmental Settings (VOiCES) corpus, a dataset freely available under the Creative Commons BY 4.0 license. The dataset will promote speech and signal processing research on distant speakers under noisy room conditions. Data was recorded in furnished rooms with distractor noise played in conjunction with isolated speech. Multiple sessions were recorded in each room to cover all foreground speech and distractor noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of recordings per microphone. This work is a multi-organizational effort led by SRI International and Lab41, with the intent to push forward state-of-the-art distant-microphone approaches in signal processing and speech recognition.
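For readers unfamiliar with the simulation-based augmentation the abstract contrasts VOiCES against, a minimal sketch is shown below: clean speech is convolved with a room impulse response and mixed with noise at a chosen SNR. This is an illustrative assumption, not code from the corpus release; the file paths, the `simulate_distant_speech` helper, and mono WAV inputs are all hypothetical.

```python
# Minimal sketch of convolution-based augmentation (hypothetical, not the
# VOiCES pipeline): clean speech * room impulse response + scaled noise.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def simulate_distant_speech(speech_path, rir_path, noise_path, snr_db=10.0):
    # Assumes mono WAV files at matching sample rates.
    speech, sr = sf.read(speech_path)
    rir, _ = sf.read(rir_path)
    noise, _ = sf.read(noise_path)

    # Convolve clean speech with the simulated room impulse response.
    reverberant = fftconvolve(speech, rir)[: len(speech)]

    # Loop or trim the noise to match the speech length.
    noise = np.resize(noise, reverberant.shape)

    # Scale the noise to reach the requested signal-to-noise ratio.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    noise = noise * np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))

    mixture = reverberant + noise
    # Normalize to avoid clipping when writing to a fixed-point format.
    mixture /= max(1.0, np.max(np.abs(mixture)))
    return mixture, sr

# Example usage with placeholder file names:
# mix, sr = simulate_distant_speech("clean.wav", "rir.wav", "babble.wav", snr_db=5)
# sf.write("simulated_distant.wav", mix, sr)
```

In contrast, VOiCES captures the room acoustics and distractor noise directly by re-recording speech played in real furnished rooms, rather than simulating them.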
Interspeech 2018 was held September 2-6, 2018 in Hyderabad, India.