Dataset Description
The Voices Obscured in Complex Environmental Settings (VOiCES) corpus presents audio recorded in acoustically challenging conditions. Recordings took place in real rooms of various sizes, each contributing its own background-noise and reverberation profile. Various types of distractor noise (TV, music, or babble) were played simultaneously with clean speech. Audio was recorded at a distance using twelve microphones strategically placed throughout each room. To imitate human behavior during conversation, the foreground speech was played from a loudspeaker mounted on a motorized platform that rotated over a range of angles during recording.
Three hundred distinct speakers from LibriSpeech’s “clean” data subset were selected as the source audio, with a 50-50 female-male split. In preparation for upcoming data challenges, the first release of the VOiCES corpus will include only 200 speakers. The remaining 100 speakers will be reserved for model validation; the full corpus (300 speakers) will be released once the data challenge has closed.
In addition to the full dataset, we also provide a dev set and a mini-dev set. Both maintain the data structure of the VOiCES corpus but contain only a small subset of the data. The dev set includes audio files for four randomly selected speakers (two female, two male) recorded in Room-1, covering all twelve microphones. The mini-dev set includes one speaker, one room (Room-1), and the studio microphones only.
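Because the dev and mini-dev sets share the full corpus's directory structure, a subset can be selected simply by filtering file paths on room and microphone. The sketch below illustrates the idea; the filename pattern (`rm1-sp0042-mc01.wav`, encoding room, speaker, and microphone) is a hypothetical placeholder, not the corpus's actual naming scheme — see the README for the real layout.

```python
import re
from pathlib import Path

# Hypothetical filename pattern "rm<room>-sp<speaker>-mc<mic>.wav"
# (placeholder fields for illustration, NOT the real VOiCES scheme).
FILENAME_RE = re.compile(r"rm(?P<room>\d+)-sp(?P<speaker>\d{4})-mc(?P<mic>\d{2})\.wav$")

def parse_filename(name):
    """Extract room, speaker, and microphone IDs from a filename, or None."""
    m = FILENAME_RE.search(name)
    if m is None:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

def index_subset(root, room, mics):
    """Collect paths for one room and a chosen set of microphones,
    e.g. room=1 with all twelve mics to mirror the dev set."""
    hits = []
    for path in Path(root).rglob("*.wav"):
        meta = parse_filename(path.name)
        if meta and meta["room"] == room and meta["mic"] in mics:
            hits.append(path)
    return sorted(hits)
```

For example, `index_subset("VOiCES/", room=1, mics=set(range(1, 13)))` would gather a Room-1, all-microphone slice analogous to the dev set.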
Readme
For more details about how to use the dataset, please see our README.
Blog Posts
- Introducing the Voices Obscured in Complex Environmental Settings (VOiCES) corpus
- Realistic speech data applications in Machine Learning
- VOiCES: closing one chapter and starting a second
Publications
- Robust Speaker Recognition from Distant Speech Under Real Reverberant Environments
- Corpus Description and Collection Methodology
- 175th Meeting of the Acoustical Society of America
Interspeech 2019: VOiCES from a Distance Challenge
- The VOiCES from a Distance Challenge 2019 Evaluation Plan
- The VOiCES from a Distance Challenge 2019
- STC Speaker Recognition Systems for the VOiCES From a Distance Challenge
- Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge
- The STC ASR System for the VOiCES from a Distance Challenge 2019
- The I2R’s ASR System for the VOiCES from a Distance Challenge 2019
- Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech
- The JHU Speaker Recognition System for the VOiCES 2019 Challenge
- Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019
- The I2R’s Submission to VOiCES Distance Speaker Recognition Challenge 2019
- The LeVoice Far-Field Speech Recognition System for VOiCES from a Distance Challenge 2019
- The JHU ASR System for VOiCES from a Distance Challenge 2019
- The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge