Description

We are hosting a special session at Speaker Odyssey 2020, "VOiCES 2020: Advances in Far-Field Speaker Recognition".

SRI International and Lab41, In-Q-Tel, are proud to release the Voices Obscured in Complex Environmental Settings (VOICES) corpus, a collaborative effort that brings speech data recorded in acoustically challenging, reverberant environments to researchers. Clean speech was recorded in rooms of different sizes, each having a distinct room acoustic profile, with background noise played concurrently. These recordings provide audio data that better represents real-use scenarios. The intended purpose of this corpus is to promote acoustic research including, but not limited to:

Speaker identification, speech recognition, speaker detection
Event and background classification, speech/non-speech
Source separation and localization, noise reduction, general enhancement, acoustic quality metrics

The corpus contains the source audio, the retransmitted audio, orthographic transcriptions, and speaker labels. The ultimate goal of this corpus is to advance acoustic research by providing access to complex acoustic data. The corpus will be released as open source, Creative Commons BY 4.0, free for commercial, academic, and government use.

Dataset Details

This is one of the largest corpora to date that has transcriptions and simultaneously recorded real-world noise. The details:

Source Material: a total of 15 hours (3,903 audio files)
Language: audio contains English read speech from male and female speakers
Simulated Head Movement: the loudspeaker playing the foreground speech was on a motorized rotating platform
Distractor Noise: a large collection containing television, music, babble noise, and HVAC at various SNRs
Multiple Rooms: large, medium, and small, with various reverberation characteristics

More specific details can be found in our readme and in the paper listed in the reading section.

Citing VOiCES

VOiCES is publicly available under Creative Commons BY 4.0, free for commercial, academic, and government use. If you use VOiCES, we would appreciate a reference to the following in your publications.

Biblatex entry:

@misc{richey2018voices,
    title={Voices Obscured in Complex Environmental Settings (VOICES) corpus},
    author={Colleen Richey and Maria A. Barrios and Zeb Armstrong and Chris Bartels and Horacio Franco and Martin Graciarena and Aaron Lawson and Mahesh Kumar Nandwana and Allen Stauffer and Julien van Hout and Paul Gamble and Jeff Hetherly and Cory Stephenson and Karl Ni},
    year={2018},
    eprint={1804.05053},
    archivePrefix={arXiv},
    primaryClass={cs.SD}
}