Voices Obscured in Complex Environmental Settings

Getting Started With VOiCES

Downloading the Data from S3

VOiCES is now publicly available on AWS. Check out the AWS Registry of Open Data for details. Data structure and microphone specifics are available in our README file.

The easiest way to access the data is through the AWS Command Line Interface (CLI). Follow the AWS CLI documentation to set up and configure it. The VOiCES data is stored in the lab41openaudiocorpus S3 bucket.

To list the contents of the S3 bucket associated with VOiCES, run

aws s3 ls s3://lab41openaudiocorpus

There will be four files present:

VOiCES_release.tar.gz (417.5 GiB)
VOiCES_devkit.tar.gz (27.5 GiB)
VOiCES_competition.tar.gz (19.5 GiB)
recording_data.tar.gz (56 MiB)
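
To check these sizes before committing to a large download, the listing can be asked for human-readable sizes and a total. The sketch below wraps that call in a function; `list_bucket` is a hypothetical name chosen for illustration, and it assumes the AWS CLI is already installed and configured.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_bucket is a hypothetical wrapper name, not part of the AWS CLI.
list_bucket() {
    # --human-readable prints sizes as KiB/MiB/GiB instead of bytes;
    # --summarize appends the total object count and total size.
    aws s3 ls "s3://lab41openaudiocorpus" --human-readable --summarize
}

# Example:
# list_bucket
```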

Download the data with aws s3 sync <source> <target> [--options] or aws s3 cp <source> <target> [--options]. For example, to download the devkit to the current directory, run:

aws s3 cp s3://lab41openaudiocorpus/VOiCES_devkit.tar.gz .
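
The sync form mentioned above can fetch a single archive as well, which is convenient because re-running it skips files that already exist locally and appear unchanged. The following is a minimal sketch under that assumption; `fetch_archive` and the `./voices` target directory are hypothetical names, not part of the AWS CLI or the VOiCES release.

```shell
#!/usr/bin/env bash
set -euo pipefail

BUCKET="s3://lab41openaudiocorpus"   # bucket name given in this README

# fetch_archive is a hypothetical helper name.
# $1: archive file name, $2: local target directory (default: current dir)
fetch_archive() {
    local archive="$1"
    local target="${2:-.}"
    mkdir -p "$target"
    # Exclude every object, then re-include only the requested archive,
    # so sync transfers a single file instead of the whole bucket.
    aws s3 sync "$BUCKET" "$target" --exclude "*" --include "$archive"
}

# Example (downloads about 27.5 GiB):
# fetch_archive VOiCES_devkit.tar.gz ./voices
```

Filters are applied in order, so the later `--include` overrides the earlier `--exclude "*"` for the one matching file.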

All files are gzip-compressed tar archives. Decompress them with gzip and unpack the result with tar, or do both in one step with tar -xzf <archive>.
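
As a rough sketch of the extraction step, the function below checks an archive and then unpacks it into a chosen directory. `extract_archive` and the `./voices` directory are hypothetical names used for illustration. The integrity check reads the whole file, so it can take a long time on the larger archives.

```shell
#!/usr/bin/env bash
set -euo pipefail

# extract_archive is a hypothetical helper name.
# $1: path to a .tar.gz archive, $2: directory to unpack into (default: .)
extract_archive() {
    local archive="$1"
    local dest="${2:-.}"
    # Test the gzip layer first so a truncated download fails early.
    gzip -t "$archive"
    mkdir -p "$dest"
    # -x extract, -z filter through gzip, -f read from the named file,
    # -C switch to the destination directory before unpacking.
    tar -xzf "$archive" -C "$dest"
}

# Example:
# extract_archive VOiCES_devkit.tar.gz ./voices
```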