VoxBlink / ScriptsForVoxBlink

A repo containing download guidance and corresponding scripts of the VoxBlink dataset.

The VoxBlink Dataset

The VoxBlink dataset is a large-scale speaker verification dataset collected from the YouTube platform. This repository provides guidelines for downloading and accessing the dataset, along with the necessary scripts. For a more detailed introduction, please see the paper listed under Citation.

Resource

Start by obtaining the resource files and decompressing the tar files:

tar -zxvf timestamp.tar.gz
tar -zxvf video_tags.tar.gz 
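If GNU tar is not available (for example, on some Windows setups), the same extraction can be done with Python's standard library. This is a minimal sketch: extract_archive is a hypothetical helper, not part of the repository, and it assumes the archives are in the current directory.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_archive(archive: str | Path, dest: str | Path = ".") -> list[str]:
    """Extract a .tar.gz archive into dest and return its member names (like tar -zxvf)."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject absolute paths, '..' components and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

Calling extract_archive("timestamp.tar.gz") and then extract_archive("video_tags.tar.gz") mirrors the two tar commands above.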

File structure

The file structure is summarized as follows:
|---- resource                  # resource folder
|     |---- data               # [utt] spkid-videoid-uttid
|           |---- utt_clean	
|           |---- utt_full	
|     |---- meta             # meta-data:
|           |---- spk2gender	# [spkid,gender] speaker gender labels
|           |---- spk2lan	# [spkid,language] speaker language labels
|           |---- spk2loc	# [spkid,location] speaker location labels
|           |---- utt2dur	# [utt,duration] utterance durations
|     |---- timestamp		# timestamps for video/audio cropping
|           |---- id00000	# spkid
|                 |---- DwgYRqnQZHM	#videoid
|                       |---- 00000.txt	#uttid
|                       |---- ...
|                 |---- ... 
|           |---- ...	
|     |---- video_list	# [spkid videoid1 videoid2 ...] lists for downloading videos
|           |---- spk2videos_clean	# voxblink-clean video list (18k+ speakers)
|           |---- spk2videos_full	# voxblink video list (38k+ speakers)
|           |---- spk2videos_test	# video list for testing scripts
|     |---- video_tags
|           |---- id00000.txt
|           |---- id00001.txt
|           |---- ...


|---- cropper.py	# extract speech/video segments by timestamps from downloaded videos
|---- downloader.py	# download videos by video_list
|---- LICENSE		# license
|---- README.md	
|---- requirement.txt			
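The naming scheme above ties the resource files together: an utterance ID spkid-videoid-uttid from data/ locates its timestamp file at timestamp/spkid/videoid/uttid.txt, and each line of a spk2videos_* file lists a speaker followed by that speaker's video IDs. The sketch below shows one way to navigate these files. The helper names are hypothetical, and it assumes the list files are whitespace-separated. Because YouTube video IDs can themselves contain "-", it splits off the speaker ID at the first "-" and the utterance index at the last one instead of splitting on every "-".

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def parse_utt(utt: str) -> tuple[str, str, str]:
    """Split 'spkid-videoid-uttid' into its three parts.

    YouTube video IDs may contain '-', so split once from the left
    (speaker ID) and once from the right (utterance index).
    """
    spk, rest = utt.split("-", 1)
    video, uttid = rest.rsplit("-", 1)
    return spk, video, uttid


def timestamp_path(timestamp_root: Path, utt: str) -> Path:
    """Locate <timestamp_root>/<spkid>/<videoid>/<uttid>.txt for one utterance."""
    spk, video, uttid = parse_utt(utt)
    return timestamp_root / spk / video / f"{uttid}.txt"


def parse_video_list(lines: Iterable[str]) -> dict[str, list[str]]:
    """Parse 'spkid videoid1 videoid2 ...' lines (whitespace-separated, assumed)."""
    spk2videos: dict[str, list[str]] = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            spk2videos[fields[0]] = fields[1:]
    return spk2videos
```

For example, timestamp_path(Path("resource/timestamp"), "id00000-DwgYRqnQZHM-00000") points to resource/timestamp/id00000/DwgYRqnQZHM/00000.txt, and a video list can be loaded by passing an open file handle, e.g. parse_video_list(open("resource/video_list/spk2videos_test", encoding="utf-8")).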

Download

The following steps show how to build your own copy of VoxBlink.

Pre-requisites

  • Install ffmpeg:
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ffmpeg
  • Clone the repository:
git clone https://github.com/VoxBlink/ScriptsForVoxBlink.git
  • Install the Python dependencies:
python3 -m pip install -r requirements.txt
  • Download videos

    We provide three download modes. The downloader also uses multiple threads to speed up the download process.

    • full: Download the complete version of VoxBlink.

    • clean: Download the VoxBlink-clean version.

    • test: Check whether the scripts run correctly.

python downloader.py --base_dir $BASE_DIR$ --num_workers 4 --mode full
  • Crop videos
python cropper.py --save_dir data/ --timestamp_dir resource/timestamp --num_workers 4 --mode test --video_dir $BASE_DIR$
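This README does not document how cropper.py works internally or the format of the timestamp files, so the following is only a sketch of the general technique: cutting one segment out of a downloaded video with ffmpeg and running many such cuts on a pool of num_workers threads. It assumes the segment's start and end times (in seconds) have already been read from a timestamp file. CropJob, build_audio_crop_command and run_jobs are hypothetical names, and the 16 kHz mono output is an illustrative choice, not the dataset's specification.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass
class CropJob:
    video: Path   # a downloaded source video
    start: float  # segment start in seconds (assumed already parsed from a timestamp file)
    end: float    # segment end in seconds
    out: Path     # output audio file, e.g. a .wav


def build_audio_crop_command(job: CropJob, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg command that extracts [start, end) as mono audio."""
    if job.end <= job.start:
        raise ValueError(f"empty segment: {job}")
    return [
        "ffmpeg", "-y",                      # overwrite existing outputs
        "-ss", f"{job.start:.3f}",           # seek in the input before decoding
        "-i", str(job.video),
        "-t", f"{job.end - job.start:.3f}",  # keep this many seconds
        "-vn",                               # drop the video stream
        "-ac", "1",                          # downmix to mono
        "-ar", str(sample_rate),             # resample
        str(job.out),
    ]


def _run_ffmpeg(cmd: list[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def run_jobs(jobs: Sequence[CropJob], num_workers: int = 4,
             runner: Callable[[list[str]], int] = _run_ffmpeg) -> list[int]:
    """Crop all jobs on a thread pool; exit codes keep the order of `jobs`."""
    def crop(job: CropJob) -> int:
        job.out.parent.mkdir(parents=True, exist_ok=True)  # ffmpeg does not create folders
        return runner(build_audio_crop_command(job))

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(crop, jobs))
```

Threads are sufficient here because each worker mostly waits on an external ffmpeg process, so Python's GIL is not a bottleneck; the same pattern fits parallel downloads. Placing -ss before -i makes ffmpeg seek in the input, and because the output is re-encoded to WAV, the cut remains accurate.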

License

The dataset is licensed under the CC BY-NC-SA 4.0 license. This means that you can share and adapt the dataset for non-commercial purposes as long as you provide appropriate attribution and distribute your contributions under the same license. Detailed terms can be found here.

Important Note: The metadata provided is accurate as of June 2023. We cannot guarantee that the videos will remain available on YouTube in the future, so we recommend downloading the dataset promptly. YouTube users with concerns about their videos' inclusion in our dataset can contact us by e-mail: yuke.lin@dukekunshan.edu.cn or ming.li369@dukekunshan.edu.cn.

Citation

Please cite the paper below if you make use of the dataset:

@misc{lin2023voxblink,
      title={VoxBlink: X-Large Speaker Verification Dataset on Camera}, 
      author={Yuke Lin and Xiaoyi Qin and Ming Cheng and Ning Jiang and Guoqing Zhao and Ming Li},
      year={2023},
      eprint={2308.07056},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
