This repo extends the YouTube-8M dataset with additional metadata. The project aims to scrape metadata about the videos themselves, such as view counts and comments, rather than the feature data the dataset already provides.
The program takes a directory of TFRecords that you have downloaded from the YouTube-8M dataset and decodes them to get the video URLs, which it then scrapes for more data.
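As a rough illustration of the "decode to get video URLs" step: the ids stored in YouTube-8M TFRecords are, as far as we know, short generated ids rather than real YouTube ids, and each one has to be resolved through a small lookup file. The sketch below only covers that resolution step and makes no network calls. The lookup URL pattern, the response shape `i("<id>","<YouTube id>");`, and the helper names `lookup_url` and `parse_lookup_response` are all assumptions for illustration, not necessarily what `main.py` does.

```python
import re

# Assumed shape of a lookup response: i("<record id>","<YouTube id>");
_LOOKUP_RE = re.compile(r'i\("([^"]+)","([^"]+)"\);?')


def lookup_url(record_id: str) -> str:
    """Build the (assumed) lookup URL for an id read from a TFRecord."""
    # Assumption: lookup files are grouped by the first two characters of the id.
    return f"http://data.yt8m.org/2/j/i/{record_id[:2]}/{record_id}.js"


def parse_lookup_response(body: str) -> str:
    """Extract the YouTube id from a lookup response and return a watch URL."""
    match = _LOOKUP_RE.search(body)
    if match is None:
        raise ValueError(f"unexpected lookup response: {body!r}")
    return f"https://www.youtube.com/watch?v={match.group(2)}"
```

In practice the body passed to `parse_lookup_response` would come from fetching `lookup_url(record_id)`; a missing or malformed response raises `ValueError`, so the caller can skip that video.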
You need to have the video-level TFRecords downloaded from the YouTube-8M dataset.
pip install -r requirements.txt
flake8
python main.py y8m_tf_records_data_directory commit_every_x_videos
Example
python main.py ./data 20
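The second argument, `commit_every_x_videos`, suggests that results are saved in batches rather than once per video. A minimal sketch of that idea follows, assuming "commit" means a database commit. The use of `sqlite3`, the `videos` table, its columns, and the `store_videos` helper are hypothetical and chosen only to keep the example self-contained.

```python
import sqlite3


def store_videos(conn, videos, commit_every):
    """Insert (video_id, views) rows, committing after every `commit_every` rows.

    Returns the number of commits performed.
    """
    if commit_every < 1:
        raise ValueError("commit_every must be at least 1")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos (video_id TEXT PRIMARY KEY, views INTEGER)"
    )
    commits = 0
    pending = 0
    for video_id, views in videos:
        conn.execute(
            "INSERT OR REPLACE INTO videos VALUES (?, ?)", (video_id, views)
        )
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Flush the final partial batch.
        conn.commit()
        commits += 1
    return commits
```

A larger batch size means fewer commits and less overhead, at the cost of losing more unsaved work if the scraper is interrupted mid-batch.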