ajabri / videowalk

Repository for "Space-Time Correspondence as a Contrastive Random Walk" (NeurIPS 2020)

Home Page: http://ajabri.github.io/videowalk

Efficient way to download Kinetics-400

vadimkantorov opened this issue

@ajabri Would downloading it from AcademicTorrents have the good size/directory structure?

Or did you download it using https://github.com/Showmax/kinetics-downloader (recommended at https://github.com/pytorch/vision/tree/master/references/video_classification#data-preparation), which runs youtube-dl and then converts all the clips to mp4 (and, I guess, h264)? I tried it, and in 2 hours it downloaded only ~500 MB out of 400 GB.

Do you know if clips must be converted to mp4? Or would VideoClips just use ffmpeg once for sampling frames? (In that case, re-encoding to the same format is not needed.)

Did you use some other way?

What is the expected Kinetics400 dataset directory structure? (It is not explained at https://pytorch.org/docs/stable/torchvision/datasets.html#kinetics-400 or in the dataset metadata.) Is it /path/to/dataset/<split>/<classlabel>/<youtubeid>.avi?

If yes, then what is the origin of train_256? From what I understood, the only splits are train, val and test.

Thanks a lot!

Could you please publish the effective file lists for the Kinetics400 version that you used? I downloaded the dataset from https://academictorrents.com/details/184d11318372f70018cf9a72ef867e2fb9ce1d26, but it contains .mkv and .webm files. I can add them to the extensions arg, but I'd like to make sure that the file lists roughly match. Thanks!
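A minimal sketch of how such a file list could be built and summarized by extension, assuming the layout is `<root>/<split>/<classlabel>/<file>`. The helper names and the `VIDEO_EXTS` set are hypothetical and do not come from the repository.

```python
from collections import Counter
from pathlib import Path

# Assumption: the extensions seen in the torrent plus the usual mp4/avi.
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".webm"}


def list_videos(root):
    """Return sorted paths (relative to root, POSIX style) of all video files."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTS
    )


def count_by_extension(file_list):
    """Count how many entries of a file list use each extension."""
    return Counter(Path(name).suffix.lower() for name in file_list)
```

Two such lists (for example, one from the torrent and one published by the author) could then be compared with ordinary set operations, such as `set(mine) - set(theirs)`.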

train_256 from that torrent comes out at 273 GB when I just add up the byte sizes of the files in the zipball, not the 400 GB mentioned at https://github.com/pytorch/vision/tree/master/references/video_classification#data-preparation
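For reference, a size like this can be computed without extracting the archive. This is only a sketch using the standard `zipfile` module; the function name is hypothetical, and it sums uncompressed sizes (`file_size`), which for already-compressed video should be close to the stored size.

```python
import zipfile


def total_uncompressed_bytes(zip_path_or_file):
    """Sum the uncompressed sizes of all regular files in a zip archive."""
    with zipfile.ZipFile(zip_path_or_file) as zf:
        return sum(
            info.file_size for info in zf.infolist() if not info.is_dir()
        )
```

Dividing the result by `1e9` gives decimal gigabytes and by `2**30` gives GiB; the two differ by about 7%, which does not explain a gap between 273 and 400.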

Same questions.
Is train_256 in the original dataset, or do I have to rename the folder to it?
I used youtube-dl but found no train_256 in the path.

The torrent above has train_256 and val_256, but it would indeed be better to have the file lists.

Thanks @vadimkantorov.
Looking forward to the author's clarifications.

Hi @vadimkantorov,
Are there any other ways to download the dataset? The torrent you shared is too slow and gets stuck.

The torrent took me 3-4 days. https://github.com/Showmax/kinetics-downloader or manual youtube-dl would likely take ~2-3 days as well, and in addition it may lead to IP bans by Google.

@ajabri If the torrent was used, then since the dataset was configured with only mp4, the mkv and webm files were probably skipped...

Hi @ajabri, I'm a PhD student starting to look at your work and code, but I'm having some problems downloading the data. The torrent @vadimkantorov mentioned is down at the moment, so I went with https://github.com/Showmax/kinetics-downloader, downloading just a few classes.
My problem is understanding what the folder "train_256" is: if I just rename the folder that way, the model doesn't seem to resize the videos correctly (see the error in the image); the dataloader tries to stack a 720x1280 image with a 480x640 one.

[screenshot of the error]

Could you explain how to set up the data correctly? Many thanks.

Hi @FraLuca,

Thanks for the interest. The issue is that different examples in your batch have different resolutions. In my training code, I assume the videos have a resolution of 256 x 256. So in your case, you can prepend a transform that first resizes the frames to the same size.

Hi @icoz69 and @vadimkantorov,

Sorry for missing your messages above. Hopefully this is no longer a blocker, but indeed train_256 is a folder containing a resized (256x256) version of Kinetics. That said, I think the results should be robust to changes in the training data distribution caused by resizing and preprocessing, or occasional missing videos.
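For anyone building a train_256 folder themselves, one possible approach is to re-encode each clip with ffmpeg. The sketch below is an assumption, not the preprocessing the author actually used: the helper names are hypothetical, `scale=256:256` ignores the original aspect ratio, `-an` drops the audio track, and `ffmpeg` must be on the PATH.

```python
import subprocess
from pathlib import Path


def build_resize_command(src, dst, size=256):
    """Build an ffmpeg command that rescales a video to size x size."""
    return [
        "ffmpeg", "-y",                   # overwrite the output if it exists
        "-i", str(src),                   # input video
        "-vf", f"scale={size}:{size}",    # resize every frame (no aspect-ratio preservation)
        "-an",                            # drop audio, which frame sampling does not need
        str(dst),
    ]


def mirrored_output_path(src, src_root, dst_root):
    """Map <src_root>/<class>/<id>.<ext> to <dst_root>/<class>/<id>.mp4."""
    rel = Path(src).relative_to(src_root)
    return (Path(dst_root) / rel).with_suffix(".mp4")


def resize_video(src, dst, size=256):
    """Run ffmpeg for one file, creating the output directory first."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_resize_command(src, dst, size), check=True)
```

Writing everything as .mp4 also sidesteps the .mkv/.webm extension question, at the cost of one re-encoding pass over the dataset.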