talhasaruhan / video-action-classification

Video Action Classification Using Spatial Temporal Clues. Original paper: arXiv:1504.01561

Can this code run on the CCV dataset?

Ukangkang opened this issue · comments

I mean the Columbia Consumer Video (CCV) dataset.

Well, I'm not particularly familiar with the CCV dataset, but if you're willing to make a couple of small changes, it should work fine.

In the code, the get_data() function returns a tuple of (spatial_frames, stacked_motion_frames, label). The first two are irrelevant here; the label is simply read from trainlist01.txt, where each line contains a filename and its label as an integer.
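For reference, a minimal reader for that format might look like this (an untested sketch; I'm assuming whitespace-separated "filename label" lines, as in the UCF-101 split files):

```python
# Untested sketch: read a split file where each line is
# "<filename> <integer label>", whitespace-separated.
def read_labels(split_path):
    labels = {}
    with open(split_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            filename, label = line.rsplit(maxsplit=1)
            labels[filename] = int(label)
    return labels

labels = read_labels("trainlist01.txt")
```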

You can either write an external tool that maps your video/label pairs to the 'filename, integer label' format found in trainlist01.txt and just replace that file, or you can modify the Python code itself to read labels differently. If you're somewhat familiar with Python, it should be a painless experience, but if you ever have any problems, feel free to ask here.
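For the first option, the converter can be very small. Here's a hypothetical sketch (ccv_pairs is a stand-in for however you load the CCV annotations, which I haven't worked with):

```python
# Hypothetical converter: write (filename, class index) pairs out in the
# same "<filename> <integer>" format the code expects in trainlist01.txt.
ccv_pairs = [
    ("video_0001.mp4", 3),
    ("video_0002.mp4", 7),
]

with open("trainlist01.txt", "w") as f:
    for filename, label in ccv_pairs:
        f.write(f"{filename} {label}\n")
```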

PS: Don't forget to run your videos through the optical flow script first. It pre-computes the optical flow for each video file in the dataset and saves it efficiently to disk. You need to change vdir to the directory where your video files reside.
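To illustrate what that step does (this is not the repo's actual script, just a rough sketch of the idea using OpenCV's Farneback flow):

```python
import os
import cv2
import numpy as np

vdir = "/path/to/your/videos"  # point this at your video directory

for fname in os.listdir(vdir):
    cap = cv2.VideoCapture(os.path.join(vdir, fname))
    ok, prev = cap.read()
    if not ok:
        cap.release()
        continue
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive frames
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.astype(np.float16))  # half precision to save disk space
        prev = gray
    cap.release()
    if flows:
        np.save(os.path.join(vdir, fname + "_flow.npy"), np.stack(flows))
```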

If your actual video files are in a hierarchical folder structure, you can either flatten them or change the file-traversal bits in the scripts; it should be a trivial change to make.
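Concretely, a recursive walk like this could replace a flat os.listdir() call (a sketch; find_videos is a name I'm making up, and I'm assuming the scripts iterate over filenames under vdir):

```python
import os

# Sketch: recursively collect video files from nested folders.
def find_videos(vdir, exts=(".avi", ".mp4")):
    for root, _dirs, files in os.walk(vdir):
        for fname in files:
            if fname.lower().endswith(exts):
                yield os.path.join(root, fname)
```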

When I run model.py, something goes wrong.
The program reports: 'Resource exhausted: OOM when allocating tensor with shape [800,224,224,64]' and 'OOM when allocating tensor with shape [1589,112,112,96]'.
This means out of memory, but I have two GPUs, each with 16 GB of memory. I think the number of video frames is too large, which makes the effective batch size too big, but I don't know how to solve this problem.

Well, there are a couple of restrictions with regard to memory usage. First, due to the nature of Python, you'll have copies of the same object around, even if only for a small time window, and TensorFlow will also create additional copies when it moves data from Python to its C++ backend.

I'm not sure whether this is purely a GPU memory problem; if it is, I'm afraid I may not be able to help you off the top of my head. I'm not sure how TensorFlow actually allocates memory across multiple GPUs, and even if it does allocate data separately per device, I don't know whether it does so automatically. But this blog post seems like a good place to start.

Also, a quick calculation shows that tensors of those shapes take roughly 10 GB and 7.7 GB respectively with 32-bit floats, so either one alone nearly fills a 16 GB card. I think you can try 16-bit floats; that may solve your problem. There is solid mathematical work showing that training with 16-bit floats doesn't add meaningful variance to your model.
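For reference, here's the back-of-the-envelope math from the shapes in your error messages:

```python
# Back-of-the-envelope sizes for the tensors in the error messages:
for shape in [(800, 224, 224, 64), (1589, 112, 112, 96)]:
    n = 1
    for d in shape:
        n *= d
    print(shape, f"fp32: {n * 4 / 1e9:.1f} GB, fp16: {n * 2 / 1e9:.1f} GB")
# (800, 224, 224, 64) fp32: 10.3 GB, fp16: 5.1 GB
# (1589, 112, 112, 96) fp32: 7.7 GB, fp16: 3.8 GB
```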

Edit: First, I suggest you check your system memory and determine whether this is a GPU or a system-memory limitation.
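For example, you can watch host RAM while training runs (a sketch; psutil is a third-party package, and GPU memory is easier to watch with nvidia-smi in another terminal):

```python
# Requires the third-party psutil package (pip install psutil).
import psutil

mem = psutil.virtual_memory()
print(f"RAM used: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
```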

Also, if you want a quick fix, you can try reducing L (the height of the stack) or simply splitting the input into multiple parts (for example, split videos longer than 10 seconds into two parts and give both parts the same label).
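The splitting idea in code form (an untested sketch; split_clip is a hypothetical helper, not something in the repo):

```python
# Sketch: chop a clip's frame stack into fixed-length chunks that all
# inherit the original clip's label.
def split_clip(frames, label, chunk_len):
    """frames: array-like of shape (T, H, W, C)."""
    chunks = []
    for start in range(0, len(frames) - chunk_len + 1, chunk_len):
        chunks.append((frames[start:start + chunk_len], label))
    return chunks
```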