liangzimei / five-video-classification-methods

Code that accompanies my blog post outlining five video classification methods in Keras and TensorFlow


Five video classification methods

The five video classification methods:

  1. Classify one frame at a time with a ConvNet
  2. Extract features from each frame with a ConvNet, passing the sequence to an RNN, in a separate network
  3. Use a time-distributed ConvNet, passing the features to an RNN, much like #2 but all in one network
  4. Extract features from each frame with a ConvNet and pass the sequence to an MLP
  5. Use a 3D convolutional network
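One way to see how the five methods differ is to compare what a single input sample looks like for each. The sketch below is illustrative only: the sequence length, frame size and feature dimension are assumed values, the flattening for method 4 is an assumption, and `sample_shape` is a hypothetical helper that is not part of this repository.

```python
# Illustrative only: every size below is an assumption, not a value from this repo.
SEQ_LENGTH = 40              # assumed number of frames sampled per video
FRAME_SHAPE = (299, 299, 3)  # assumed height, width, channels of one frame
FEATURE_DIM = 2048           # assumed length of the ConvNet feature vector per frame


def sample_shape(method):
    """Return the shape of one input sample for methods 1-5 in the list above."""
    if method == 1:
        # A single frame is classified on its own.
        return FRAME_SHAPE
    if method == 2:
        # A sequence of pre-extracted feature vectors is passed to the RNN.
        return (SEQ_LENGTH, FEATURE_DIM)
    if method == 3:
        # Raw frames go in; the ConvNet is applied at every time step.
        return (SEQ_LENGTH,) + FRAME_SHAPE
    if method == 4:
        # Same features as method 2, assumed flattened into one vector for the MLP.
        return (SEQ_LENGTH * FEATURE_DIM,)
    if method == 5:
        # A 3D ConvNet convolves over time as well as space.
        return (SEQ_LENGTH,) + FRAME_SHAPE
    raise ValueError("method must be between 1 and 5")


for m in range(1, 6):
    print(m, sample_shape(m))
```

Methods 2 and 4 work on features computed ahead of time, while methods 3 and 5 take the raw frame sequence directly.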

See the accompanying blog post for full details:


This code requires Keras 2 and TensorFlow 1 or greater. Please see the requirements.txt file. To ensure you're up to date, run:

pip install -r requirements.txt
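If you want to confirm the installed versions from Python, a minimal sketch using the standard library's `importlib.metadata` could look like the following. The distribution names `keras` and `tensorflow` are assumptions (your install may use a different name, such as a GPU-specific package), and `meets_minimum` is a hypothetical helper.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '2.1.5'."""
    return int(version_string.split(".")[0])


def meets_minimum(package, minimum_major):
    """True if `package` is installed and its major version is high enough."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return major_version(installed) >= minimum_major


# Minimum major versions as stated above; the distribution names are assumptions.
for package, minimum in (("keras", 2), ("tensorflow", 1)):
    status = "ok" if meets_minimum(package, minimum) else "missing or too old"
    print(package, status)
```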

Getting the data

First, download the dataset from UCF into the data folder:

cd data && wget

Then extract it with unrar e UCF101.rar.

Next, create folders (still in the data folder) with mkdir train && mkdir test && mkdir sequences && mkdir checkpoints.
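The same folder layout can also be created from Python, which may be handy on systems without a POSIX shell. This is a small sketch, not a script from this repository; `make_data_folders` is a hypothetical helper and the default `data` path assumes you run it from the repository root.

```python
from pathlib import Path

# The four folders named in the step above.
SUBFOLDERS = ("train", "test", "sequences", "checkpoints")


def make_data_folders(data_dir="data"):
    """Create the expected subfolders inside data_dir and return their paths."""
    data_dir = Path(data_dir)
    created = []
    for name in SUBFOLDERS:
        folder = data_dir / name
        # exist_ok makes the call safe to repeat; parents creates data_dir if needed.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `make_data_folders()` with no arguments creates the folders under `data` in the current working directory.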

Now you can run the scripts in the data folder to move the videos to the appropriate place, extract their frames and make the CSV file the rest of the code references. You need to run these in order. Example:



Extracting features

Before you can run the feature-based methods (the separate RNN and the MLP, #2 and #4 in the list above), you need to extract features from the images with the CNN. This is done by running the feature extraction script. On my Dell with a GeForce 960m GPU, this takes about 8 hours. If you want to limit extraction to just the first N classes, you can set that option in the file.
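To make the data flow concrete, here is a rough sketch of turning one video's frames into a fixed-length sequence of feature vectors, plus a helper for limiting to the first N classes. It is not the repository's code: the even-spacing subsampling, the alphabetical class ordering, and all function names are assumptions, and `extract_features` stands in for a real ConvNet forward pass.

```python
def subsample(frames, seq_length):
    """Pick seq_length evenly spaced frames (assumed sampling strategy)."""
    if len(frames) < seq_length:
        raise ValueError("video has fewer frames than seq_length")
    step = len(frames) // seq_length
    return [frames[i * step] for i in range(seq_length)]


def extract_sequence(frames, extract_features, seq_length):
    """Apply the feature extractor to each sampled frame, in order."""
    return [extract_features(frame) for frame in subsample(frames, seq_length)]


def limit_classes(classes, n=None):
    """Return the first n classes in alphabetical order, or all if n is None."""
    classes = sorted(classes)
    return classes if n is None else classes[:n]


# Toy usage: integers stand in for frames, and the "ConvNet" just wraps the frame.
toy_frames = list(range(10))
toy_sequence = extract_sequence(toy_frames, lambda frame: [frame], seq_length=5)
print(len(toy_sequence))
```

Saving each video's sequence to disk once means the RNN and MLP can be trained repeatedly without re-running the expensive ConvNet pass.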

Running models

The CNN-only method (method #1 in the blog post) is run from its own script.

The rest of the models are run from a shared training script. There are configuration options you can set in that file to choose which model you want to run.

The models are all defined in a single file. Refer to that file to see which models you are able to run from the training script.
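The pattern described above, one configuration option selecting among several model definitions, can be sketched as a simple name-to-builder lookup. Everything here is hypothetical: the model names, the builder functions and `get_model` are placeholders for illustration, and the real builders would return Keras models rather than strings.

```python
def build_lstm():
    """Placeholder for a function that would build the RNN model."""
    return "lstm model placeholder"


def build_mlp():
    """Placeholder for a function that would build the MLP model."""
    return "mlp model placeholder"


# Hypothetical registry mapping a config value to the function that builds the model.
MODEL_BUILDERS = {
    "lstm": build_lstm,
    "mlp": build_mlp,
}


def get_model(name):
    """Look up and build the model chosen in the configuration."""
    try:
        builder = MODEL_BUILDERS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_BUILDERS))
        raise ValueError(f"unknown model '{name}'; choose one of: {known}") from None
    return builder()


model = get_model("mlp")  # the string stands in for a value set in the config
print(model)
```

Failing early with the list of known names makes a typo in the configuration easy to spot before any training starts.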

UCF101 Citation

Khurram Soomro, Amir Roshan Zamir and Mubarak Shah, UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild, CRCV-TR-12-01, November 2012.


License: MIT License

Language: Python 100.0%