Predictive Corrective Networks for Action Detection

This is the source code for training and evaluating "Predictive-Corrective Networks".

Please file an issue if you run into any problems, or contact me.

Download models and data

To download models+data, run

bash download.sh

This will create a directory with the following structure

data/
    <dataset>/
        models/
            vgg16-init.t7: Initial VGG-16 model pre-trained on Imagenet.
            vgg16-trained.t7: Trained VGG-16 single-frame model.
            pc_c33-1_fc7-8.t7: Trained predictive-corrective model.
        labels/
            trainval.h5: Train labels
            test.h5: Test labels

Currently only models for THUMOS/MultiTHUMOS are included, but we will release Charades models as soon as possible.

Dumping frames

Before running on any videos, you will need to dump frames (resized to 256x256) into a root directory which contains one subdirectory for each video. Each video subdirectory should contain frames of the form frame%04d.png (e.g. frame0012.png), extracted at 10 frames per second. If you would like to train or evaluate models at different frame rates, please file an issue or contact me and I can point you in the right direction.

You may find my dump_frames and resize_images scripts useful for this.

Running a pre-trained model

Store frames from your videos in one directory frames_root, with frames at frames_root/video_name/frame%04d.png as described above.

To evaluate the predictive-corrective model, run

th scripts/evaluate_model.lua \
    --model data/multithumos/models/pc_c33-1_fc7-8.t7 \
    --frames /path/to/frames_root \
    --output_log /path/to/output.log \
    --sequence_length 8 \
    --step_size 1 \
    --batch_size 16 \
    --output_hdf5 /path/to/output_predictions.h5

Training a model

Single-frame

To train a single frame model, look at config/config-vgg.yaml. Documentation for each config parameter is available in main.lua, but the only ones you really need to change are the path to training and test frames.

train_source_options:
    frames_root: '/path/to/multithumos/test/frames'
    labels_hdf5: 'data/multithumos/labels/test.h5'

val_source_options:
    frames_root: '/path/to/multithumos/trainval/frames'
    labels_hdf5: 'data/multithumos/labels/trainval.h5'

Once you have updated these, run

th main.lua config/config-vgg.yaml /path/to/output/directory

Predictive Corrective

First, generate a predictive-corrective model initialized from a trained single-frame model, as follows:

th scripts/make_predictive_corrective.lua \
--model data/multithumos/models/vgg16-trained.t7 \
--output data/multithumos/models/pc_c33-1_fc7-8-init.t7

Next, update config/config-predictive-corrective.yaml to point to your dumped frames, as described above. Then, run

th main.lua config/config-predictive-corrective.yaml /path/to/output/directory

This usually takes 2-3 days to run on 4 GPUs.

Required packages

Note: This is an incomplete list! TODO(achald): Document all required packages.

argparse
classic
cudnn
cutorch
luaposix
lyaml
nnlr
rnn

Caution: Other configs/scripts

Please note that there are a number of other scripts and configs in this repository that are not well documented. I am sharing them in case any of them are useful to look at, to see how I use the model, etc., but beware that they may be broken and I may not be able to help you fix them.

Extra: Generate labels hdf5 files

For convenience, we provide the labels for the datasets we use as HDF5 files. However, it is possible to generate these yourself. Here is the script I used to generate MultiTHUMOS labels HDF5, and here is a similar script for Charades. These are not very well documented, but feel free to contact me if you run into any issues.

achalddave / predictive-corrective