MFF-pytorch

Motion Fused Frames implementation in PyTorch: code and pretrained models.

Motion Fused Frames (MFFs)

PyTorch implementation of Motion Fused Frames, built on top of the TSN-pytorch codebase.

This is the PyTorch code for the following paper:

Okan Köpüklü, Neslihan Köse, and Gerhard Rigoll
"Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition",
Proceedings of the CVPR Workshop on Analysis and Modeling of Faces and Gestures.

Requirements

  • PyTorch 0.3.1 (version 0.4.0 raises a dimension error when loading pretrained models from the model zoo)

  • OpenCV compiled with CUDA and FFmpeg support, for optical flow calculation and data augmentation

  • Python 3.6

Note: always use git clone --recursive https://github.com/okankop/MFF-pytorch to clone this project. Otherwise you will not be able to use the Inception-series CNN architectures.
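As a convenience, here is a minimal environment-setup sketch. It assumes conda; the environment name is arbitrary, and the exact PyTorch install command may vary by platform and CUDA version. OpenCV must be built from source with CUDA and FFmpeg enabled, since the prebuilt opencv-python wheels ship without CUDA support:

conda create -n mff python=3.6
conda activate mff
conda install pytorch=0.3.1 -c pytorch
git clone --recursive https://github.com/okankop/MFF-pytorch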

Dataset Preparation

Download the Jester dataset, the NVIDIA Dynamic Hand Gesture dataset, or the ChaLearn LAP IsoGD dataset. Decompress them into the same folder and use process_dataset.py to generate the index files for the train, val, and test splits (a sketch of the resulting index-file format follows the directory layout below). Properly set up the train, validation, and category meta files in datasets_video.py. Finally, use the flow_computation directory to calculate the optical flow images with the Brox method.

Assume the structure of the data directories is as follows:

~/MFF-pytorch/
   datasets/
      jester/
         rgb/
            .../ (directories of video samples)
               .../ (jpg color frames)
         flow/
            u/
               .../ (directories of video samples)
                  .../ (jpg optical-flow-u frames)
            v/
               .../ (directories of video samples)
                  .../ (jpg optical-flow-v frames)
   model/
      .../ (saved models for the last checkpoint and best model)
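For orientation, the index files generated by process_dataset.py follow the TSN-style convention of one sample per line. The snippet below is only an illustrative sketch of that format, not the project's actual script; the output file name and the label lookup are assumptions:

# Illustrative sketch of a TSN-style index file:
# one line per video sample, "<sample-dir> <num-frames> <label-index>".
import os

rgb_root = os.path.expanduser('~/MFF-pytorch/datasets/jester/rgb')
labels = {}  # sample-id -> label index, e.g. parsed from the dataset's CSV files

with open('train_list.txt', 'w') as out:  # output name is hypothetical
    for sample in sorted(os.listdir(rgb_root)):
        frame_dir = os.path.join(rgb_root, sample)
        num_frames = len([f for f in os.listdir(frame_dir) if f.endswith('.jpg')])
        out.write('%s %d %d\n' % (sample, num_frames, labels.get(sample, 0)))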

Running the Code

The following are some example commands for training under different scenarios:

  • Train a 4-segment network with 3 flow and 1 color frame per segment (the 4-MFFs-3f1c architecture; see the fusion sketch after this list)
python main.py jester RGBFlow --arch BNInception --num_segments 4 \
--consensus_type MLP --num_motion 3  --batch-size 32
  • Resume training from the last checkpoint (4-MFFs-3f1c architecture)
python main.py jester RGBFlow --resume=<path-to-last-checkpoint> --arch BNInception \
--consensus_type MLP --num_segments 4 --num_motion 3  --batch-size 32
  • Test trained models (4-MFFs-3f1c architecture); pretrained models can be found under pretrained_models.
python test_models.py jester RGBFlow pretrained_models/MFF_jester_RGBFlow_BNInception_segment4_3f1c_best.pth.tar --arch BNInception --consensus_type MLP --test_crops 1 --num_motion 3 --test_segments 4
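To unpack the naming: "4-MFFs-3f1c" means 4 temporal segments, where each segment is a motion fused frame built by appending 3 optical-flow frames to 1 color frame at the data level. The snippet below is a minimal sketch of that channel-wise fusion for a single segment, not the repository's actual dataloader; the tensor shapes and the concatenation order are illustrative:

# One segment of 4-MFFs-3f1c: 3 flow frames (2 channels each: u, v)
# fused with 1 RGB frame (3 channels) -> a 9-channel input tensor.
import torch

rgb = torch.rand(3, 224, 224)                       # 1 color frame
flow = [torch.rand(2, 224, 224) for _ in range(3)]  # 3 flow frames

mff = torch.cat(flow + [rgb], dim=0)                # data-level fusion
print(mff.shape)                                    # torch.Size([9, 224, 224])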

All GPUs are used for training by default. If you want to use only a subset of the GPUs, restrict visibility with the CUDA_VISIBLE_DEVICES environment variable.
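For example, to train on only the first two GPUs (flags copied from the first training example above; the GPU indices are just an illustration):

CUDA_VISIBLE_DEVICES=0,1 python main.py jester RGBFlow --arch BNInception \
--consensus_type MLP --num_segments 4 --num_motion 3 --batch-size 32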

Citation

If you use this code or pre-trained models, please cite the following:

@InProceedings{Kopuklu_2018_CVPR_Workshops,
author = {Kopuklu, Okan and Kose, Neslihan and Rigoll, Gerhard},
title = {Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018}
}

Acknowledgement

We thank Yuanjun Xiong for releasing the TSN-pytorch codebase, on top of which we built our work. We also thank Bolei Zhou for the inspirational work on Temporal Segment Networks, from which we imported process_dataset.py into our project.
