Action Recognition

Overview

  • Explore some action recognition models

  • Dataset: UCF-101

  • Compare the performance of the different models and analyze the experimental results

File Structure of the Repo

rnn_practice: Practice with RNN models and LSTMs, following online tutorials and other useful resources

data: Training and testing data. (Don't add huge data files to this repo; add them to .gitignore instead)

models: Defining the architecture of models

utils: Utility scripts for dataset preparation, input pre-processing, and other miscellaneous tasks

train_CNN: For training the different CNN models: load the corresponding model, set the training parameters, and start training

process_CNN: For the LRCN model, the CNN component is pre-trained and then kept fixed while the LSTM cells are trained. We can therefore use the CNN to pre-process the frames of each video and store the intermediate features for feeding into the LSTMs later, which greatly improves the training efficiency of the LRCN model (a sketch of this pre-extraction step follows this file list)

train_RNN: For training the LRCN model

predict: For calculating the overall testing accuracy on the whole testing set
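
For illustration, here is a minimal sketch of the pre-extraction idea behind process_CNN: run the sampled frames of a video through a frozen CNN and cache the resulting feature vectors for the later LSTM training. The ImageNet-weighted ResNet50 (rather than the repo's fine-tuned one), the input resolution, and the .npy caching are assumptions for the sketch, not the repo's actual API.

```python
# Sketch only: cache per-frame CNN features so the LSTM never touches raw frames.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# Frozen CNN used purely as a feature extractor (2048-d pooled vector per frame).
cnn = ResNet50(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False

def extract_video_features(frames, out_path):
    """frames: array of shape (num_frames, 224, 224, 3), RGB."""
    x = preprocess_input(frames.astype("float32"))
    feats = cnn.predict(x, verbose=0)   # shape: (num_frames, 2048)
    np.save(out_path, feats)            # cached intermediate result for the LSTM stage
    return feats
```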

Models Description

  • Fine-tuned ResNet50 trained solely on single-frame image data (every frame of every video is treated as an independent image for training or testing, which provides natural data augmentation). The ResNet50 comes from the Keras applications, with weights pre-trained on ImageNet. ./models/finetuned_resnet.py

  • LRCN (a CNN feature extractor, here the fine-tuned ResNet50, followed by LSTMs). The input is a sequence of frames uniformly sampled from each video. The CNN component directly reuses the fine-tuned ResNet50 from model 1 without extra training (cf. Long-term Recurrent Convolutional Networks).

    Produce the intermediate data with ./process_CNN.py, then train and predict with ./models/RNN.py (see the LSTM sketch after this list).

  • Simple CNN model trained on stacked optical-flow data (one stacked optical-flow volume is generated from each video and used as the input of the network; see the optical-flow sketch after this list). ./models/temporal_CNN.py

  • Two-stream model, which combines models 2 and 3 with an extra fusion layer that outputs the final result. Models 3 and 4 refer to this paper. ./models/two_stream.py
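
The LSTM part of the LRCN (model 2) then consumes the pre-extracted feature sequences rather than raw frames. Below is a minimal sketch in the spirit of ./models/RNN.py; the sequence length, layer sizes, dropout, and optimizer are assumptions, not the repo's actual configuration.

```python
# Sketch only: classify a video from its sequence of pre-extracted CNN features.
from tensorflow.keras import layers, models

SEQ_LEN, FEAT_DIM, NUM_CLASSES = 30, 2048, 101   # UCF-101 has 101 action classes

def build_lrcn_head():
    inputs = layers.Input(shape=(SEQ_LEN, FEAT_DIM))   # one feature vector per sampled frame
    x = layers.LSTM(256, dropout=0.5)(inputs)          # temporal modelling over the sequence
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For the temporal stream (model 3), a stacked optical-flow input can be built with OpenCV's Farneback flow over consecutive grayscale frames, concatenated along the channel axis. This is a rough sketch of that idea; the stack size, flow parameters, and any resizing or normalisation are assumptions rather than what ./utils or ./models/temporal_CNN.py actually do.

```python
# Sketch only: build one stacked optical-flow volume from consecutive grayscale frames.
import cv2
import numpy as np

def stacked_optical_flow(gray_frames):
    """gray_frames: list of 2-D uint8 arrays; returns an (H, W, 2*(N-1)) flow stack."""
    flows = []
    for prev, nxt in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)  # (H, W, 2): x/y displacement
        flows.append(flow)
    return np.concatenate(flows, axis=-1)
```

The two-stream model (model 4) can then fuse the class scores of the spatial and temporal streams, for example by averaging their softmax outputs or, as in the repo, with an extra fusion layer.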

License

MIT License