Violence Detection from Videos Using a Dual-Stream CNN

This project implements a binary action classifier that detects violence in videos by extracting spatial and temporal features. Such a classifier has applications in public surveillance and social media screening. The feature extraction method and model architecture can be extended to other kinds of action recognition from videos. The project is implemented in PyTorch.

The dataset we used consists of 1000 short clips for each of two classes: Violence and Nonviolence. Each video is 1 to 6 seconds long. The videos in the violence class include violent interactions between people in different settings, such as street fights and physical altercations during sports games. The non-violence class has videos of people taking part in various non-violent activities.

Here are some examples from the violence class.

And these are some examples from the non-violence class.

Data Preprocessing Pipeline

Since the dataset consists of raw videos, some processing was done to make it useful for training:

10 frames are extracted from the middle second of the video
All frames are cropped to a square and resized to a fixed resolution
Optical flow is computed between consecutive frames
The first frame and the optical flow images are stacked to form the input to the model
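
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`middle_second_indices`, `center_crop_square`, `dense_flow`, `stack_input`), the even spacing of the 10 frames, the use of OpenCV's Farneback method for optical flow, and the channel layout (3 RGB channels followed by 2 channels per flow field) are all assumptions.

```python
import numpy as np


def middle_second_indices(n_frames, fps, n_samples=10):
    """Pick n_samples evenly spaced frame indices from the middle second.

    Assumption: even spacing; if the window holds fewer than n_samples
    frames, some indices repeat.
    """
    window = min(int(round(fps)), n_frames)   # frames in one second
    start = (n_frames - window) // 2          # first frame of the middle second
    step = window / n_samples
    return [start + int(i * step) for i in range(n_samples)]


def center_crop_square(frame):
    """Crop an H x W x C frame to its largest centered square."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return frame[top:top + side, left:left + side]


def dense_flow(prev_gray, next_gray):
    """Dense optical flow between two grayscale frames (H x W x 2).

    Assumption: Farneback flow via OpenCV; the source does not say
    which optical flow method was used.
    """
    import cv2  # imported lazily so the other helpers only need NumPy
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
    )


def stack_input(first_frame, flows):
    """Stack the first RGB frame and all flow fields into a C x H x W array."""
    channels = [first_frame.astype(np.float32)]
    channels += [f.astype(np.float32) for f in flows]
    stacked = np.concatenate(channels, axis=-1)   # H x W x (3 + 2 * len(flows))
    return np.transpose(stacked, (2, 0, 1))       # channels first, as PyTorch expects
```

Under these assumptions, 10 frames yield 9 flow fields, so the stacked input has 3 + 2 × 9 = 21 channels. Resizing to the fixed resolution (for example with `cv2.resize`) would happen after the crop and before the flow computation.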

Model

The CNN follows the dual-stream architecture described in the paper 'Two-Stream Convolutional Networks for Action Recognition in Videos': the spatial stream takes the first frame as input, and the temporal stream takes the optical flow data.
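
A minimal PyTorch sketch of such a two-stream classifier is shown below. The layer sizes, the `TwoStreamNet` and `make_stream` names, the 18 flow channels, and the fusion by averaging the two streams' logits are illustrative assumptions, not the architecture actually trained in this project.

```python
import torch
import torch.nn as nn


def make_stream(in_channels, num_classes=2):
    """A small convolutional stream (hypothetical layer sizes)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=7, stride=2, padding=3),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),   # global average pool to 64 features
        nn.Flatten(),
        nn.Linear(64, num_classes),
    )


class TwoStreamNet(nn.Module):
    """Spatial stream on the first RGB frame, temporal stream on stacked flow."""

    def __init__(self, flow_channels=18, num_classes=2):
        super().__init__()
        self.spatial = make_stream(3, num_classes)
        self.temporal = make_stream(flow_channels, num_classes)

    def forward(self, x):
        # Assumed layout: the first 3 channels are RGB, the rest are optical flow.
        rgb, flow = x[:, :3], x[:, 3:]
        # Assumed late fusion: average the class logits of the two streams.
        return (self.spatial(rgb) + self.temporal(flow)) / 2


if __name__ == "__main__":
    model = TwoStreamNet()
    batch = torch.zeros(4, 21, 64, 64)   # 4 clips, 3 RGB + 18 flow channels
    print(model(batch).shape)
```

The two output logits correspond to the Violence and Nonviolence classes and could be trained with `nn.CrossEntropyLoss`.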

