Making Reconstruction-based Method Great Again for Video Anomaly Detection (ICDM 2022)

arXiv | Paper | Primary contact: Yizhou Wang

Abstract

Anomaly detection in videos is a significant yet challenging problem. Previous approaches based on deep neural networks employ either reconstruction-based or prediction-based approaches. Nevertheless, existing reconstruction-based methods 1) rely on old-fashioned convolutional autoencoders and are poor at modeling temporal dependency; 2) are prone to overfit the training samples, leading to indistinguishable reconstruction errors of normal and abnormal frames during the inference phase. To address such issues, firstly, we get inspiration from transformer and propose Spatio-Temporal Auto-Trans-Encoder, dubbed as STATE, as a new autoencoder model for enhanced consecutive frame reconstruction. Our STATE is equipped with a specifically designed learnable convolutional attention module for efficient temporal learning and reasoning. Secondly, we put forward a novel reconstruction-based input perturbation technique during testing to further differentiate anomalous frames. With the same perturbation magnitude, the testing reconstruction error of the normal frames lowers more than that of the abnormal frames, which contributes to mitigating the overfitting problem of reconstruction. Owing to the high relevance of the frame abnormality and the objects in the frame, we conduct object-level reconstruction using both the raw frame and the corresponding optical flow patches. Finally, the anomaly score is designed based on the combination of the raw and motion reconstruction errors using perturbed inputs. Extensive experiments on benchmark video anomaly detection datasets demonstrate that our approach outperforms previous reconstruction-based methods by a notable margin, and achieves state-of-the-art anomaly detection performance consistently.
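
To make the test-time input perturbation concrete, below is a minimal PyTorch sketch of the idea: take one signed-gradient step on the input that lowers its reconstruction error, then score with the perturbed input. The function name, the MSE objective, and the single-step rule are illustrative assumptions, not the exact implementation in this repository.

import torch
import torch.nn.functional as F

def perturb_input(model, x, eps):
    # Hypothetical sketch: one gradient step on the input that LOWERS the
    # reconstruction error; the paper's exact perturbation rule may differ.
    x = x.clone().requires_grad_(True)
    loss = F.mse_loss(model(x), x.detach())
    loss.backward()
    # Normal frames benefit more from this step than anomalous ones,
    # which widens the gap between their reconstruction errors at test time.
    return (x - eps * x.grad.sign()).detach()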

Usage

Prepare data

Follow the instructions in the code of VEC to download and organize the Avenue and ShanghaiTech datasets.

Environment setup

  • python 3.6
  • PyTorch 1.1.0 (0.3.0 for calculating optical flow)
  • torchvision 0.3.0
  • cuda 9.0.176
  • cudnn 7.0.5
  • mmcv 0.2.14 (you may need pip install mmcv==0.2.14 to install this old version)
  • mmdetection 1.0rc0 (you may need git clone -b v1.0rc0 https://github.com/open-mmlab/mmdetection.git to clone this old version)
  • numpy 1.17.2
  • scikit-learn 0.21.3

For the main training and testing process, the conda environment file vad.yaml is provided:

conda env create -f vad.yaml
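
After creating the environment, a quick sanity check of the key package versions can save debugging time later. This snippet is an illustrative check, not part of the repository:

import torch, torchvision, mmcv
print("torch", torch.__version__, "| torchvision", torchvision.__version__, "| mmcv", mmcv.__version__)
print("CUDA available:", torch.cuda.is_available())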

Run the experiments

1. Calculate optical flow

(1) Follow the instructions to install FlowNet2, then download the pretrained FlowNet2 model and move the checkpoint FlowNet2_checkpoint.pth.tar into ./FlowNet2_src/pretrained (create the pretrained folder first).

(2) Run calc_img_inputs.py (this step requires PyTorch 0.3.0): python calc_img_inputs.py. This generates a new folder named optical_flow containing the optical flow of the different datasets; it mirrors the directory structure of the raw_datasets folder.
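
Since optical_flow is expected to mirror raw_datasets, a small illustrative check (the folder names are taken from the description above; the snippet itself is an assumption, not repo code) can confirm the two trees match:

import os

# Relative sub-folder paths under each root
raw = sorted(os.path.relpath(d, "raw_datasets") for d, _, _ in os.walk("raw_datasets"))
flow = sorted(os.path.relpath(d, "optical_flow") for d, _, _ in os.walk("optical_flow"))
# Report any video folders that are missing an optical-flow counterpart
missing = [d for d in raw if d not in flow]
print("missing optical-flow folders:", missing or "none")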

2. Get pretrained object detector to generate bounding boxes

Follow the instructions to install mmdet (you may need git clone -b v1.0rc0 https://github.com/open-mmlab/mmdetection.git to clone this old version of mmdetection). Then download the pretrained Cascade R-CNN object detector and move it to fore_det/obj_det_checkpoints (create the obj_det_checkpoints folder first).
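
For reference, extracting bounding boxes with the mmdetection 1.x API looks roughly like the sketch below. The config and checkpoint paths are placeholders, and the score threshold is an assumption rather than this repo's actual setting:

from mmdet.apis import init_detector, inference_detector
import numpy as np

# Placeholder paths: point these at the Cascade R-CNN config and the
# checkpoint downloaded into fore_det/obj_det_checkpoints
model = init_detector("path/to/cascade_rcnn_config.py",
                      "fore_det/obj_det_checkpoints/cascade_rcnn.pth",
                      device="cuda:0")
result = inference_detector(model, "path/to/frame.jpg")
# mmdet 1.x returns one [x1, y1, x2, y2, score] array per class
boxes = np.concatenate(result, axis=0)
boxes = boxes[boxes[:, 4] > 0.5]  # keep confident detections (threshold is an assumption)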

3. Reproduce the results

The model checkpoints and some saved training statistics are provided via this Dropbox link.

Avenue

To train, run the following command

python train.py -d avenue -l 3 -n_l 3 -e 20

To test, run the following command

python test.py -d avenue -l 3 -n_l 3 -e 20 -w_r 0.3 -w_o 1 -ep 0.002

ShanghaiTech

To train, run the following command

python train.py -d ShanghaiTech -l 3 -n_l 5 -e 20

To test, run the following command

python test.py -d ShanghaiTech -l 3 -n_l 5 -e 20 -w_r 1 -w_o 1 -ep 0.005
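
The test-time flags map onto the anomaly score described in the abstract: -w_r and -w_o weight the raw-frame and optical-flow reconstruction errors, and -ep sets the input perturbation magnitude. A hedged sketch of the combination (the exact formula in the code may differ, e.g. the errors may be normalized first):

def anomaly_score(err_raw, err_flow, w_r, w_o):
    # Higher score = more anomalous; both errors are computed on perturbed inputs
    return w_r * err_raw + w_o * err_flow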

Acknowledgments

This code relies heavily on the code of VEC. The STATE architecture builds on the ConvTransformer code kindly provided by Zhouyong Liu via email. The README format is based on the GitHub repos of my senior colleagues Huan Wang and Xu Ma. Great thanks to them! We also thank the anonymous ICDM'22 reviewers for their constructive comments, which helped us improve the paper.

BibTeX

@inproceedings{wang2022making,
  title={Making Reconstruction-based Method Great Again for Video Anomaly Detection},
  author={Wang, Yizhou and Qin, Can and Bai, Yue and Xu, Yi and Ma, Xu and Fu, Yun},
  booktitle={2022 IEEE International Conference on Data Mining (ICDM)},
  pages={1215--1220},
  year={2022},
  organization={IEEE}
}
