VadCLIP

This is the official Pytorch implementation of our paper: "VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection" in AAAI 2024.

Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

Highlight

We present a novel diagram, i.e., VadCLIP, which involves dual branch to detect video anomaly in visual classification and language-visual alignment manners, respectively. With the benefit of dual branch, VadCLIP achieves both coarse-grained and fine-grained WSVAD. To our knowledge, VadCLIP is the first work to efficiently transfer pre-trained language-visual knowledge to WSVAD.
We propose three non-vital components to address new challenges led by the new diagram. LGT-Adapter is used to capture temporal dependencies from different perspectives; Two prompt mechanisms are devised to effectively adapt the frozen pre-trained model to WSVAD task; MIL-Align realizes the optimization of alignment paradigm under weak supervision, so as to preserve the pre-trained knowledge as much as possible.
We show that strength and effectiveness of VadCLIP on two large-scale popular benchmarks, and VadCLIP achieves state-of-the-art performance, e.g., it gets unprecedented results of 84.51% AP and 88.02% on XD-Violence and UCF-Crime respectively, surpassing current classification based methods by a large margin.

Training

Setup

We extract CLIP features for UCF-Crime and XD-Violence datasets, and release these features and pretrained models as follows:

Benchmark	CLIP[Baidu]	CLIP	Model[Baidu]	Model
UCF-Crime	Code: 7yzp	OneDrive	Code: kq5u	OneDrive
XD-Violence	Code: v8tw	OneDrive	Code: apw6	OneDrive

The following files need to be adapted in order to run the code on your own machine:

Change the file paths to the download datasets above in list/xd_CLIP_rgb.csv and list/xd_CLIP_rgbtest.csv.
Feel free to change the hyperparameters in xd_option.py

Train and Test

After the setup, simply run the following command:

Traing and infer for XD-Violence dataset

python xd_train.py
python xd_test.py

Traing and infer for UCF-Crime dataset

python ucf_train.py
python ucf_test.py

References

We referenced the repos below for the code.

Citation

If you find this repo useful for your research, please consider citing our paper:

@article{wu2023vadclip,
  title={Vadclip: Adapting vision-language models for weakly supervised video anomaly detection},
  author={Wu, Peng and Zhou, Xuerong and Pang, Guansong and Zhou, Lingru and Yan, Qingsen and Wang, Peng and Zhang, Yanning},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year={2024}
}

@article{wu2023open,
  title={Open-Vocabulary Video Anomaly Detection},
  author={Wu, Peng and Zhou, Xuerong and Pang, Guansong and Sun, Yujia and Liu, Jing and Wang, Peng and Zhang, Yanning},
  journal={arXiv preprint arXiv:2311.07042},
  year={2023}
}

nwpu-zxr / VadCLIP