This repo is the official PyTorch implementation of the DICTA paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" (Best Award) and of the journal paper "Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization", accepted by CVIU.
To use the LAV-DF dataset, you must agree to the terms and conditions.
Download links: OneDrive, Google Drive, HuggingFace.
| Method | AP@0.5 | AP@0.75 | AP@0.95 | AR@100 | AR@50 | AR@20 | AR@10 |
|---|---|---|---|---|---|---|---|
| BA-TFD | 79.15 | 38.57 | 00.24 | 67.03 | 64.18 | 60.89 | 58.51 |
| BA-TFD+ | 96.30 | 84.96 | 04.44 | 81.62 | 80.48 | 79.40 | 78.75 |
Note that the BA-TFD result above is slightly better than the one reported in the paper, because this repository uses better hyperparameters.
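The thresholds in the AP columns can be read with a small sketch. It assumes the usual temporal-localization convention, in which AP@t counts a predicted segment as correct when its temporal IoU with a ground-truth segment is at least t. The function name `temporal_iou` and the example segments are hypothetical and are not taken from this repository or from LAV-DF.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical segments, chosen only for illustration.
predicted = (2.0, 6.0)
ground_truth = (3.0, 7.0)
iou = temporal_iou(predicted, ground_truth)

for threshold in (0.5, 0.75, 0.95):
    print(f"IoU={iou:.2f} counts as a hit at {threshold}: {iou >= threshold}")
```

Here the two segments overlap for 3 of 5 seconds, so the prediction would count at the 0.5 threshold but not at 0.75 or 0.95. This is consistent with the AP values in the table dropping sharply as the threshold tightens.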
The main dependency versions are:
- Python >= 3.7, < 3.11
- PyTorch >= 1.13
- torchvision >= 0.14
- pytorch_lightning == 1.7.*
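The constraints above can be checked before training with a short script. This is a minimal sketch, not part of the repository: `parse_version`, `CONSTRAINTS`, and `satisfies` are hypothetical helpers, and the parser keeps only the first three numeric components of a version string.

```python
import platform
import re


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); only the first three numbers are kept."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


# Constraints copied from the list above.
CONSTRAINTS = {
    "python": lambda v: (3, 7) <= v < (3, 11),
    "torch": lambda v: v >= (1, 13),
    "torchvision": lambda v: v >= (0, 14),
    "pytorch_lightning": lambda v: v[:2] == (1, 7),
}


def satisfies(name, version_text):
    """Return True when version_text meets the constraint listed for name."""
    return CONSTRAINTS[name](parse_version(version_text))


# Check the running interpreter.
current = platform.python_version()
print("python", current, "ok" if satisfies("python", current) else "unsupported")
```

For installed packages, the same check can be fed the version string each package exposes, for example `satisfies("torch", torch.__version__)`.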
Run the following command to install the required packages.
pip install -r requirements.txt

Train the BA-TFD introduced in the paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" with the default hyperparameters on the LAV-DF dataset.
python train.py \
--config ./config/batfd_default.toml \
--data_root <DATASET_PATH> \
--batch_size 4 --num_workers 8 --gpus 1 --precision 16

The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory. If you encounter NaN values when training BA-TFD+, the cause may be a bug in the PyTorch self-attention ops; upgrading or changing the PyTorch version can solve it.
Train the BA-TFD+ introduced in the paper "Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization" with the default hyperparameters on the LAV-DF dataset.
python train.py \
--config ./config/batfd_plus_default.toml \
--data_root <DATASET_PATH> \
--batch_size 4 --num_workers 8 --gpus 2 --precision 32

Please use FP32 for training BA-TFD+, as FP16 will cause inf and NaN values.
The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.
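The FP16 warning above follows from the narrow range of half precision in general. The sketch below illustrates that range using only the Python standard library; it does not diagnose the specific operation that overflows in BA-TFD+. The helper `to_fp16` is hypothetical, and the value 12.0 is an arbitrary example.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(value):
    """Round a Python float to half precision, overflowing to +/-inf."""
    try:
        # The "e" struct format is IEEE 754 binary16.
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct rejects out-of-range values; a numeric cast would yield infinity.
        return math.copysign(math.inf, value)


logit = 12.0                   # arbitrary example activation
exp_value = math.exp(logit)    # about 162754.8, well within FP32 range
exp_fp16 = to_fp16(exp_value)  # exceeds FP16_MAX, so it becomes inf

print(exp_value > FP16_MAX, exp_fp16)
print(exp_fp16 - exp_fp16)     # inf - inf is nan
```

Once a single intermediate value becomes inf, ordinary arithmetic such as subtraction or normalization can turn it into NaN, which then propagates through the loss.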
To evaluate a model, run the following command with a checkpoint saved in the ckpt directory.
Alternatively, you can download the pretrained BA-TFD and BA-TFD+ models.
python evaluate.py \
--config <CONFIG_PATH> \
--data_root <DATASET_PATH> \
--checkpoint <CHECKPOINT_PATH> \
--batch_size 1 --num_workers 4

The script generates temporal inference results in the output directory and prints the AP and AR scores to the console.
Note: make sure that only one GPU is visible to the evaluation script.
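One common way to expose a single GPU is the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below builds the evaluation command shown above together with an environment that exposes one device. The helper `build_eval_command` is hypothetical, and the paths are the same placeholders used in the command above.

```python
import os
import sys


def build_eval_command(config, data_root, checkpoint, gpu_index=0):
    """Return (argv, env) for running evaluate.py with one visible GPU."""
    env = dict(os.environ)
    # CUDA only enumerates the devices listed in this variable.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    argv = [
        sys.executable, "evaluate.py",
        "--config", config,
        "--data_root", data_root,
        "--checkpoint", checkpoint,
        "--batch_size", "1",
        "--num_workers", "4",
    ]
    return argv, env


argv, env = build_eval_command("<CONFIG_PATH>", "<DATASET_PATH>", "<CHECKPOINT_PATH>")
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

With real paths substituted, the pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root.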
This project is under the CC BY-NC 4.0 license. See LICENSE for details.
If you find this work useful in your research, please cite the papers below.
The conference paper:
@inproceedings{cai2022you,
title = {Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization},
author = {Cai, Zhixi and Stefanov, Kalin and Dhall, Abhinav and Hayat, Munawar},
booktitle = {2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)},
year = {2022},
doi = {10.1109/DICTA56598.2022.10034605},
pages = {1--10},
address = {Sydney, Australia},
}

The extended journal version, accepted by CVIU:
@article{cai2023glitch,
title = {Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization},
author = {Cai, Zhixi and Ghosh, Shreya and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin and Hayat, Munawar},
journal = {Computer Vision and Image Understanding},
year = {2023},
volume = {236},
pages = {103818},
issn = {1077-3142},
doi = {10.1016/j.cviu.2023.103818},
}

Some code related to the boundary matching mechanism is borrowed from JJBOY/BMN-Boundary-Matching-Network and xxcheng0708/BSNPlusPlus-boundary-sensitive-network.