DenoiseLoc

DenoiseLoc: Boundary Denoising for Video Activity Localization, ICLR 2024

Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem

This repo hosts the original code of our DenoiseLoc work, along with a copy of the QVHighlights dataset for moment retrieval and highlight detection. DenoiseLoc is an encoder-decoder model that tackles video activity localization from a denoising perspective. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. Inference reverses this process through boundary denoising, allowing the localizer to predict activities with precise boundaries and to converge faster. This code works for the QVHighlights dataset, where we observe a gain of +12.36% average mAP over the baseline.
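To make the training-time noise injection concrete, here is a minimal sketch in PyTorch. It assumes spans are given as normalized (center, width) pairs; the function name, noise model, and clamping bounds are illustrative assumptions, not the repo's exact implementation:

import torch

def add_boundary_noise(gt_spans, noise_scale=0.1):
    # gt_spans: (N, 2) tensor of normalized (center, width) pairs
    # noise_scale: controls the jitter magnitude (assumed; see the paper/config)
    noise = torch.randn_like(gt_spans) * noise_scale
    noisy = gt_spans + noise * gt_spans[:, 1:2]  # jitter proportional to span width
    noisy[:, 0] = noisy[:, 0].clamp(0.0, 1.0)    # keep centers inside the video
    noisy[:, 1] = noisy[:, 1].clamp(1e-3, 1.0)   # keep widths positive and bounded
    return noisy

At inference time, the decoder reverses this process: it starts from proposal spans and iteratively refines (denoises) them toward precise boundaries.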

The code is developed on top of Moment-DETR. We keep changes minimal for simplicity but make the necessary adaptations for clarity. The official data and evaluation tools are kept in the data and standalone_eval folders, respectively.

Table of Contents

  1. Prerequisites
  2. Training
  3. Inference
  4. More
  5. Acknowledgement
  6. LICENSE

Prerequisites

  1. Clone this repo
git clone https://github.com/frostinassiky/denoise_loc.git
cd denoise_loc
  2. Prepare feature files

Download the feature file (8GB) moment_detr_features.tar.gz. Instructions can be found in the Moment-DETR repo. Then extract it under the project root directory:

tar -xf path/to/moment_detr_features.tar.gz

The features are extracted using Linjie's HERO_Video_Feature_Extractor.
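If you want to sanity-check the download, the snippet below inspects one feature file. It assumes the archive unpacks into a tree of .npz files under features/ (an assumption; adjust the glob pattern to whatever layout tar actually produced):

import glob
import numpy as np

# assumption: features/ holds per-video .npz files; check the extracted tree first
path = sorted(glob.glob("features/**/*.npz", recursive=True))[0]
feat = np.load(path)
print(path, feat.files)      # names of the arrays stored in this file
arr = feat[feat.files[0]]
print(arr.shape, arr.dtype)  # e.g. (num_clips, feat_dim)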

If you have any issues with the feature file, please email us for alternative solutions.

  3. Install dependencies

This code requires Python 3.7, PyTorch, and a few other Python libraries. We recommend creating a conda environment and installing all the dependencies as follows:

# create conda env
conda create --name denoise_loc python=3.7
# activate env
conda activate denoise_loc
# install pytorch with CUDA 11.6
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
# install other python packages
pip install tqdm ipython easydict tensorboard tabulate scikit-learn pandas
# compile and install nms1d and align1d
cd lib/align1d
python setup.py install
cd ../nms1d
python setup.py install

The PyTorch version we tested is 1.9.0.
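Before launching a job, a quick environment check can save a failed run. These are standard PyTorch calls, nothing repo-specific:

import torch

print(torch.__version__)          # we tested 1.9.0
print(torch.version.cuda)         # CUDA version PyTorch was built with
print(torch.cuda.is_available())  # should be True for GPU training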

Training

Training can be launched by running the following batch script:

sbatch slurm/trainval_v100_snr.sh

Alternatively, you may revise the original training script and run it locally:

bash moment_detr/scripts/train.sh

For more configurable options, please check out the config file moment_detr/config.py.

Inference

Once the model is trained, you can use the following command for inference:

bash moment_detr/scripts/inference.sh CHECKPOINT_PATH SPLIT_NAME

where CHECKPOINT_PATH is the path to the saved checkpoint, and SPLIT_NAME is the split to run inference on, either val or test.

We provide the results on Google Drive.

More

Since our project was developed from Moment-DETR, please refer to their codebase for:

  1. Pretraining and Finetuning
  2. Evaluation and Codalab Submission
  3. Train Moment-DETR on your dataset
  4. Run predictions on your videos and queries

Acknowledgement

This code is based on Moment-DETR, and parts of the implementation are adapted from G-TAD and DiffusionDet. We thank the authors for their awesome open-source contributions. We also thank @HSPUPIL for sharing results and verifying reproducibility.

LICENSE

The annotation files are under the CC BY-NC-SA 4.0 license; see ./data/LICENSE. All the code is under the MIT license; see LICENSE.
