Code for the paper "Multi-shot Temporal Event Localization: a Benchmark", CVPR 2021

Home Page: https://songbai.site/muses/


MUSES

This repo holds the code and the models for MUSES, introduced in the paper:
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H.S. Torr
CVPR 2021.

MUSES is a large-scale video dataset designed to spur research on a new task called multi-shot temporal event localization. We present a baseline approach (denoted as MUSES-Net) that achieves SOTA performance on MUSES. It also reports an mAP of 56.9% on THUMOS14 at IoU=0.5.

Refer to the paper and the project page for more information.

The code largely borrows from SSN and P-GCN. Thanks for their great work!

Updates

[2021.6.24] A combination of MUSES-Net and our latest work TadTR achieves 60.0% mAP, a new record on THUMOS14!
[2021.6.19] Code and the annotation file of MUSES are released. Please find the annotation file on our project page.

Contents

- Usage Guide
  - Prerequisites
  - Data Preparation
  - Reference Models
  - Testing Trained Models
  - Training
- Citation
- Related Projects
- Contact

Usage Guide

Prerequisites

[back to top]

The code is based on PyTorch. The following environment is required.

Other minor Python modules can be installed by running

pip install -r requirements.txt

The code relies on CUDA extensions. Build them with the following command:

python setup.py develop
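
If the build fails, a useful first check (a general suggestion, not a step from the original instructions) is to confirm that your PyTorch installation can actually see CUDA:

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"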

After installing all dependencies, run python demo.py for a quick test.

Data Preparation

[back to top]

We currently support experimenting with THUMOS14; support for MUSES will come soon. To run the experiments, you can directly download the pre-extracted features.

  • THUMOS14: The features are provided by P-GCN. You can download them from [OneDrive] (2.4G). Extract the archive and put the features in the data/thumos14 folder. We expect the following structure in this folder (a quick sanity check is sketched after the tree).
- data
  - thumos14
    - I3D_RGB
    - I3D_Flow
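
The snippet below is a small illustrative helper, not part of the repository; it simply checks that both feature folders exist and are non-empty before you start training or testing.

import os

def check_thumos14_features(root="data/thumos14"):
    # Verify that the pre-extracted I3D features are in place.
    for modality in ("I3D_RGB", "I3D_Flow"):
        folder = os.path.join(root, modality)
        if not os.path.isdir(folder):
            raise FileNotFoundError("Missing feature folder: " + folder)
        if not os.listdir(folder):
            raise RuntimeError("Feature folder is empty: " + folder)
    print("THUMOS14 feature folders look good.")

if __name__ == "__main__":
    check_thumos14_features()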

Reference Models

[back to top]

Download models trained by us and put them in the reference_models folder:

  • THUMOS14: Models trained with RGB and Flow. [OneDrive]

Testing Trained Models

[back to top]

You can test the reference models and fuse different modalities on THUMOS14 by running a single script

bash scripts/test_reference_models.sh

Using these models, you should get the following performance

            | RGB  | Flow | RGB+Flow
mAP@IoU=0.5 | 46.4 | 53.9 | 56.9

The results with RGB+Flow at all IoU thresholds:

IoU threshold | 0.10   | 0.20   | 0.30   | 0.40   | 0.50   | 0.60   | 0.70   | 0.80   | 0.90   | Average
mAP           | 0.7377 | 0.7219 | 0.6893 | 0.6399 | 0.5685 | 0.4625 | 0.3097 | 0.1334 | 0.0192 | 0.4758

The testing process consists of two steps, detailed below.

  1. Extract detection scores for all the proposals by running
python test_net.py DATASET CHECKPOINT_PATH RESULT_PICKLE --cfg CFG_PATH

Here, DATASET should be thumos14 or muses. RESULT_PICKLE is the path where the detection scores are saved. CFG_PATH is the path of the config file, e.g. data/cfgs/thumos14_flow.yml.
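
For example, assuming the flow reference model is stored at reference_models/thumos14_flow.pth (the actual filename in the release may differ), the command could look like

python test_net.py thumos14 reference_models/thumos14_flow.pth outputs/thumos14_flow_results.pkl --cfg data/cfgs/thumos14_flow.yml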

  2. Evaluate the detection performance by running
python eval.py DATASET RESULT_PICKLE --cfg CFG_PATH
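
For instance, continuing the flow example above (the result file name is only illustrative):

python eval.py thumos14 outputs/thumos14_flow_results.pkl --cfg data/cfgs/thumos14_flow.yml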

On THUMOS14, we need to fuse the detection scores from the RGB and Flow modalities. This can be done by running

python eval.py DATASET RESULT_PICKLE_RGB RESULT_PICKLE_FLOW --score_weights 1 1.2 --cfg CFG_PATH_RGB
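
Conceptually, the fusion is just a weighted sum of the per-proposal detection scores from the two modalities. The sketch below illustrates the idea with NumPy; it is not the repository's implementation, and the array shapes and the weights (1 for RGB, 1.2 for Flow, matching the command above) are only illustrative.

import numpy as np

def fuse_scores(rgb_scores, flow_scores, weights=(1.0, 1.2)):
    # rgb_scores, flow_scores: (num_proposals, num_classes) detection scores
    w_rgb, w_flow = weights
    fused = w_rgb * rgb_scores + w_flow * flow_scores
    # rescale so the fused scores stay in a comparable range
    return fused / (w_rgb + w_flow)

# toy example: 3 proposals, 2 classes
rgb = np.array([[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]])
flow = np.array([[0.3, 0.7], [0.5, 0.5], [0.8, 0.2]])
print(fuse_scores(rgb, flow))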

Training

[back to top]

Train your own models with the following command

python train_net.py DATASET --cfg CFG_PATH --snapshot_pref SNAPSHOT_PREF --epochs 20

SNAPSHOT_PREF: the path to save trained models and logs, e.g. outputs/snapshots/thumos14_rgb/.
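
For example, to train a flow model on THUMOS14 with the config file mentioned above (the snapshot directory is only a suggestion):

python train_net.py thumos14 --cfg data/cfgs/thumos14_flow.yml --snapshot_pref outputs/snapshots/thumos14_flow/ --epochs 20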

We provide a script that finishes all steps on THUMOS14, including training, testing, and two-stream fusion. Run

bash scripts/do_all.sh

Citation

Please cite the following paper if you find MUSES useful for your research:

@InProceedings{Liu_2021_CVPR,
    author    = {Liu, Xiaolong and Hu, Yao and Bai, Song and Ding, Fei and Bai, Xiang and Torr, Philip H. S.},
    title     = {Multi-Shot Temporal Event Localization: A Benchmark},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {12596-12606}
}

Related Projects

  • TadTR: Temporal action detection (localization) with Transformer.

Contact

[back to top]

For questions and suggestions, file an issue or contact Xiaolong Liu at "liuxl at hust dot edu dot cn".
