LOGO: A Long-Form Video Dataset for Group Action Quality Assessment (CVPR 2023)

Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang†

[Paper] [Google Drive] [Baidu Drive] (extract number: v329)

This repository contains the LOGO dataset and the PyTorch implementation for the paper "LOGO: A Long-Form Video Dataset for Group Action Quality Assessment" (CVPR 2023).

💡 LOGO Dataset and GOAT Pipeline

LOGO is a multi-person, long-form video dataset built from artistic swimming scenarios. It provides frame-wise annotations of both action procedures (shown in the second row) and formations (shown in the third row), where the formations reflect relations among actors. It also contains score annotations for AQA.

GOAT (short for GrOup-aware ATtention)

📋 To-Do List

Release the dataset
The code of GOAT
Pretrained features for LOGO

📚 Dataset

🗒️ Lexicon

LOGO is organized by temporal structure and contains manual annotations of both actions and formations. We designed the labeling system together with professional artistic swimming athletes, building the annotation lexicon from FINA rules and the actual conditions of competition. In the Technical event, the group size is eight, the video length is $170 \pm 15$ s, and the actions are Upper, Lower, Float, None, Acrobatic, Cadence, and five Required Elements. Each Technical routine must complete the five Required Elements, at least two Acrobatic movements, and at least one Cadence action. In the Free event, the group size is also eight, the video length is $240 \pm 15$ s, and the actions are Upper, Lower, Float, None, Acrobatic, Cadence, and Free elements. When performing Required, Upper, Lower, and Float actions, the athletes form neat polygons.
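The Technical-event constraints described above can be written as a small check. This is a minimal sketch and not part of the released code: the string labels, the name `check_technical_routine`, and the choice to represent a routine as a list of action-instance labels are assumptions made for illustration (the released annotations store integer action types).

```python
from collections import Counter
from typing import Dict, Sequence

# Minimum number of instances a Technical routine must contain, taken from
# the lexicon description. The label strings are illustrative assumptions.
TECHNICAL_MINIMUMS: Dict[str, int] = {"Required": 5, "Acrobatic": 2, "Cadence": 1}


def check_technical_routine(actions: Sequence[str], duration_s: float) -> Dict[str, bool]:
    """Report which Technical-event constraints a routine satisfies."""
    counts = Counter(actions)
    report = {
        name: counts[name] >= minimum
        for name, minimum in TECHNICAL_MINIMUMS.items()
    }
    # The video length is described as 170 +/- 15 seconds.
    report["duration"] = abs(duration_s - 170.0) <= 15.0
    return report


# Hypothetical routine, listed as one label per action instance.
routine = [
    "Upper", "Required", "Acrobatic", "Required", "Cadence", "Required",
    "Lower", "Acrobatic", "Required", "Float", "Required",
]
print(check_technical_routine(routine, duration_s=172.0))
```

The same pattern would apply to the Free event by swapping the minimums table and the nominal duration of 240 s.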

🖊️ Annotation

Given an RGB artistic swimming video, the annotator uses our lexicon to label each frame with its action and formation. Action labels are annotated frame-wise at 25 fps with the COIN Annotation Toolbox, and formation labels are annotated at 1 fps with Labelme. We set strict rules defining the boundaries between artistic swimming sequences and the positions used to mark formations, and employed eight workers with prior knowledge of artistic swimming to label the dataset frame by frame under these rules. Each worker's annotations are checked and adjusted by another worker, so every annotation is double-checked.
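Because actions are labeled at 25 fps and formations at 1 fps, code that uses both usually needs to map a frame index to the formation label that covers it. The sketch below shows one way to do that. It assumes that both label streams start at frame 0 and that formation label k covers second k of the video; neither assumption is confirmed by this README, and the function names are hypothetical.

```python
ACTION_FPS = 25     # action labels: one per video frame
FORMATION_FPS = 1   # formation labels: one per second


def formation_index(frame_idx: int) -> int:
    """Index of the formation label assumed to cover a given 25 fps frame."""
    if frame_idx < 0:
        raise ValueError("frame index must be non-negative")
    return frame_idx * FORMATION_FPS // ACTION_FPS


def frame_to_seconds(frame_idx: int) -> float:
    """Timestamp in seconds of a frame sampled at 25 fps."""
    return frame_idx / ACTION_FPS


print(formation_index(24), formation_index(25), frame_to_seconds(50))
```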

The annotation information is saved in [Google Drive] or [Baidu Drive] (extract number: ojgf).

For each sample, anno_dict.pkl stores a list with the following entries:

List index	Type	Description	Example
`0`	string	Event type.	'tech'
`1`	float	The score of the video.	90.25
`2`	float	/	/
`3`	list	End frame of each action instance.	[76, 141, 187, 246, 263, ···]
`4`	list	Action type of each frame.	[12, 12, 12, 12, 12, ···]
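The per-sample list in the table above can be unpacked into named fields. The following is a minimal sketch under stated assumptions: the top-level layout of the pickle (for example, which keys map to each sample) is not documented here, so `load_annotations` simply returns whatever the file contains; `LogoAnnotation`, `parse_sample`, and `action_segments` are hypothetical helpers; and treating each end frame as inclusive, with the first instance starting at frame 0, is an assumption.

```python
import pickle
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class LogoAnnotation:
    event_type: str                # list index 0, e.g. 'tech'
    score: float                   # list index 1
    action_end_frames: List[int]   # list index 3
    frame_actions: List[int]       # list index 4


def load_annotations(path: str) -> Any:
    """Unpickle the annotation file. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def parse_sample(record: Sequence[Any]) -> LogoAnnotation:
    """Convert one per-sample list into a named structure (index 2 is unused)."""
    return LogoAnnotation(
        event_type=record[0],
        score=float(record[1]),
        action_end_frames=list(record[3]),
        frame_actions=list(record[4]),
    )


def action_segments(end_frames: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn end frames into (start, end) pairs, assuming contiguous instances
    and inclusive end frames, with the first instance starting at frame 0."""
    segments = []
    start = 0
    for end in end_frames:
        segments.append((start, end))
        start = end + 1
    return segments


# Synthetic record shaped like the table rows; not real dataset values.
record = ["tech", 90.25, None, [76, 141, 187], [12] * 188]
sample = parse_sample(record)
print(sample.event_type, sample.score, action_segments(sample.action_end_frames))
```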

📈 Statistics

The LOGO dataset consists of 200 video samples from 26 events, with an average duration of 204.2 s and a total duration of more than 11 hours. It covers 3 annotation types, 12 action types, and 17 formation types.

💾 Download

Video_Frames: [Google Drive] or [Baidu Drive] (extract number: v329)
Annotations and Split: [Google Drive] or [Baidu Drive] (extract number: ojgf)

📓 Data Preparation

The prepared dataset ([Google Drive] or [Baidu Drive] (extract number: v329)) and annotations ([Google Drive] or [Baidu Drive] (extract number: ojgf)) are already provided in this repo.
After extraction, the data should be organized as follows:

$DATASET_ROOT
├── LOGO
|  ├── WorldChampionship2019_free_final
|     ├── 0
|        ├── 00000.jpg
|        ...
|        └── 06249.jpg
|     ...
|     └── 11
|        ├── 00000.jpg
|        ...
|        └── 06249.jpg
|  ...
|  └── WorldChampionship2022_free_final
|     ├── 0
|     ...
|     └── 7 
└──
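A quick way to confirm that the extracted data matches this layout is to index the frames per clip. The sketch below is illustrative only: `index_frames` is a hypothetical helper, and it assumes the layout shown above, namely `$DATASET_ROOT/LOGO/<event>/<clip>/<frame>.jpg` with zero-padded frame names.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def index_frames(dataset_root: str) -> Dict[Tuple[str, str], List[Path]]:
    """Map (event name, clip id) to the sorted JPEG frames of that clip."""
    logo_dir = Path(dataset_root) / "LOGO"
    index: Dict[Tuple[str, str], List[Path]] = {}
    for event_dir in sorted(p for p in logo_dir.iterdir() if p.is_dir()):
        for clip_dir in sorted(p for p in event_dir.iterdir() if p.is_dir()):
            # Zero-padded names (00000.jpg, 00001.jpg, ...) sort correctly as strings.
            index[(event_dir.name, clip_dir.name)] = sorted(clip_dir.glob("*.jpg"))
    return index


def summarize(index: Dict[Tuple[str, str], List[Path]]) -> None:
    """Print the number of frames found for every clip."""
    for (event, clip), frames in index.items():
        print(f"{event}/{clip}: {len(frames)} frames")
```

Calling `summarize(index_frames(dataset_root))` with the path to the extracted data prints one line per clip, which makes missing or empty clip folders easy to spot.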

💻 Code for Group-aware Attention (GOAT)

⭐️ Performance

⚙️ Pretrain Model

The Kinetics-pretrained I3D model is downloaded from the kinetics_i3d_pytorch repository:

model_rgb.pth

🗂️ Requirement

torch_videovision

pip install git+https://github.com/hassony2/torch_videovision

📊 Training

USDL + GOAT + I3D

cd ./MUSDL-GOAT/MTL-AQA
python main.py --lr=7e-06 --weight_decay=0.001 --use_i3d_bb=1 --use_swin_bb=0

USDL + GOAT + Video Swin-Transformer

cd ./MUSDL-GOAT/MTL-AQA
python main.py --lr=1e-05 --weight_decay=0.0001 --use_i3d_bb=0 --use_swin_bb=1

CoRe + GOAT + I3D

cd ./CoRe-GOAT/MTL-AQA
python main.py --lr=1e-06 --warmup=0 --use_i3d_bb=1 --use_swin_bb=0 --bs_train=2 --weight_decay=1e-5

CoRe + GOAT + Video Swin-Transformer

cd ./CoRe-GOAT/MTL-AQA
python main.py --lr=3e-07 --warmup=0 --use_i3d_bb=0 --use_swin_bb=1 --bs_train=1 --weight_decay=1e-5
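To launch the four configurations from one place, the commands above can be collected in a small table. This is a hedged sketch rather than a script shipped with the repository: the configuration keys (`usdl_i3d` and so on) and the helper `build_command` are hypothetical, while the directories and flags are copied verbatim from the commands above.

```python
from typing import Dict, List, Tuple

# (working directory, flags) for each configuration, copied from the commands above.
TRAIN_CONFIGS: Dict[str, Tuple[str, List[str]]] = {
    "usdl_i3d": ("./MUSDL-GOAT/MTL-AQA", [
        "--lr=7e-06", "--weight_decay=0.001", "--use_i3d_bb=1", "--use_swin_bb=0"]),
    "usdl_swin": ("./MUSDL-GOAT/MTL-AQA", [
        "--lr=1e-05", "--weight_decay=0.0001", "--use_i3d_bb=0", "--use_swin_bb=1"]),
    "core_i3d": ("./CoRe-GOAT/MTL-AQA", [
        "--lr=1e-06", "--warmup=0", "--use_i3d_bb=1", "--use_swin_bb=0",
        "--bs_train=2", "--weight_decay=1e-5"]),
    "core_swin": ("./CoRe-GOAT/MTL-AQA", [
        "--lr=3e-07", "--warmup=0", "--use_i3d_bb=0", "--use_swin_bb=1",
        "--bs_train=1", "--weight_decay=1e-5"]),
}


def build_command(name: str) -> Tuple[str, List[str]]:
    """Return the working directory and argument list for one configuration."""
    workdir, flags = TRAIN_CONFIGS[name]
    return workdir, ["python", "main.py", *flags]


for name in TRAIN_CONFIGS:
    workdir, cmd = build_command(name)
    print(f"[{name}] (cd {workdir} && {' '.join(cmd)})")
```

To actually start a run, the returned pair can be passed to `subprocess.run(cmd, cwd=workdir, check=True)` from the repository root.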

📧 Contact

E-mail: sy-zhang23@mails.tsinghua.edu.cn

WeChat: ZSYi-408
