SSSGCN

The code and data are currently being organized and will be open-sourced once they pass our de-identification review.

Datasets

We collected real Tai Chi video data, professionally annotated with scores by sports experts. The dataset is intended to capture the complex features underlying action quality, in contrast to traditional classification-based evaluations that grade actions into levels such as A, B, C, or D.

Why we use continuous variables as labels: The granularity of a classification-based rating scheme can usually be changed, but only by reorganizing the dataset and retraining the model, and finer-grained classification tasks quickly become harder. Adopting smoothed, continuous labels with a regression model yields both higher performance and finer-grained assessments, which better match real examination and teaching scenarios. Although this requires more effort, it is closer to real-world applications.
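
As a rough illustration (not the repository's code; names and shapes are made up for the example), the two labelling schemes lead to different training targets and losses:

```python
# Hypothetical example contrasting discrete grade labels with the continuous
# score labels described above; not the repository's training code.
import torch
import torch.nn as nn

# Discrete grading: four levels (A/B/C/D) -> cross-entropy on class indices.
logits = torch.randn(8, 4)                      # batch of 8 class predictions
grades = torch.randint(0, 4, (8,))              # expert grades as class ids
cls_loss = nn.CrossEntropyLoss()(logits, grades)

# Continuous scoring: expert scores normalized to [0, 1] -> regression with MSE.
pred_scores = torch.sigmoid(torch.randn(8, 1))  # scalar score per clip
true_scores = torch.rand(8, 1)                  # annotated continuous scores
reg_loss = nn.MSELoss()(pred_scores, true_scores)
```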

Why we don't directly compare feature values, as in face recognition: In action scoring, directly comparing feature vectors can discard the spatial and temporal structure of the movement. Moreover, sports experts point out that scoring should not rely solely on similarity between actions; it involves a degree of subjectivity and artistry. We want the dataset to carry this information and the model to be able to represent it.

https://drive.google.com/drive/folders/1ZTsiah25xqdNVz9kxE4-tHAG2uSbF-AC?usp=drive_link

Augmentation

(Figures) Time-domain augmentation distributions: 8k_aug and 16k_aug.
(Figures) Score-range distributions of generated samples: principle=0.4, principle=0.6, principle=1.0, and the balanced distribution after clipping.
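
A minimal sketch of the balancing step suggested by the last figure (clipping each score bin to a common count). The bin edges and cap are assumptions, the `principle` parameter is not modelled here, and this is not the repository's code:

```python
# Assumed balancing procedure: keep at most `cap` generated samples per score
# bin so that no score range dominates the augmented set.
import numpy as np

def clip_balance(samples, scores, n_bins=10, cap=None):
    """Clip each score bin to `cap` samples (default: size of the smallest non-empty bin)."""
    scores = np.asarray(scores)
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]   # interior bin edges on [0, 1]
    bins = np.digitize(scores, edges)                 # bin index per sample
    counts = np.bincount(bins, minlength=n_bins)
    cap = cap or int(counts[counts > 0].min())
    keep = []
    for b in range(n_bins):
        idx = np.where(bins == b)[0]
        keep.extend(idx[:cap].tolist())               # drop the surplus in crowded bins
    keep = sorted(keep)
    return [samples[i] for i in keep], scores[keep]
```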

One-Stage

We initially aimed to perform classification and regression simultaneously with a one-stage approach, and, under the guidance of experts, we designed a reasonable data augmentation method. However, the final classification and regression performance (model iv below) did not reach the metrics we expected.
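
A minimal sketch of such a one-stage objective, assuming the two head losses are simply combined with a weight (an illustration, not the repository's training code):

```python
# Hypothetical joint objective for a shared backbone with two heads:
# cross-entropy for the grade head plus weighted MSE for the score head.
import torch.nn.functional as F

def one_stage_loss(cls_logits, reg_pred, cls_target, score_target, reg_weight=1.0):
    cls_loss = F.cross_entropy(cls_logits, cls_target)
    reg_loss = F.mse_loss(reg_pred.squeeze(-1), score_target)
    return cls_loss + reg_weight * reg_loss
```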

Model Structure

i) and ii)

(figure: architecture of variants i and ii)

iii)

(figure: architecture of variant iii)

iv)

(figure: architecture of variant iv)

Exp

| Model | Taichi score MAE | Taichi classification Acc |
| ----- | ---------------- | ------------------------- |
| i     | 0.2021           | 59.17%                    |
| ii    | 0.0965           | 84.42%                    |
| iii   | 0.0862           | 86.26%                    |
| iv    | 0.0782           | 95.58%                    |

i) Extract features with the ST-GCN backbone and feed the resulting feature map into both the classification and regression heads, using CoLU activations.

ii) Building upon i, apply the data augmentation described above.

iii) Building upon ii, split the feature map along the spatial dimension into two parts, and then separately feed them into the classification and regression heads.

iv) Building upon iii, concatenate the feature embedding from the classification head with the input to the regression head (see the sketch below).
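
The sketch below illustrates variants iii and iv under assumed shapes and layer sizes; it is not the repository's code, and it uses ReLU in place of CoLU:

```python
# Hypothetical dual-head module: the backbone feature map (N, C, T, V) is split
# along the spatial (joint) axis V; one half feeds the classification head and
# the other the regression head (variant iii). In variant iv the classification
# embedding is concatenated onto the regression input.
import torch
import torch.nn as nn

class DualHead(nn.Module):
    def __init__(self, channels=256, num_classes=24, concat_cls_embed=True):
        super().__init__()
        self.concat_cls_embed = concat_cls_embed       # True -> variant iv
        self.cls_embed = nn.Linear(channels, 128)
        self.cls_out = nn.Linear(128, num_classes)
        reg_in = channels + (128 if concat_cls_embed else 0)
        self.reg_head = nn.Sequential(
            nn.Linear(reg_in, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())           # score in [0, 1]

    def forward(self, feat):                           # feat: (N, C, T, V)
        half = feat.size(-1) // 2
        cls_feat = feat[..., :half].mean(dim=(2, 3))   # pool the cls half -> (N, C)
        reg_feat = feat[..., half:].mean(dim=(2, 3))   # pool the reg half -> (N, C)
        embed = torch.relu(self.cls_embed(cls_feat))
        logits = self.cls_out(embed)
        if self.concat_cls_embed:                      # variant iv only
            reg_feat = torch.cat([reg_feat, embed], dim=1)
        score = self.reg_head(reg_feat)
        return logits, score
```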

Two-Stage

Cls Exp


NTU-RGB-D Ablation

ST-GCN vs STD-GCN vs SST-GCN vs SSTD-GCN vs ST-GCN++ vs SSTD-GCN++ Demo

Open In Colab

Google Colab demo. Note: the metrics in the Colab demo may vary slightly due to library version changes, but the overall performance should be approximately the same.

Taichi Cls Ablation

Open In Colab

Google Colab Demo

| Model | NTU-RGB-D Acc | Taichi Acc | Param. (M) | FLOPs (G) |
| ----- | ------------- | ---------- | ---------- | --------- |
| ST-GCN | 76.00% | 65.47% | 0.17 | 0.20 |
| STGL-GCN | 77.50% | 83.75% | 2.78 | 1.89 |
| SSTD-GCN (ours) | 87.00% | 99.17% | 0.18 | 0.11 |
| ST-GCN++ | 90.50% | 93.33% | 3.09 | 0.60 |
| SSTD-GCN++ (ours, embedded into ST-GCN++) | 92.00% | 99.58% | 0.32 | 0.61 |

Reg Exp

Taichi Scoring Reg Ablation

| Model | Spatial Separate | Temporal Dilation | Taichi score MAE |
| ----- | ---------------- | ----------------- | ---------------- |
| ix    |                  |                   | 0.0355 |
| x     |                  |                   | 0.0295 |
| xi    | ✔️               |                   | 0.0243 |
| xii   |                  | ✔️                | 0.0261 |
| xiii  | ✔️               | ✔️                | 0.0196 |
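
For context, the "Temporal Dilation" column can be illustrated by a dilated temporal convolution over the skeleton feature map, which enlarges the temporal receptive field without adding parameters; the kernel size and dilation below are assumptions, not the repository's settings:

```python
# Hypothetical dilated temporal convolution for a (N, C, T, V) skeleton tensor.
import torch
import torch.nn as nn

class DilatedTemporalConv(nn.Module):
    def __init__(self, channels=256, kernel_size=9, dilation=2):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation        # preserve temporal length
        self.conv = nn.Conv2d(channels, channels,
                              kernel_size=(kernel_size, 1),
                              padding=(pad, 0),
                              dilation=(dilation, 1))  # dilate only along time

    def forward(self, x):                              # x: (N, C, T, V)
        return self.conv(x)

x = torch.randn(2, 256, 64, 17)                        # 64 frames, 17 joints
print(DilatedTemporalConv()(x).shape)                  # torch.Size([2, 256, 64, 17])
```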
