jacobswan1 / Weakly-Supervised-Action-Localization-by-Sparse-Temporal-Pooling-Network


Referring Attention Learning for Weakly Supervised Temporal Grounding

Weakly supervised learning for temporal grounding, including action/activity localization and moment localization in videos. The top-down, signal-guided attention ("referring attention") is novel; the losses are intended to be novel (still under development); and the use of Gumbel noise during training is novel in terms of producing binary attention.
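
The binary attention mentioned above can be relaxed with Gumbel noise so that hard 0/1 gates stay differentiable during training. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the repo's actual code; the tensor shapes and the straight-through trick are assumptions about how such a gate could be wired up.

```python
import torch

def sample_gumbel(shape, eps=1e-10):
    # Sample from Gumbel(0, 1) via the inverse transform of a uniform sample.
    u = torch.rand(shape)
    return -torch.log(-torch.log(u + eps) + eps)

def gumbel_binary_attention(logits, tau=1.0, hard=True):
    # logits: (B, T) unnormalized per-segment attention scores.
    # The difference of two Gumbel samples is Logistic noise, so the sigmoid
    # below is a relaxed Bernoulli ("binary concrete") sample per segment.
    g1 = sample_gumbel(logits.shape).to(logits.device)
    g2 = sample_gumbel(logits.shape).to(logits.device)
    y_soft = torch.sigmoid((logits + g1 - g2) / tau)
    if hard:
        # Straight-through: binarize in the forward pass, but let gradients
        # flow through the soft relaxation in the backward pass.
        y_hard = (y_soft > 0.5).float()
        return y_hard - y_soft.detach() + y_soft
    return y_soft
```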

STPN: Weakly Supervised Action Localization by Sparse Temporal Pooling Network

This is a re-implementation of the paper Weakly Supervised Action Localization by Sparse Temporal Pooling Network (Phuc Nguyen et al., Google, CVPR 2018).
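
For reference, here is a minimal sketch of the STPN idea: per-segment attention, attention-weighted temporal pooling into a video-level feature, a video-level classification loss, and an L1 sparsity term on the attention, with the classifier applied per segment to obtain T-CAMs. It is written from the paper's description rather than taken from the notebooks; the layer sizes, number of classes, and loss weight are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STPNSketch(nn.Module):
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        # Per-segment attention in (0, 1) and a shared linear classifier.
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        # feats: (B, T, D) segment-level I3D features.
        attn = self.attention(feats)                                # (B, T, 1)
        video_feat = (attn * feats).sum(1) / (attn.sum(1) + 1e-8)   # (B, D)
        logits = self.classifier(video_feat)                        # (B, C) video-level scores
        tcam = self.classifier(feats)                               # (B, T, C) temporal class activations
        return logits, tcam, attn

def stpn_loss(logits, labels, attn, beta=1e-4):
    # Multi-label classification loss plus an L1 sparsity regularizer on attention.
    cls_loss = F.binary_cross_entropy_with_logits(logits, labels.float())
    return cls_loss + beta * attn.abs().mean()
```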

I3D features:

Create a feature directory and place in it the I3D features on THUMOS 2014 provided by Sujoy Paul (UC Riverside); optical flow is extracted at 10 fps and the I3D model is pre-trained on Kinetics.
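
A minimal loading sketch is shown below; the directory layout (features/rgb, features/flow) and per-video .npy files of shape (T, 1024) per stream are assumptions — adjust them to whatever format the downloaded features actually use.

```python
import os
import numpy as np

FEATURE_DIR = "features"  # hypothetical path to the created feature directory

def load_i3d_features(video_id):
    # Assumed layout: one (T, 1024) array per video and per stream (RGB / flow).
    rgb = np.load(os.path.join(FEATURE_DIR, "rgb", video_id + ".npy"))
    flow = np.load(os.path.join(FEATURE_DIR, "flow", video_id + ".npy"))
    return rgb, flow
```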

Precision Updates on the THUMOS 2014 Challenge

Our baseline is not yet strong enough, since most comparisons focus on IoU=0.5 (see the helper below for how temporal IoU is computed).
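
As a quick reminder of the metric, a detected segment counts as correct at IoU=0.5 only if its temporal intersection-over-union with a ground-truth segment is at least 0.5. A minimal helper:

```python
def temporal_iou(pred, gt):
    # pred, gt: (start, end) times in seconds.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# temporal_iou((10.0, 20.0), (12.0, 22.0)) -> 8/12 = 0.667, counted as correct at IoU=0.5
```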

| mAP (%)          | IoU=0.1 | IoU=0.2 | IoU=0.3 | IoU=0.4 | IoU=0.5 |
|------------------|---------|---------|---------|---------|---------|
| TCAM (paper)     | 52.00   | 44.70   | 35.50   | 25.80   | 16.90   |
| Ours-TCAM        | 40.96   | 37.65   | 26.27   | 16.98   | 10.39   |
| Ours-Margin-Loss | 47.91   | 39.08   | 28.66   | 20.14   | 13.26   |

Other Implementations and Results

| mAP (%)         | IoU=0.1 | IoU=0.2 | IoU=0.3 | IoU=0.4 | IoU=0.5 |
|-----------------|---------|---------|---------|---------|---------|
| Demian Zhang-PT | 40.80   | 34.00   | 26.90   | 20.50   | 14.40   |
| JaeYoo Park-TF  | 52.10   | 44.20   | 34.70   | 26.10   | 17.70   |

Files

Training/Testing.ipynb: notebooks for the T-CAM implementation and its validation.

Training_LSTM.ipynb: our margin-loss model built on T-CAM.

Training_Gumbel_LSTM.ipynb: still a work in progress; please ignore it for now. :)
