Huntersxsx / TSGV-Learning-List

Related work on Temporal Sentence Grounding in Videos / Natural Language Video Localization / Video Moment Retrieval


TSGV-Learning-List

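Introduction

This list summarizes related work on TSGV from 2017 to the present. The task is known as Temporal Sentence Grounding in Videos (TSGV), Natural Language Video Localization (NLVL), or Video Moment Retrieval (VMR). Given a natural-language description, the goal is to localize the video segment it describes within a long, untrimmed video.

Table of Contents

Datasets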

| Dataset | Video Source | Domain |
| --- | --- | --- |
| TACoS | Kitchen | Cooking |
| Charades-STA | Homes | Indoor Activity |
| ActivityNet Captions | YouTube | Open |
| DiDeMo | Flickr | Open |
| MAD | Movie | Open |

Related Work

Survey

Sliding Window-based Method

Sliding window-based methods use multi-scale sliding windows (SW) to generate proposal candidates.
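To make the idea concrete, here is a minimal sketch of multi-scale window generation. The function name, the clip-index representation, and the `overlap` parameter are illustrative assumptions, not taken from any specific paper in this list.

```python
def sliding_window_proposals(num_clips, window_sizes, overlap=0.5):
    """Return (start, end) clip-index pairs, end exclusive, for every window scale."""
    proposals = []
    for size in window_sizes:
        if size > num_clips:
            continue  # window longer than the video: skip this scale
        # Stride derived from the overlap ratio; move at least one clip per step.
        stride = max(1, int(size * (1.0 - overlap)))
        for start in range(0, num_clips - size + 1, stride):
            proposals.append((start, start + size))
    return proposals


candidates = sliding_window_proposals(num_clips=8, window_sizes=[2, 4], overlap=0.5)
print(len(candidates))  # 10
```

Each candidate would then be scored against the query by a separate matching model, which is why the number of windows drives the computational cost of this family.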

Proposal Generated Method

Proposal-generated (PG) methods alleviate the computational burden of SW-based methods by generating proposals conditioned on the query.
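One simple way to condition proposals on the query is to group consecutive clips that a model considers relevant to it. The sketch below assumes a hypothetical list of per-clip relevance scores already exists; how those scores are produced differs from paper to paper and is not shown here.

```python
def group_relevant_clips(relevance, threshold=0.5):
    """Group runs of consecutive clips whose query relevance meets the threshold.

    `relevance` is a hypothetical list of per-clip scores in [0, 1], assumed to
    come from some query-conditioned model.
    Returns (start, end) clip-index pairs with end exclusive.
    """
    proposals = []
    start = None
    for i, score in enumerate(relevance):
        if score >= threshold and start is None:
            start = i                      # a relevant run begins
        elif score < threshold and start is not None:
            proposals.append((start, i))   # the run ended at the previous clip
            start = None
    if start is not None:
        proposals.append((start, len(relevance)))  # run reaches the video end
    return proposals


scores = [0.1, 0.7, 0.9, 0.2, 0.1, 0.6, 0.8, 0.8]
print(group_relevant_clips(scores))  # [(1, 3), (5, 8)]
```

Compared with exhaustive sliding windows, only two candidates survive here, which illustrates where the savings come from.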

Anchor-based Method

Anchor-based methods incorporate proposal generation into answer prediction and maintain the proposals with various learning modules.
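As a rough illustration, an anchor-based model can be thought of as scoring, at every time step, a fixed set of anchors of different lengths that end at that step. The sketch below only shows the selection step; the score layout `anchor_scores[t][k]` and the function name are assumptions made for this example, and the scores themselves would come from a learned model.

```python
def best_anchor(anchor_scores, anchor_lengths):
    """Pick the highest-scoring anchor.

    anchor_scores[t][k] is a hypothetical confidence for the anchor that ends
    at clip t (inclusive) and spans anchor_lengths[k] clips.
    Returns ((start, end), score) with end exclusive, or None if no anchor fits.
    """
    best = None
    for t, row in enumerate(anchor_scores):
        for k, score in enumerate(row):
            start = t + 1 - anchor_lengths[k]
            if start < 0:
                continue  # anchor would begin before the video starts
            if best is None or score > best[1]:
                best = ((start, t + 1), score)
    return best


lengths = [1, 2, 4]
scores = [
    [0.1, 0.9, 0.9],  # anchors of length 2 and 4 do not fit at t = 0
    [0.2, 0.3, 0.9],
    [0.1, 0.8, 0.2],
    [0.3, 0.4, 0.6],
]
print(best_anchor(scores, lengths))  # ((1, 3), 0.8)
```

Because the anchors are scored in a single pass over the video, no separate proposal stage is needed.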

Standard Anchor-based Method

2D-Map Anchor-based Method

Regression-based Method

Regression-based methods directly compute a time pair ($t_s$, $t_e$) and compare it with the ground truth ($\tau_s$, $\tau_e$) for model optimization.
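The comparison between the predicted pair and the ground truth is typically turned into a loss. The sketch below uses a smooth L1 penalty as one plausible choice; the specific loss, the `beta` parameter, and the use of timestamps normalized by the video duration are assumptions of this example, since individual papers differ.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty for a single residual."""
    x = abs(x)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta


def boundary_regression_loss(pred, target):
    """Sum of smooth L1 penalties over the start and end residuals.

    pred is (t_s, t_e) and target is (tau_s, tau_e).
    """
    (t_s, t_e), (tau_s, tau_e) = pred, target
    return smooth_l1(t_s - tau_s) + smooth_l1(t_e - tau_e)


# Timestamps normalized to [0, 1] by the video duration (an assumption).
loss = boundary_regression_loss(pred=(0.2, 0.6), target=(0.25, 0.5))
print(loss)
```

The loss is zero only when both boundaries match, and it grows with the boundary error, so no candidate proposals are needed at all.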

Span-based Method

Span-based methods aim to predict the probability of each video snippet/frame being the start or end position of the target moment.
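Given per-snippet start and end probabilities, a moment can be decoded by choosing the pair that maximizes their product subject to the start not coming after the end. The following is a minimal sketch of that decoding step; the function name and the product scoring rule are assumptions, and the two input lists are assumed to be non-empty and of equal length.

```python
def decode_span(start_probs, end_probs):
    """Return (start, end, score) maximizing start_probs[s] * end_probs[e], s <= e."""
    best = (0, 0, start_probs[0] * end_probs[0])
    best_start = 0  # index of the largest start probability seen so far
    for e in range(len(end_probs)):
        if start_probs[e] > start_probs[best_start]:
            best_start = e
        score = start_probs[best_start] * end_probs[e]
        if score > best[2]:
            best = (best_start, e, score)
    return best


start_probs = [0.1, 0.6, 0.2, 0.1]
end_probs = [0.5, 0.1, 0.1, 0.3]
s, e, score = decode_span(start_probs, end_probs)
print(s, e)  # 1 3
```

In this example the most likely end (index 0) precedes the most likely start (index 1), so taking the two argmaxes independently would give an invalid span; the constrained search returns snippets 1 through 3 instead.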

Reinforcement Learning-based Method

RL-based methods formulate TSGV as a sequential decision-making problem and use deep reinforcement learning techniques to solve it.
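The sequential view can be sketched as a tiny environment in which an agent repeatedly adjusts a temporal window. The action set, the step size, and the reward (the change in temporal IoU with the ground truth) are all assumptions chosen for illustration; the policy network that would pick the actions is not shown.

```python
def temporal_iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


# Hypothetical action set: (start delta, end delta) in units of step_size.
ACTIONS = {
    "shift_left": (-1.0, -1.0),
    "shift_right": (1.0, 1.0),
    "extend_end": (0.0, 1.0),
    "shrink_end": (0.0, -1.0),
    "stop": (0.0, 0.0),
}


def step(window, action, target, duration, step_size=1.0):
    """Apply one action; return (new_window, reward, done)."""
    ds, de = ACTIONS[action]
    start = window[0] + ds * step_size
    end = window[1] + de * step_size
    if start < 0.0 or end > duration or end <= start:
        start, end = window  # reject moves that leave the video or collapse the window
    new_window = (start, end)
    # Reward is the IoU improvement over the previous window (an assumption).
    reward = temporal_iou(new_window, target) - temporal_iou(window, target)
    return new_window, reward, action == "stop"


window, target = (0.0, 4.0), (2.0, 6.0)
for action in ["shift_right", "shift_right", "stop"]:
    window, reward, done = step(window, action, target, duration=10.0)
print(window, done)  # (2.0, 6.0) True
```

Only a handful of windows are visited per query, which is the main appeal of this formulation over exhaustive candidate scoring.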

Other Supervised Method

Weakly-supervised TSGV Method

Under the weakly-supervised setting, TSGV methods need only video-query pairs, not annotations of the start/end times.
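Without boundary annotations, one common training signal is whether a video and a query belong together at all. The sketch below shows a multi-instance-style ranking loss under that assumption: the best-scoring proposal represents the whole video, and a matched pair is pushed above a mismatched one. The function names, the max pooling, and the `margin` value are illustrative choices, not a specific method from this list.

```python
def video_query_score(proposal_scores):
    """Video-level matching score: the best proposal stands in for the whole video."""
    return max(proposal_scores)


def mil_ranking_loss(pos_scores, neg_scores, margin=0.2):
    """Hinge loss pushing a matched video-query pair above a mismatched one by margin."""
    return max(0.0, margin - video_query_score(pos_scores) + video_query_score(neg_scores))


# Hypothetical proposal scores for a matched pair and a mismatched pair.
matched = [0.2, 0.9, 0.4]
mismatched = [0.3, 0.5]
print(mil_ranking_loss(matched, mismatched))  # 0.0
```

At inference time the proposal that attains the maximum for a matched pair is returned as the localized moment, even though its boundaries were never supervised directly.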

Multi-Instance Learning Method

Reconstruction-based Method

Other Weakly-supervised Method

References

Survey of Zhang et al.
