VID-sentence Dataset

This repo contains the annotations of the VID-sentence dataset introduced in Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video (WSSTG).

An example

Description: "A large elephant runs in the water from left to right."

[example figure]

  1. Requirements: software
  2. Setup
  3. Tools

Requirements: software

  • Python 3.6
  • cv2 (OpenCV)
  • shutil (standard library)
  • commands (Python 2 only; use subprocess on Python 3)
  • json (standard library)
  • h5py
  • ffmpeg (for visualization)
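
The third-party Python packages can be installed with pip, and ffmpeg comes from your system package manager. A minimal sketch, assuming a Debian-style system and unpinned versions (the repo does not pin any):

  pip install opencv-python h5py
  sudo apt-get install ffmpeg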

Setup

  1. Download the original images of the ImageNet Video Object Detection (VID) dataset from the official website.
  2. Create symlinks between the images of the VID dataset and the VID-sentence dataset. Note that shell variable names cannot contain hyphens, so $VID_SENTENCE_ROOT below stands for the root of this repository:
  cd $VID_SENTENCE_ROOT
  ln -s $VID_ROOT/data/VID/train $VID_SENTENCE_ROOT/data/VID/train
  ln -s $VID_ROOT/data/VID/val $VID_SENTENCE_ROOT/data/VID/val
  mv $VID_ROOT/data/VID/test $VID_SENTENCE_ROOT/data/VID/test_backup
  ln -s $VID_ROOT/data/VID/val $VID_SENTENCE_ROOT/data/VID/test

Note: the test set of VID-sentence is generated by splitting the validation set of VID, which is why the test symlink above points at VID's val directory.
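
Once the links are in place, the annotations can be read with the listed dependencies. Below is a minimal sketch; the file name (data/annotations.json) and the per-instance fields are assumptions for illustration, not the repo's documented format:

  import json

  # Hypothetical annotation file and layout -- check the actual files
  # shipped with the dataset and adjust accordingly.
  with open('data/annotations.json') as f:
      annotations = json.load(f)

  for ann in annotations[:3]:
      # Each instance is assumed to carry a video ID, a frame span,
      # per-frame bounding boxes, and the natural-language sentence.
      print(ann.get('video_id'), ann.get('sentence'))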

Tools

We give an example of how to visualize the annotations of the dataset. Run the following script:

sh vis_instance.sh  
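
Conceptually, visualizing an instance amounts to overlaying the annotated boxes on the video frames with cv2 and stitching the result into a video with ffmpeg. A minimal sketch of that idea (the frame directory, box values, and output paths below are hypothetical, not what vis_instance.sh actually uses):

  import glob
  import os
  import subprocess
  import cv2

  # Hypothetical inputs: a directory of JPEG frames plus one
  # (x1, y1, x2, y2) box per frame, in frame order.
  frame_paths = sorted(glob.glob('data/VID/val/some_snippet/*.JPEG'))
  boxes = [(50, 60, 300, 280)] * len(frame_paths)  # placeholder boxes

  os.makedirs('vis', exist_ok=True)
  for i, (path, (x1, y1, x2, y2)) in enumerate(zip(frame_paths, boxes)):
      frame = cv2.imread(path)
      cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)  # green box
      cv2.imwrite('vis/%06d.jpg' % i, frame)

  # Assemble the annotated frames into a video with ffmpeg.
  subprocess.run(['ffmpeg', '-y', '-framerate', '25',
                  '-i', 'vis/%06d.jpg', 'vis/instance.mp4'], check=True)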

License

WSSTG is released under the CC-BY-NC 4.0 LICENSE (refer to the LICENSE file for details).

Citing WSSTG

If you find this dataset/repo useful in your research, please consider citing:

@inproceedings{chen2019weakly,
    title={Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video},
    author={Chen, Zhenfang and Ma, Lin and Luo, Wenhan and Wong, Kwan-Yee K},
    booktitle={ACL},
    year={2019}
}

Contact

You can contact Zhenfang Chen by email at chenzhenfang2013@gmail.com.
