TNL2K_Evaluation_Toolkit

Xiao Wang*, Xiujun Shu*, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu, Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark, IEEE CVPR 2021 (* denotes equal contribution). [Paper] [Project] [Slides] [TNL2K-BaiduYun (Code: pclt)] [TNL2K-OneDrive] [TNL2K-GoogleDrive] [SOT Paper List] [Benchmark-Results] [Demo Video (Youtube)] [COVE] [中文视频]

Abstract:

Tracking by natural language specification is an emerging research topic that aims at locating the target object in a video sequence based on its language description. Compared with traditional bounding box (BBox) based tracking, this setting guides object tracking with high-level semantic information, addresses the ambiguity of BBox, and links local and global search organically together. These benefits may bring more flexible, robust, and accurate tracking performance in practical scenarios. However, existing natural language initialized trackers are developed and compared on benchmark datasets proposed for tracking-by-BBox, which cannot reflect the true power of tracking-by-language. In this work, we propose a new benchmark specifically dedicated to tracking-by-language, including a large-scale dataset and strong, diverse baseline methods. Specifically, we collect 2k video sequences (containing a total of 1,244,340 frames, 663 words) and split them 1300/700 for training/testing respectively. We densely annotate one sentence in English and the corresponding bounding boxes of the target object for each video. A strong baseline method based on an adaptive local-global-search scheme is proposed for future work to compare against. We believe this benchmark will greatly boost related research on natural language guided tracking.

How to Download the TNL2K Dataset?

The dataset can currently be downloaded from BaiduYun, OneDrive, or GoogleDrive:

1. Download from BaiduYun:

  Link: https://pan.baidu.com/s/1Joc5DqJUwGb4cGiFeh5Iug (Code: pclt)

2. Download from OneDrive: Click [here]

3. Download from GoogleDrive: Click [here]

Note: The annotations of 12 videos in the training subset have been corrected for accuracy. Please replace the annotations of these videos with the [new annotations].
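After downloading, a quick sanity check is to count the sequences in each subset and compare against the 1300/700 train/test split reported in the abstract. The sketch below assumes each video sequence is stored as one subdirectory, which this README does not state. The helper name `count_sequences` and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Count the immediate subdirectories of a folder.
# Assumption: each TNL2K video sequence is stored as one subdirectory.
count_sequences() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # Only direct children that are directories are counted; files are ignored.
    find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Example (hypothetical paths; adjust to where you unpacked the dataset):
#   count_sequences ./TNL2K_train   # the paper reports 1300 training videos
#   count_sequences ./TNL2K_test    # the paper reports 700 testing videos
```

If the counts differ from the reported split, the download or extraction may be incomplete, or the on-disk layout may not match the assumption above.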

Tutorial for the Evaluation Toolkit:

Clone this GitHub repository:

git clone https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit

Unzip related files for evaluation:

cd annos && tar -xzvf ./annos.tar.gz

Download the benchmark results from [Benchmark-Results] and extract them:

tar -xzvf ./tracking_results_TNL2K.tar.gz
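The two extraction steps can also be wrapped in a small helper that fails with a clear message when an archive is missing. This is only a sketch: `extract_archive` is a hypothetical function, not part of the toolkit, and the example paths assume the archives sit where the commands above expect them.

```shell
#!/usr/bin/env bash
# Extract a gzip-compressed tar archive into a target directory.
# extract_archive is a hypothetical helper, not part of the toolkit.
extract_archive() {
    local archive="$1"
    local dest="${2:-.}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest"
    # -x extract, -z gunzip, -f archive file, -C change to the target directory
    tar -xzf "$archive" -C "$dest"
}

# Example (paths are assumptions; adjust to your download location):
#   extract_archive ./annos/annos.tar.gz ./annos
#   extract_archive ./tracking_results_TNL2K.tar.gz .
```

Checking for the archive first means a missing download is reported before `tar` runs, which makes a failed setup easier to diagnose.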

Open MATLAB and run the script:

Evaluate_TNL2K_dataset.m
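On a machine without a display, the script can probably be launched from the command line instead of the MATLAB desktop. The sketch below assumes MATLAB R2019a or later (for the `-batch` startup option) and that `Evaluate_TNL2K_dataset.m` sits in the directory you pass in. `run_evaluation` and the `MATLAB_BIN` variable are hypothetical names introduced here, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Run the evaluation script without opening the MATLAB desktop.
# run_evaluation is a hypothetical wrapper; MATLAB_BIN lets you point at a
# specific MATLAB executable (defaults to "matlab" on the PATH).
run_evaluation() {
    local toolkit_dir="${1:-.}"
    local matlab_bin="${MATLAB_BIN:-matlab}"
    if ! command -v "$matlab_bin" >/dev/null 2>&1; then
        echo "MATLAB executable not found: $matlab_bin" >&2
        return 1
    fi
    # -batch (MATLAB R2019a and later) runs the statement non-interactively
    # and exits with a non-zero status if the script errors.
    (cd "$toolkit_dir" && "$matlab_bin" -batch "Evaluate_TNL2K_dataset")
}

# Example, assuming the repository was cloned into the current directory:
#   run_evaluation ./TNL2K_evaluation_toolkit
```

With older MATLAB releases, `matlab -nodisplay -nosplash -r "Evaluate_TNL2K_dataset; exit"` is the usual equivalent. Note that any figures the script draws may not be shown in a headless session.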

Wait for the evaluation to finish, then check the final results.

Acknowledgement

This code is adapted from the evaluation toolkit of [LaSOT].

Citation:

If you find this work useful for your research, please cite the following paper:

@inproceedings{wang2021tnl2k,
  title={Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark},
  author={Wang, Xiao and Shu, Xiujun and Zhang, Zhipeng and Jiang, Bo and Wang, Yaowei and Tian, Yonghong and Wu, Feng},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

If you have any questions about this work, please contact me at wangxiaocvpr@foxmail.com or wangx03@pcl.ac.cn.
