wangxiao5791509/VisEvent_SOT_Benchmark

The First Large-scale Benchmark Dataset for Reliable Object Tracking by Fusing RGB and Event Cameras

• Project • arXiv • Baselines • DemoVideo • Tutorial •

Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, Feng Wu. VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows. arXiv preprint arXiv:2108.05015, 2021.

News:

🔥 [2024.03.12] A New Long-term RGB-Event based Visual Object Tracking Benchmark Dataset (termed FELT) is available at [Paper] [Code] [DemoVideo]
🔥 [2023.09.27] A High Definition (HD) Event based Visual Object Tracking Benchmark Dataset (termed EventVOT) is available at [arXiv] [Github]
🔥 [2023.09.20] VisEvent has been accepted by IEEE Transactions on Cybernetics [IEEE]
🔥 [2022.11.27] Because some aedat4 files are missing, you can use the subset of this dataset given in this list: [aedat4HARDVS_list]
🔥 [2022.11.23] A new color frame + event stream tracking dataset, COESOT, is available at [arXiv] [GitHub]
🔥 [2022.10.19] Event camera (DVS, Spike) based papers published at top international conferences [Event_Camera_in_Top_Conference]
[2022.07.14] Updated the VOT2019-RGB-Event dataset used in our paper [BaiduYun].
[2022.02.10] Updated the paper list for event camera based tracking [Event_Tracking_Paper_List].
[2021.10.13] Updated the OneDrive links.

Introduction

Unlike visible cameras, which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous, sparse events with much lower latency. In practice, visible cameras better perceive texture details and slow motion, while event cameras are free from motion blur and have a higher dynamic range, which lets them work well under fast motion and low illumination. The two sensors can therefore complement each other to achieve more reliable object tracking. To address the lack of a realistic, large-scale dataset for this task, we propose a large-scale Visible-Event benchmark (termed VisEvent). Our dataset consists of 820 video pairs captured in low-illumination, high-speed, and background-clutter scenarios, and it is divided into a training subset of 500 videos and a testing subset of 320 videos. Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods by extending current single-modality trackers into dual-modality versions. More importantly, we further propose a simple but effective tracking algorithm built on a cross-modality transformer, which achieves more effective feature fusion between visible and event data. Extensive experiments on the proposed VisEvent dataset and two simulated datasets (i.e., OTB-DVS and VOT-DVS) validate the effectiveness of our model.
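To make "transform the event flows into event images" concrete, here is a minimal NumPy sketch of one common way to do it: count the events that fall inside a time window (for example, between two consecutive RGB frames) per pixel, with one channel per polarity. This is an illustration under stated assumptions, not the exact transform used in the paper; the function name `events_to_image`, the two-channel layout, and the half-open window are all hypothetical choices.

```python
import numpy as np


def events_to_image(x, y, t, p, t_start, t_end, width, height):
    """Accumulate events with t_start <= t < t_end into a 2-channel count image.

    Hypothetical helper: channel 0 counts positive-polarity events and
    channel 1 counts negative ones; the result has shape (2, height, width).
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t)
    p = np.asarray(p).astype(bool)
    # Keep only the events inside the time window (e.g. between two RGB frames)
    keep = (t >= t_start) & (t < t_end)
    x, y, p = x[keep], y[keep], p[keep]
    img = np.zeros((2, height, width), dtype=np.int32)
    # np.add.at accumulates repeated (channel, y, x) indices, unlike img[idx] += 1
    np.add.at(img, (np.where(p, 0, 1), y, x), 1)
    return img
```

Summing the two channels gives a single-channel event image; keeping them separate preserves polarity, which a dual-modality tracker can consume as an extra input next to the RGB frame.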

Demo Video

A demo video for VisEvent can be found by clicking the image below:

Video Tutorial

The tutorial for this paper can be found by clicking the image below:

How to Download the VisEvent Dataset?

BaiduYun (about 216 GB):

Link: https://pan.baidu.com/s/1VhdORXT4OvG8TUESfDZHfw
Password: AHUE

OneDrive: download both [here] and [VisEvent_train.z05].
Google Drive: Click [here]

Links for the VOT2019-RGB-Event dataset (36.3 GB) used in our paper

BaiduYun:

Link: https://pan.baidu.com/s/1cS79d1dJFD8mF0AwuGG5Og   Password: AHUT

Google Drive: Click [here]

Baseline Methods

The source code of the baseline trackers that fuse both modalities can be found at: [RGB-DVS-SOT-Baselines].

How to load the aedat4 file?

We provide a Python script (read_aedat4.py) to load aedat4 files. You can download a sample aedat4 file to get a feel for the data format: [dvSave-2021_12_21_16_32_19.aedat4]

Here is an example:

1) Install the required toolkit [dv-gui]. Use the commands that match your Ubuntu version:

Ubuntu 20.04:

sudo add-apt-repository ppa:inivation-ppa/inivation
sudo apt-get update
sudo apt-get install dv-gui

Ubuntu 18.04:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo add-apt-repository ppa:inivation-ppa/inivation-bionic
sudo apt-get update
sudo apt-get install dv-gui

Ubuntu 16.04:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test 
sudo add-apt-repository ppa:lkoppel/opencv 
sudo add-apt-repository ppa:janisozaur/cmake-update 
sudo add-apt-repository ppa:inivation-ppa/inivation-xenial 
sudo apt-get update 
sudo apt-get install dv-gui

Other software:

pip install dv
pip install opencv-python numpy pillow -i https://pypi.tuna.tsinghua.edu.cn/simple

[ref] https://gitlab.com/inivation/dv/dv-python

2) Open a terminal and run the script:

python read_aedat4.py
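For readers who want to load the data in their own code, the sketch below shows roughly what such a loader can look like with dv-python, following the usage pattern in the dv-python README (`AedatFile`, `f.names`, `f['events'].numpy()`, iterating `f['frames']`). It is not the repository's read_aedat4.py. It assumes the streams are named 'events' and 'frames' (print `f.names` to check) and that event and frame timestamps share the same unit (microseconds in DV). The helper names `load_aedat4` and `events_between_frames` are hypothetical.

```python
import numpy as np


def load_aedat4(path):
    """Load the event and frame streams of one aedat4 file with dv-python.

    Assumes streams named 'events' and 'frames'. Returns a structured NumPy
    array with fields 'timestamp', 'x', 'y', 'polarity', plus a list of
    (timestamp, image) tuples for the frames.
    """
    from dv import AedatFile  # lazy import: only this function needs `pip install dv`

    with AedatFile(path) as f:
        print(f.names)  # names of the streams stored in the file
        # Concatenate all event packets into one structured array
        events = np.hstack([packet for packet in f['events'].numpy()])
        frames = [(frame.timestamp, frame.image) for frame in f['frames']]
    return events, frames


def events_between_frames(event_ts, frame_ts):
    """Index ranges [start, end) of the events between consecutive frames.

    event_ts must be sorted ascending and use the same unit as frame_ts.
    """
    idx = np.searchsorted(np.asarray(event_ts), np.asarray(frame_ts), side='left')
    return [(int(s), int(e)) for s, e in zip(idx[:-1], idx[1:])]
```

A typical call would be `events, frames = load_aedat4('dvSave-2021_12_21_16_32_19.aedat4')` followed by `events_between_frames(events['timestamp'], [ts for ts, _ in frames])`, which gives, for each RGB frame interval, the slice of events that could be turned into one event image.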

Evaluation ToolKit

Only a MATLAB version is available.

1. Clone this GitHub repository:

git clone https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark

2. Download the tracking results of our benchmark: [GoogleDrive (185MB)]

Unzip tracking_results_VisEvent_SOT_benchmark.zip and put its contents into the folder "tracking_results".

Unzip "annos.zip" in the folder "annos".

3. Open MATLAB and run the script "Evaluate_VisEvent_SOT_benchmark.m". When it finishes, check the generated evaluation figures.
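The MATLAB script above is the official evaluation. For a quick check in Python, the sketch below computes the standard OTB-style metrics: precision (fraction of frames whose center location error is within 20 pixels) and the success-plot AUC (mean fraction of frames whose IoU exceeds each of 21 thresholds from 0 to 1). It assumes boxes are given as [x, y, w, h], one row per frame, and it is a re-implementation of the common definitions, not a port of Evaluate_VisEvent_SOT_benchmark.m, so its numbers may differ slightly from the toolkit's (for example, in how invalid annotations are handled). All function names are hypothetical.

```python
import numpy as np


def _as_boxes(boxes):
    """Convert a list of [x, y, w, h] boxes to an (N, 4) float array."""
    return np.asarray(boxes, dtype=np.float64).reshape(-1, 4)


def overlap(pred, gt):
    """Per-frame IoU between predicted and ground-truth [x, y, w, h] boxes."""
    a, b = _as_boxes(pred), _as_boxes(gt)
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    # Frames with an empty union (degenerate boxes) get an overlap of 0
    return np.where(union > 0, inter / np.where(union > 0, union, 1.0), 0.0)


def center_error(pred, gt):
    """Per-frame Euclidean distance in pixels between box centers."""
    a, b = _as_boxes(pred), _as_boxes(gt)
    ca = a[:, :2] + a[:, 2:] / 2
    cb = b[:, :2] + b[:, 2:] / 2
    return np.linalg.norm(ca - cb, axis=1)


def success_auc(pred, gt, num_thresholds=21):
    """Area under the success curve: mean fraction of frames with IoU > threshold."""
    ious = overlap(pred, gt)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    return float(np.mean([(ious > th).mean() for th in thresholds]))


def precision(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    return float((center_error(pred, gt) <= threshold).mean())
```

Because the success curve uses a strict "IoU > threshold" test, even a perfect tracker scores 20/21 rather than 1.0 with 21 thresholds; this mirrors the usual OTB convention.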

More Related Materials

[Github-1] https://github.com/wangxiao5791509/SNN_CV_Applications_Resources
[Github-2] https://github.com/uzh-rpg/event-based_vision_resources
[Github-3] https://github.com/wangxiao5791509/Single_Object_Tracking_Paper_List
[Survey] Research Progress and Applications of Neuromorphic Vision Sensors: A Survey, Chinese Journal of Computers, Jianing Li, Yonghong Tian [Paper]
[Survey] Event-based Vision: A Survey, Guillermo Gallego, et al., IEEE T-PAMI 2020, [Paper]
[Survey] Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks, Xu Zheng, Yexin Liu, Yunfan Lu, Tongyan Hua, Tianbo Pan, Weiming Zhang, Dacheng Tao, Lin Wang, [Paper]
[FE108 dataset] Object Tracking by Jointly Exploiting Frame and Event Domain, Jiqing Zhang, et al., ICCV 2021, [Project] [added 33 videos, FE240 dataset, Baidu Cloud: password 68x3] [DemoVideo] [Github] [Dataset] [Paper] [BaiduYun: https://pan.baidu.com/s/1GFfCULGbSiv7FWCKgkb8_g, password: AHUT]
[SpikingJelly] (SpikingJelly is an open-source deep learning framework for Spiking Neural Networks (SNNs) based on PyTorch) [OpenI from PCL] [GitHub] [Documents]
[Event-Toolkit] https://github.com/TimoStoff/event_utils (It can produce various representations: (a) raw events, (b) voxel grids, (c) event images, and (d) timestamp images.)
[aedat 2.0.1] AEDAT is a fast AEDAT 4 Python reader with an underlying Rust implementation. Run pip install aedat to install it. [pypi.org] [Github]
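Of the representations listed for event_utils, the voxel grid is the least obvious. The sketch below shows one common construction in plain NumPy: timestamps are rescaled to [0, num_bins - 1] and each event's polarity (+1 or -1) is split linearly between its two nearest temporal bins. It is an illustration only, not event_utils' API or implementation; the function name and argument order are hypothetical, and other variants (e.g. no interpolation, or per-polarity grids) exist.

```python
import numpy as np


def events_to_voxel_grid(x, y, t, p, num_bins, width, height):
    """Build a (num_bins, height, width) voxel grid from events.

    Hypothetical helper: each event adds +1 (positive polarity) or -1
    (negative), split linearly between its two nearest time bins.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)
    pol = np.where(np.asarray(p).astype(bool), 1.0, -1.0)
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if t.size == 0:
        return grid
    # Rescale timestamps to the continuous bin coordinate [0, num_bins - 1]
    span = t.max() - t.min()
    if span > 0:
        tn = (t - t.min()) / span * (num_bins - 1)
    else:
        tn = np.zeros_like(t)
    left = np.floor(tn).astype(np.int64)
    frac = tn - left
    # Share of each event that goes to the lower bin
    np.add.at(grid, (left, y, x), pol * (1.0 - frac))
    # Share that goes to the upper bin; events exactly on the last bin have none
    right = left + 1
    ok = right < num_bins
    np.add.at(grid, (right[ok], y[ok], x[ok]), (pol * frac)[ok])
    return grid
```

Unlike a count image, the voxel grid keeps coarse timing inside the window, which is why it is a popular input for learning-based methods.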

License

This project is under the MIT license. See [license] for details.

📃 BibTeX:

If you find this work useful for your research, please cite the following paper:

@article{wang2021viseventbenchmark,
  title={VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows},
  author={Wang, Xiao and Li, Jianing and Zhu, Lin and Zhang, Zhipeng and Chen, Zhe and Li, Xin and Wang, Yaowei and Tian, Yonghong and Wu, Feng},
  journal={arXiv preprint arXiv:2108.05015},
  year={2021}
}

If you have any questions about this work, please submit an issue or contact me via email: wangxiaocvpr@foxmail.com, xiaowang@ahu.edu.cn, or WeChat: wangxiao5791509. Thanks for your attention!
