Official PyTorch implementation of TrackDiffusion (https://arxiv.org/abs/2312.00651)

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models
Pengxiang Li1*, Zhili Liu2*, Kai Chen2*, Lanqing Hong3, Yunzhi Zhuge1, Dit-Yan Yeung2, Huchuan Lu1, Xu Jia1^
1DLUT 2HKUST 3Huawei Noah's Ark Lab
*Equal Contribution ^Corresponding Authors

arXiv | Project page

Abstract

Diffusion models have gained prominence in generating data for perception tasks such as image classification and object detection. However, their potential for generating high-quality tracking sequences, a crucial aspect of video perception, has not been fully investigated. To address this gap, we propose TrackDiffusion, a novel architecture designed to generate continuous video sequences from tracklets. TrackDiffusion represents a significant departure from traditional layout-to-image (L2I) generation and copy-paste synthesis, which focus on static image elements such as bounding boxes: it empowers image diffusion models to encompass dynamic and continuous tracking trajectories, thereby capturing complex motion nuances and ensuring instance consistency among video frames. For the first time, we demonstrate that the generated video sequences can be utilized for training multi-object tracking (MOT) systems, leading to significant improvements in tracker performance. Experimental results show that our model significantly enhances instance consistency in generated video sequences, leading to improved perceptual metrics. Our approach achieves an improvement of 8.7 in TrackAP and 11.8 in TrackAP$_{50}$ on the YTVIS dataset, underscoring its potential to redefine the standards of video data generation for MOT tasks and beyond.

Method

The framework generates video frames conditioned on the provided tracklets and employs the Instance Enhancer to reinforce the temporal consistency of foreground instances. A new gated cross-attention layer is inserted to take in the instance information.

Figure: framework overview.
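To make the gated cross-attention concrete, below is a minimal PyTorch sketch, not the released implementation: the module name, shapes, and the zero-initialized tanh gate (a common trick for grafting new conditioning layers onto a pretrained backbone without disturbing it at the start of training) are all illustrative assumptions. It attends from a frame's spatial tokens to per-instance tracklet embeddings and adds the result through the learnable gate.

import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    # Illustrative sketch only: injects per-instance tracklet embeddings
    # (e.g., encoded boxes plus identity) into a frame's spatial tokens.
    def __init__(self, dim, inst_dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads,
                                          kdim=inst_dim, vdim=inst_dim,
                                          batch_first=True)
        # Zero-initialized gate: the layer starts as an identity mapping,
        # leaving the pretrained diffusion backbone untouched initially.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, inst_tokens):
        # x:           (B, N, dim)      spatial tokens of one frame
        # inst_tokens: (B, M, inst_dim) one embedding per tracked instance
        out, _ = self.attn(self.norm(x), inst_tokens, inst_tokens)
        return x + torch.tanh(self.gate) * out

# Example: 2 frames of 64x64 latent tokens (320-dim), 5 instances (768-dim)
layer = GatedCrossAttention(dim=320, inst_dim=768)
x = torch.randn(2, 64 * 64, 320)
inst = torch.randn(2, 5, 768)
print(layer(x, inst).shape)  # torch.Size([2, 4096, 320])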

Training

Coming soon.

Results

  • Compare TrackDiffusion with other methods for generation quality:

Figure: main results on generation quality.

  • Training support with frames generated from TrackDiffusion:
Figure: tracker training with TrackDiffusion-generated frames.

More results can be found in the main paper.

Visualization

  • Challenging Scenarios

Tracklet-to-video generation in the (a) scale variation, (b) challenging overlapping, and (c) re-occurrence scenarios.


  • GOT10K Dataset

  • YTVIS Dataset

More results can be found in the main paper and project page.

Cite Us

@article{li2023trackdiffusion,
  title={TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models},
  author={Li, Pengxiang and Liu, Zhili and Chen, Kai and Hong, Lanqing and Zhuge, Yunzhi and Yeung, Dit-Yan and Lu, Huchuan and Jia, Xu},
  journal={arXiv preprint arXiv:2312.00651},
  year={2023}
}
