TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models
PyTorch implementation of TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models
Pengxiang Li1*, Zhili Liu2*, Kai Chen2*, Lanqing Hong3, Yunzhi Zhuge1, Dit-Yan Yeung2, Huchuan Lu1, Xu Jia1^
1DLUT 2HKUST 3Huawei Noah's Ark Lab
*Equal Contribution ^Corresponding Authors
Abstract
Diffusion models have gained prominence in generating data for perception tasks such as image classification and object detection. However, their potential for generating high-quality tracking sequences, a crucial aspect of video perception, has not been fully investigated. To address this gap, we propose TrackDiffusion, a novel architecture designed to generate continuous video sequences from tracklets. TrackDiffusion departs significantly from traditional layout-to-image (L2I) generation and copy-paste synthesis, which focus on static image elements such as bounding boxes: it empowers image diffusion models to encompass dynamic, continuous tracking trajectories, thereby capturing complex motion nuances and ensuring instance consistency across video frames. For the first time, we demonstrate that the generated video sequences can be used to train multi-object tracking (MOT) systems, leading to significant improvements in tracker performance. Experimental results show that our model significantly enhances instance consistency in generated video sequences, leading to improved perceptual metrics. Our approach achieves gains of 8.7 TrackAP and 11.8 TrackAP$_{50}$ on the YTVIS dataset, underscoring its potential to redefine the standards of video data generation for MOT tasks and beyond.
Method
The framework generates video frames conditioned on the provided tracklets and employs the Instance Enhancer to reinforce the temporal consistency of foreground instances. A new gated cross-attention layer is inserted to take in the instance information.
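The gated cross-attention idea can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the class name, tensor shapes, and the zero-initialized tanh gate (a common choice for inserting new layers into a pretrained diffusion backbone without disturbing it at initialization) are all assumptions.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Hypothetical sketch: inject per-instance tracklet embeddings into
    frame features via cross-attention, modulated by a learnable gate."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-initialized gate: tanh(0) = 0, so the layer acts as an
        # identity at the start of fine-tuning (assumed design choice).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, instance_tokens):
        # x: (B, N, dim) frame feature tokens
        # instance_tokens: (B, M, dim) tracklet/instance embeddings
        attn_out, _ = self.attn(self.norm(x), instance_tokens, instance_tokens)
        return x + torch.tanh(self.gate) * attn_out

# Tiny smoke test with made-up shapes.
B, N, M, dim = 2, 16, 4, 64
layer = GatedCrossAttention(dim)
x = torch.randn(B, N, dim)
inst = torch.randn(B, M, dim)
out = layer(x, inst)
```

Because the gate starts at zero, `out` equals `x` before any training, which lets the pretrained image-diffusion weights remain intact while the new instance conditioning is learned gradually.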
Training
Coming soon.
Results
- Comparison of TrackDiffusion with other methods on generation quality:
- Tracker training with frames generated by TrackDiffusion:
More results can be found in the main paper.
Visualization
- Challenging Scenarios
Tracklet-to-video generation in the (a) scale variation, (b) challenging overlapping, and (c) re-occurrence scenarios.
More results can be found in the main paper and project page.
Cite Us
@article{li2023trackdiffusion,
title={TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models},
author={Li, Pengxiang and Liu, Zhili and Chen, Kai and Hong, Lanqing and Zhuge, Yunzhi and Yeung, Dit-Yan and Lu, Huchuan and Jia, Xu},
journal={arXiv preprint arXiv:2312.00651},
year={2023}
}