JackWoo0831 / Swin-JDE

JDE (Joint Detection and Embedding) with a Swin-T backbone on the VisDrone2019-MOT dataset


Paper: 基于注意力机制的无人机对地多目标跟踪 (UAV-to-Ground Multi-Object Tracking Based on an Attention Mechanism)

What's new:

  • Add JDE with Swin-S and Swin-B backbones
  • Update cfg files and train.py


Swin-JDE

This repo is JDE (Joint Detection and Embedding) with a Swin Transformer backbone on the VisDrone2019-MOT dataset. The code is built on JDE and Swin Transformer.

(demo GIF)

The structure of this model is as follows:

(model structure diagram)

Results on the VisDrone2019-MOT test set (with ByteTrack association, high threshold = 0.6, low threshold = 0.2):

| Method                     | IDF1 | Recall | Precision | FP    | FN    | MOTA | MOTP  | FPS   |
|----------------------------|------|--------|-----------|-------|-------|------|-------|-------|
| JDE (DarkNet53 backbone)   | 45.0 | 48.7   | 91.4      | 5777  | 64672 | 42.4 | 0.235 | 17.84 |
| JDE (Swin-T backbone)      | 48.2 | 54.6   | 88.7      | 8784  | 57202 | 45.9 | 0.249 | 23.55 |
| JDE (Swin-S backbone)      | 49.5 | 56.6   | 85.5      | 12094 | 54779 | 45.1 | 0.263 | 15.78 |
| JDE (Swin-B backbone)      | 47.2 | 53.9   | 87.6      | 9589  | 58191 | 44.3 | 0.247 | 15.87 |
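
For context on the association setting in the table: ByteTrack first matches high-confidence detections to existing tracks and then runs a second matching pass with the low-confidence ones. The following is a minimal sketch of just that two-threshold split (0.6/0.2 as above); the function name and array layout are illustrative and not taken from this repo's tracker.

```python
import numpy as np

def split_detections(dets, high_thresh=0.6, low_thresh=0.2):
    """Split detections into high- and low-confidence sets, ByteTrack-style.

    dets: (N, 5) array of [x1, y1, x2, y2, score].
    High-score boxes are matched to tracks first; low-score boxes are used
    in a second association pass to recover occluded targets.
    """
    scores = dets[:, 4]
    high = dets[scores >= high_thresh]
    low = dets[(scores >= low_thresh) & (scores < high_thresh)]
    return high, low

# Example: of three detections, one is high-confidence, one is low, one is dropped.
dets = np.array([[10, 10, 50, 80, 0.9],
                 [12, 15, 52, 85, 0.35],
                 [200, 40, 240, 120, 0.1]])
high, low = split_detections(dets)
print(len(high), len(low))  # 1 1
```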

Training details: JDE with the Swin-T backbone is trained with:

  • Swin-T ImageNet pretrained model
  • Half of the train dataset (27 seqs)
  • batch size = 32
  • AdamW optimizer, initial lr = 3e-4
  • 40 epochs, testing with the best-mAP model from training (the 33rd epoch); lr × 0.1 at the 31st and 37th epochs (following the Swin Transformer paper)
  • 2 Tesla A100 GPUs, about 5 hours

JDE with the DarkNet53 backbone is trained similarly. JDE with Swin-S is also similar, but due to GPU memory limits the batch size is 24; for JDE with Swin-B the batch size is 16.
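
For readers who want to reproduce the optimization schedule above (AdamW, initial lr 3e-4, lr × 0.1 at epochs 31 and 37 over 40 epochs, weight decay 1e-2 as suggested by the weights path used in the Test section), here is a minimal PyTorch sketch; `model` and `train_loader` are toy placeholders, not this repo's train.py.

```python
import torch

# Placeholder model/dataloader; the real ones come from train.py in this repo.
model = torch.nn.Linear(10, 2)
train_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
# Decay the learning rate by 10x at epochs 31 and 37 (40 epochs total),
# following the schedule described above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[31, 37], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(40):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```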

Trained model:
Baidu link (JDE with Swin-T): link

Extraction code: ngm1

TODO:
I will train on the MOT17 dataset to compare with the DarkNet backbone again,
and I will try to reach better results on VisDrone.


1. Installation

Following the JDE installation instructions works well. My environment is:

  • python=3.7.0 pytorch=1.7.0 torchvision=0.8.0 cudatoolkit=11.0

You also need:

  • py-motmetrics (pip install motmetrics)
  • cython-bbox (pip install cython_bbox)
  • opencv

In order to use Swin Transformer, please install mmdetection:

pip install openmim
mim install mmdet
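
To quickly confirm the environment matches the versions listed above, a small sanity-check script (not part of this repo) can be used:

```python
# Quick sanity check (not part of this repo) that the main dependencies import.
import torch
import torchvision
import cv2
import mmdet
import motmetrics  # noqa: F401

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("opencv:", cv2.__version__)
print("mmdet:", mmdet.__version__)
```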

2. Training

First, you should generate the image and annotation paths following the JDE format (see the appendix).

For the VisDrone dataset, you can run:

Part of the train dataset (27 seqs):
python generate_labels_for_VisDronev2.py --if_certain_seqs

Full train dataset:

python generate_labels_for_VisDronev2.py

Generate the test dataset paths:

python generate_labels_for_VisDronev2.py --split 'VisDrone2019-MOT-test-dev'
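
For reference, JDE-style loaders locate each image's label .txt by mirroring the image path (images → labels_with_ids, .jpg → .txt). The sketch below shows that mapping as I understand it from the original JDE repo; the VisDrone directory names are illustrative and may differ from what generate_labels_for_VisDronev2.py actually writes.

```python
# Sketch of the image-path -> label-path mapping used by JDE-style loaders.
# The actual layout produced by generate_labels_for_VisDronev2.py may differ;
# treat these paths as an illustration only.
img_path = "VisDrone2019-MOT-train/images/uav0000013_00000_v/0000001.jpg"
label_path = img_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")
print(label_path)
# VisDrone2019-MOT-train/labels_with_ids/uav0000013_00000_v/0000001.txt
```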

Then train

If you want to use the Swin-T pretrained model, please download it from the Swin Transformer repository
(choose the Swin-T checkpoint for Mask R-CNN), rename it to 'swin_t.pth', and put it in 'weights/'.

Train with the Swin backbone:

python train.py --backbone 'swin' --cfg 'cfg/yolov3_1088x608_newanchor3-swin_t.cfg'

Of course, if you want to use Swin-S, switch to yolov3_1088x608_newanchor3-swin_s.cfg.

If you want to train on your own dataset, please modify the anchors in the cfg file. You can use k-means clustering to choose your anchor sizes:

python choose_anchors.py
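
As an illustration of the idea, the sketch below clusters ground-truth box widths/heights with plain k-means (scikit-learn) and prints the resulting anchor shapes; choose_anchors.py in this repo may use a different distance (e.g. IoU-based) and data loading, and the random boxes here are only placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# wh: (N, 2) array of ground-truth box widths and heights (in pixels, at the
# training input resolution). Random data here just for illustration.
rng = np.random.default_rng(0)
wh = rng.uniform(low=[4, 8], high=[120, 240], size=(500, 2))

# Cluster into k anchor shapes; the cluster centers become the anchors.
k = 12
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(wh)
anchors = sorted(kmeans.cluster_centers_.tolist(), key=lambda a: a[0] * a[1])
for w, h in anchors:
    print(f"{w:.0f},{h:.0f}")
```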

With multiple GPUs:

CUDA_VISIBLE_DEVICES=2,3 python train.py --backbone 'swin' --cfg 'cfg/yolov3_1088x608_newanchor3-swin_t.cfg'

3. Test

After training, you can test the model by:

python track.py --cfg 'cfg/yolov3_1088x608_newanchor3-swin_t.cfg' --weights 'weights/vis_40Epochs_anchor3_lr3e-4_swin_wd1e-2/best_mAP.pt' --test_visdrone --byte_track --save-images

Generally you need to modify the weights path; if you don't want to use ByteTrack or save images, delete '--byte_track' and '--save-images'.

For more details, check run_JDE.txt.
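
The metrics reported in the results table (IDF1, Recall, Precision, MOTA, MOTP) are the standard py-motmetrics set. Below is a tiny, self-contained example of computing them with a MOTAccumulator on toy data; it is not this repo's evaluation code, just a demonstration of the library.

```python
import numpy as np
import motmetrics as mm

acc = mm.MOTAccumulator(auto_id=True)

# Frame 1: two ground-truth objects (ids 1, 2) and two hypotheses (ids 1, 2).
# The distance matrix holds matching costs; np.nan means "cannot match".
acc.update([1, 2], [1, 2], [[0.1, np.nan],
                            [np.nan, 0.2]])
# Frame 2: object 2 is missed (FN) and hypothesis 3 is a false positive (FP).
acc.update([1, 2], [1, 3], [[0.1, np.nan],
                            [np.nan, np.nan]])

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=['idf1', 'recall', 'precision', 'mota', 'motp'],
                     name='toy')
print(mm.io.render_summary(summary, formatters=mh.formatters,
                           namemap=mm.io.motchallenge_metric_names))
```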

Appendix:
JDE annotation format (see JDE):

(JDE annotation format illustration)
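
In text form, each line of a JDE label .txt is `class identity x_center y_center width height`, with the box fields normalized by image width and height (this follows the original JDE repo; the numbers below are made up). A tiny parser sketch:

```python
# Parse one JDE-format label line:
#   class identity x_center y_center width height
# where the box fields are normalized by image width/height.
def parse_jde_line(line, img_w, img_h):
    cls, tid, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Convert the center-based box to top-left corner format.
    return int(cls), int(tid), xc - w / 2, yc - h / 2, w, h

print(parse_jde_line("0 7 0.5 0.5 0.1 0.2", img_w=1088, img_h=608))
# (0, 7, 489.6, 243.2, 108.8, 121.6)
```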
