JackWoo0831 / Swin-JDE

JDE (Joint Detection and Embedding) with a Swin-T backbone on the VisDrone2019-MOT dataset


Paper: 基于注意力机制的无人机对地多目标跟踪 (UAV-to-Ground Multi-Object Tracking Based on an Attention Mechanism)

What's new:

  • Add JDE with Swin-S and Swin-B backbones
  • Update cfg files and train.py


Swin-JDE

This repo is JDE (Joint Detection and Embedding) with a Swin Transformer backbone on the VisDrone2019-MOT dataset. The code is built on JDE and Swin Transformer.

(demo GIF)

The structure of this model is as follows:

(model structure diagram)

Results on the VisDrone2019-MOT test set (with ByteTrack association, high threshold = 0.6, low threshold = 0.2):

| Method                     | IDF1 | Recall | Precision | FP    | FN    | MOTA | MOTP  | FPS   |
|----------------------------|------|--------|-----------|-------|-------|------|-------|-------|
| JDE (DarkNet53 backbone)   | 45.0 | 48.7   | 91.4      | 5777  | 64672 | 42.4 | 0.235 | 17.84 |
| JDE (Swin-T backbone)      | 48.2 | 54.6   | 88.7      | 8784  | 57202 | 45.9 | 0.249 | 23.55 |
| JDE (Swin-S backbone)      | 49.5 | 56.6   | 85.5      | 12094 | 54779 | 45.1 | 0.263 | 15.78 |
| JDE (Swin-B backbone)      | 47.2 | 53.9   | 87.6      | 9589  | 58191 | 44.3 | 0.247 | 15.87 |
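
For context on the association setting in the table: ByteTrack first matches high-confidence detections to existing tracks and then runs a second matching pass with the low-confidence ones. The following is a minimal sketch of just that two-threshold split (0.6/0.2 as above); the function name and array layout are illustrative and not taken from this repo's tracker.

```python
import numpy as np

def split_detections(dets, high_thresh=0.6, low_thresh=0.2):
    """Split detections into high- and low-confidence sets, ByteTrack-style.

    dets: (N, 5) array of [x1, y1, x2, y2, score].
    High-score boxes are matched to tracks first; low-score boxes are used
    in a second association pass to recover occluded targets.
    """
    scores = dets[:, 4]
    high = dets[scores >= high_thresh]
    low = dets[(scores >= low_thresh) & (scores < high_thresh)]
    return high, low

# Example: of three detections, one is high-confidence, one is low, one is dropped.
dets = np.array([[10, 10, 50, 80, 0.9],
                 [12, 15, 52, 85, 0.35],
                 [200, 40, 240, 120, 0.1]])
high, low = split_detections(dets)
print(len(high), len(low))  # 1 1
```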

Training details: JDE with the Swin-T backbone is trained with:

  • Swin-T ImageNet pretrained model
  • Half of the train dataset (27 seqs)
  • batch size = 32
  • AdamW optimizer, initial lr = 3e-4
  • 40 epochs, testing with the best-mAP model from training (the 33rd epoch); lr × 0.1 at the 31st and 37th epochs (following the Swin Transformer paper)
  • 2 Tesla A100 GPUs, about 5 hours

JDE with the DarkNet53 backbone is trained similarly. JDE with Swin-S is also similar, but due to GPU memory limits the batch size is 24; for JDE with Swin-B the batch size is 16.
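
For readers who want to reproduce the optimization schedule above (AdamW, initial lr 3e-4, lr × 0.1 at epochs 31 and 37 over 40 epochs, weight decay 1e-2 as suggested by the weights path used in the Test section), here is a minimal PyTorch sketch; `model` and `train_loader` are toy placeholders, not this repo's train.py.

```python
import torch

# Placeholder model/dataloader; the real ones come from train.py in this repo.
model = torch.nn.Linear(10, 2)
train_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
# Decay the learning rate by 10x at epochs 31 and 37 (40 epochs total),
# following the schedule described above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[31, 37], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(40):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```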

Trained model:
Baidu link (JDE with Swin-T): link

Extraction code: ngm1

TODO:
I will train on the MOT17 dataset to compare with the DarkNet backbone again,
and I will try to reach better results on VisDrone.


1. Installation

Following the JDE installation instructions works well. My environment is:

  • python=3.7.0 pytorch=1.7.0 torchvision=0.8.0 cudatoolkit=11.0

You also need:

  • py-motmetrics (pip install motmetrics)
  • cython-bbox (pip install cython_bbox)
  • opencv

In order to use Swin Transformer, please install mmdetection:

pip install openmim
mim install mmdet
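
To quickly confirm the environment matches the versions listed above, a small sanity-check script (not part of this repo) can be used:

```python
# Quick sanity check (not part of this repo) that the main dependencies import.
import torch
import torchvision
import cv2
import mmdet
import motmetrics  # noqa: F401

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("opencv:", cv2.__version__)
print("mmdet:", mmdet.__version__)
```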

2. Training

First, you should generate the image and annotation paths following the JDE format (see the appendix).

For the VisDrone dataset, you can run:

Part of the train dataset (27 seqs):
python generate_labels_for_VisDronev2.py --if_certain_seqs

Full train dataset:

python generate_labels_for_VisDronev2.py

Generate the test dataset paths:

python generate_labels_for_VisDronev2.py --split 'VisDrone2019-MOT-test-dev'
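
For reference, JDE-style loaders locate each image's label .txt by mirroring the image path (images → labels_with_ids, .jpg → .txt). The sketch below shows that mapping as I understand it from the original JDE repo; the VisDrone directory names are illustrative and may differ from what generate_labels_for_VisDronev2.py actually writes.

```python
# Sketch of the image-path -> label-path mapping used by JDE-style loaders.
# The actual layout produced by generate_labels_for_VisDronev2.py may differ;
# treat these paths as an illustration only.
img_path = "VisDrone2019-MOT-train/images/uav0000013_00000_v/0000001.jpg"
label_path = img_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")
print(label_path)
# VisDrone2019-MOT-train/labels_with_ids/uav0000013_00000_v/0000001.txt
```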

Then train

If you want to use the Swin-T pretrained model, please download it from the Swin Transformer repository
(choose the Swin-T checkpoint for Mask R-CNN), rename it to 'swin_t.pth', and put it in 'weights/'.

Train with the Swin backbone:

python train.py --backbone 'swin' --cfg 'cfg/yolov3_1088x608_newanchor3-swin_t.cfg'

Of course, if you want to use Swin-S, switch to yolov3_1088x608_newanchor3-swin_s.cfg.

If you want to train on your own dataset, please modify the anchors in the cfg file. You can use k-means clustering to choose your anchor sizes:

python choose_anchors.py
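
As an illustration of the idea, the sketch below clusters ground-truth box widths/heights with plain k-means (scikit-learn) and prints the resulting anchor shapes; choose_anchors.py in this repo may use a different distance (e.g. IoU-based) and data loading, and the random boxes here are only placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# wh: (N, 2) array of ground-truth box widths and heights (in pixels, at the
# training input resolution). Random data here just for illustration.
rng = np.random.default_rng(0)
wh = rng.uniform(low=[4, 8], high=[120, 240], size=(500, 2))

# Cluster into k anchor shapes; the cluster centers become the anchors.
k = 12
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(wh)
anchors = sorted(kmeans.cluster_centers_.tolist(), key=lambda a: a[0] * a[1])
for w, h in anchors:
    print(f"{w:.0f},{h:.0f}")
```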

With multiple GPUs:

CUDA_VISIBLE_DEVICES=2,3 python train.py --backbone 'swin' --cfg 'cfg/yolov3_1088x608_newanchor3-swin_t.cfg'

3. Test

After training, you can test the model by:

python track.py --cfg 'cfg/yolov3_1088x608_newanchor3-swin_t.cfg' --weights 'weights/vis_40Epochs_anchor3_lr3e-4_swin_wd1e-2/best_mAP.pt' --test_visdrone --byte_track --save-images

Generally you need to modify the weights path; if you don't want to use ByteTrack or save images, delete '--byte_track' and '--save-images'.

For more details, check run_JDE.txt.
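
The metrics reported in the results table (IDF1, Recall, Precision, MOTA, MOTP) are the standard py-motmetrics set. Below is a tiny, self-contained example of computing them with a MOTAccumulator on toy data; it is not this repo's evaluation code, just a demonstration of the library.

```python
import numpy as np
import motmetrics as mm

acc = mm.MOTAccumulator(auto_id=True)

# Frame 1: two ground-truth objects (ids 1, 2) and two hypotheses (ids 1, 2).
# The distance matrix holds matching costs; np.nan means "cannot match".
acc.update([1, 2], [1, 2], [[0.1, np.nan],
                            [np.nan, 0.2]])
# Frame 2: object 2 is missed (FN) and hypothesis 3 is a false positive (FP).
acc.update([1, 2], [1, 3], [[0.1, np.nan],
                            [np.nan, np.nan]])

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=['idf1', 'recall', 'precision', 'mota', 'motp'],
                     name='toy')
print(mm.io.render_summary(summary, formatters=mh.formatters,
                           namemap=mm.io.motchallenge_metric_names))
```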

Appendix:
JDE annotation format (see JDE):

(JDE annotation format illustration)
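
In text form, each line of a JDE label .txt is `class identity x_center y_center width height`, with the box fields normalized by image width and height (this follows the original JDE repo; the numbers below are made up). A tiny parser sketch:

```python
# Parse one JDE-format label line:
#   class identity x_center y_center width height
# where the box fields are normalized by image width/height.
def parse_jde_line(line, img_w, img_h):
    cls, tid, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Convert the center-based box to top-left corner format.
    return int(cls), int(tid), xc - w / 2, yc - h / 2, w, h

print(parse_jde_line("0 7 0.5 0.5 0.1 0.2", img_w=1088, img_h=608))
# (0, 7, 489.6, 243.2, 108.8, 121.6)
```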
