This project tracks the movement of multiple people using a single camera. It is a first step toward a more ambitious system, a Multi-Camera Multi-Object Tracking system.

Additional running tutorial

Data Generating

To make data generation faster and easier, we first design a MAIN_DATA_TREE as below:

|	|
|       |       |
|       |       |__video_name_1.mp4
|       |       |__video_name_1
|       |       |       |
|       |       |       |__gt
|       |       |           |__gt.txt
|       |       |           |__labels.txt
|       |       |
|       |       |
|       |       |__video_name_2.mp4
|       |       |__video_name_2
|       |               |
|       |               |__gt
|       |                   |__gt.txt
|       |                   |__labels.txt
|       |
|	|
|	|__detection_dataset
|	|		|
|       |               |___ images
|       |               |       |___ train
|       |               |       |       |___ frame_xxxxxx.jpg
|       |               |       |       |___ ...
|       |               |       |___ val
|       |               |               |___ frame_xxxxxx.jpg
|       |               |               |___ ...       
|       |               |        
|       |               |___ labels
|       |                       |___ train
|       |                       |       |___ frame_xxxxxx.txt
|       |                       |       |___ ...
|       |                       |___ val
|       |                               |___ frame_xxxxxx.txt
|       |                               |___ ...  
|	|
|	|__reid_dataset
|			|
|			|__query
|			|    |
|			|    |__id_1
|                       |    |    |__frame_xxxxxx.jpg
|			|    |
|			|    |__id_2
|                       |         |__frame_xxxxxx.jpg
|			|
|			|__gallery
|			|    |
|			|    |__id_1
|                       |    |    |__frame_xxxxxx.jpg
|                       |    |    |__...
|			|    |
|			|    |__id_2
|                       |         |__frame_xxxxxx.jpg
|                       |         |__...
|			|
|			|__train
|			     |
|			     |__id_1
|                            |    |__frame_xxxxxx.jpg
|                            |    |__...
|			     |
|			     |__id_2
|                                 |__frame_xxxxxx.jpg
|                                 |__...

Each data_version folder refers to a sub-dataset. Inside each data_version folder, TRAIN_DATASET holds the general data tree, detection_dataset follows the YOLOv5 input format, and reid_dataset follows the Market1501 format.

There is also a combine_dataset folder, which mixes the data of all previous versions.

Each new data version is added to the tree designed above.
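As a rough illustration of the layout above, the following sketch creates the empty detection_dataset and reid_dataset folders for one data version. The function name and the data_version_example folder are hypothetical, and only the sub-folders named in the tree are created.

```python
import os
import tempfile

# Sub-folders taken from the tree above; only directories are created.
SUBFOLDERS = [
    "detection_dataset/images/train",
    "detection_dataset/images/val",
    "detection_dataset/labels/train",
    "detection_dataset/labels/val",
    "reid_dataset/query",
    "reid_dataset/gallery",
    "reid_dataset/train",
]


def make_version_skeleton(version_root):
    """Create the empty dataset folders for one data version (hypothetical helper)."""
    created = []
    for sub in SUBFOLDERS:
        path = os.path.join(version_root, *sub.split("/"))
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_version_skeleton(os.path.join(tmp, "data_version_example"))
    all_exist = all(os.path.isdir(p) for p in created)
    print(len(created), all_exist)
```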

Generate Detection Data

First, prepare a frames folder and a labels folder as below:

	|	| 
	|	|___ video_name_1
	|	|       |___ frame_xxxxxx.jpg
	|	|       |___ ...
	|	|___ video_name_2
	|		|___ frame_xxxxxx.jpg
	|		|___ ...
		|___ video_name_1
		|       |___ frame_xxxxxx.jpg
		|       |___ ...
		|___ video_name_2
			|___ frame_xxxxxx.jpg
			|___ ...
	|	| 
	|	|___ video_name_1
	|	|       |___ frame_xxxxxx.txt
	|	|       |___ ...
	|	|___ video_name_2
	|		|___ frame_xxxxxx.txt
	|		|___ ...
		|___ video_name_1
		|       |___ frame_xxxxxx.txt
		|       |___ ...
		|___ video_name_2
			|___ frame_xxxxxx.txt
			|___ ...

Each video's labels folder is in YOLO label format.
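For reference, a YOLO label file holds one line per object: a class index followed by the box center, width and height, all normalized by the image size. The sketch below converts a pixel-coordinate box into such a line; the function name and the sample numbers are illustrative only.

```python
def to_yolo_line(class_id, left, top, right, bottom, img_w, img_h):
    """Convert a pixel-coordinate box to one YOLO label line."""
    x_center = (left + right) / 2 / img_w
    y_center = (top + bottom) / 2 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A person box (class 0) in a 1000x800 image.
line = to_yolo_line(0, 100, 200, 300, 600, img_w=1000, img_h=800)
print(line)
```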

To do this, do the following steps:

cd generate_data
python generate_data_tree

Here data_root is the folder that contains the raw VTX data; we use it to search for specific videos. gt_root is the root folder that contains the CVAT annotation files, and tree_root is the root of the data tree we want to create.

We then symlink these folders into the MAIN_DATA_TREE to save storage and generation time.

cd generate_data

In combine_train_detection_dataset.py, set root_frames_dir and root_labels_dir to the paths of the two folders above. Set combine_frames_dir and combine_labels_dir to the corresponding paths in the detection_dataset folder of MAIN_DATA_TREE. Then run:

python combine_train_detection_dataset.py
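The idea of combining per-video folders through symlinks can be sketched as follows. This is not the code of combine_train_detection_dataset.py; the function name and the choice to prefix each file with its video name (to avoid name collisions between videos) are assumptions of this example.

```python
import os
import tempfile


def combine_by_symlink(root_dir, combine_dir):
    """Symlink every file under root_dir/<video_name>/ into combine_dir.

    The video name is prefixed to each file name to avoid collisions
    (an assumption of this sketch, not necessarily what the script does).
    """
    os.makedirs(combine_dir, exist_ok=True)
    linked = []
    for video_name in sorted(os.listdir(root_dir)):
        video_dir = os.path.join(root_dir, video_name)
        if not os.path.isdir(video_dir):
            continue
        for file_name in sorted(os.listdir(video_dir)):
            src = os.path.abspath(os.path.join(video_dir, file_name))
            dst = os.path.join(combine_dir, f"{video_name}_{file_name}")
            if not os.path.lexists(dst):
                os.symlink(src, dst)  # no image data is copied
            linked.append(dst)
    return linked


# Tiny demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    frames = os.path.join(tmp, "frames")
    for video in ("video_name_1", "video_name_2"):
        os.makedirs(os.path.join(frames, video))
        open(os.path.join(frames, video, "frame_000001.jpg"), "w").close()
    links = combine_by_symlink(frames, os.path.join(tmp, "combined"))
    link_names = [os.path.basename(p) for p in links]
    print(link_names)
```

The same function would be called once for the frames folder and once for the labels folder, so that image and label names stay aligned.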

Generate ReID Data

Follow the tutorial from this repo

Detection Module

Training Custom Data

Follow Training Custom Data


  1. Create the data folder in the right format (YOLOv5 looks up the label path that corresponds to each image path).
  2. In the data.yaml file, set nc = 1 and names = [person]. Replace train and val with absolute paths instead of the relative paths used in the tutorial above.
  3. Place the config yaml file in yolov5/data.
  4. When training from the crowdhuman_yolov5 checkpoint, first set Optimizer: ... to None, otherwise it conflicts during training.
  5. Add the file hyp.scratch.yaml in case it is not included in the original repo (put it in the folder yolov5/data/).
  6. Specify the GPU for training.
  7. When evaluating on a different dataset with a pretrained model, remove the best_fitness score from the checkpoint; see line 155 in yolov5/train.py.
  8. Training script:
python train.py --data {data_yaml_file_config} --epochs {num_epochs} --batch {batches} --weights {weights path} --cfg {model config path} --device 0
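A data.yaml matching step 2 could be generated as in the sketch below. The paths are placeholders, the helper name is hypothetical, and the file is built with plain string formatting so that no YAML library is required.

```python
def build_data_yaml(train_dir, val_dir, names):
    """Return the text of a YOLOv5 data config for the given class names."""
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        f"nc: {len(names)}",  # number of classes
        "names: [" + ", ".join(f"'{n}'" for n in names) + "]",
    ]
    return "\n".join(lines) + "\n"


# Placeholder absolute paths; replace them with the real dataset location.
yaml_text = build_data_yaml(
    "/abs/path/to/detection_dataset/images/train",
    "/abs/path/to/detection_dataset/images/val",
    ["person"],
)
print(yaml_text)
```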

If we use the crowdhuman_yolov5 checkpoint, we can use the yolov5m config file at yolov5/models/yolov5m.yaml.

Training results are saved in /yolov5/runs/train/exp{x}.

Best model checkpoint after adjusting the number of classes in the heads and training for 30 epochs on VTX DATA: Checkpoint

Evaluate Detection Module (YOLOV5)

python test.py --data {data_yaml_file_config} --weights {weights_path} --save-txt --save-conf

In the data config yaml file, set the train and val paths to the absolute path of the images folder of the test data (the model tests all images in that folder).

Label files produced by the evaluation are saved in /yolov5/runs/test/exp{x}.

Evaluation results of the model fine-tuned on VTX DATA for 30 epochs and of the model pretrained on the CrowdHuman dataset: Evaluation Results

Inference Detection Module (YOLOV5)

python detect.py --source {data_source_path} --weights {weights_path} --save-txt --save-conf

Where source can be the path to a single image or to a whole image folder.

Inference results are saved in yolov5/runs/detect/exp{x}.

ReID module

Prepare data format for ReID module

The data format for ReID module is:

|___ train
|       |___ id_1
|       |     |___ frame_xxxxxx.jpg
|       |     |___ ...
|       |
|       |___ id_2
|       |     |___ frame_xxxxxx.jpg
|       |     |___ ...       
|       |
|       |___ id_n
|             |___ frame_xxxxxx.jpg
|             |___ ...       
|___ test
|       |___ id_1
|       |     |___ frame_xxxxxx.jpg
|       |     |___ ...
|       |
|       |___ id_2
|       |     |___ frame_xxxxxx.jpg
|       |     |___ ...       
|       |
|       |___ id_n
|             |___ frame_xxxxxx.jpg
|             |___ ...
|___ gallery
|       |___ id_1
|       |     |___ frame_xxxxxx.jpg
|       |     |___ ...
|       |
|       |___ id_2
|       |     |___ frame_xxxxxx.jpg
|       |     |___ ...       
|       |
|       |___ id_n
|             |___ frame_xxxxxx.jpg
|             |___ ...  
|___ query
        |___ id_1
        |     |___ frame_xxxxxx.jpg
        |     |___ ...
        |___ id_2
        |     |___ frame_xxxxxx.jpg
        |     |___ ...       
        |___ id_n
              |___ frame_xxxxxx.jpg
              |___ ...  

For VTX Data, the train folder contains 90% of the combined train data and the test folder contains the remaining 10%.

The gallery folder contains the whole test dataset. The query folder is randomly split from the gallery, with 1 image for each id.
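Picking one random query image per id can be sketched as below. The gallery is represented as a plain dictionary from id to image names, the function name is hypothetical, and the sketch leaves the gallery itself untouched; whether the chosen image should also be removed from the gallery is not specified here.

```python
import random


def pick_queries(gallery, seed=0):
    """Pick one random image per person id from the gallery.

    gallery maps a person id to the list of its image names.
    A fixed seed keeps the split reproducible.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(images) for pid, images in gallery.items() if images}


gallery = {
    "id_1": ["frame_000001.jpg", "frame_000002.jpg"],
    "id_2": ["frame_000010.jpg"],
}
query = pick_queries(gallery)
print(query)
```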

Training ReID module

The model has 2 checkpoints to start from. We can find those here.

To use ckpt.t7 weight, in deep_sort_pytorch/deep_sort/deep/model.py, use the Author's finetuned model and set num_classes = 751

To use original_ckpt.t7 weight, in deep_sort_pytorch/deep_sort/deep/model.py, use the Original model and set num_classes = 625

To use the trained weight on VTX DATA, in deep_sort_pytorch/deep_sort/deep/model.py, use the Author's finetuned model and set num_classes = 868

Training script:

python train.py --data-dir {path/to/data/root/dir} --ckpt {path/to/pretrained/reid/checkpoint} --save-ckpt-path {path/to/save/best/checkpoint} --save-result {path/to/save/training/curve/image}

You can find more arguments in deep_sort_pytorch/deep_sort/deep/train.py

Testing ReID module

This step creates the feature matrix used for evaluating results.

In deep_sort_pytorch/deep_sort/deep, run:

python test.py --data-dir {path/to/data/root/dir} --ckpt {path/to/reid/checkpoint} --save-path {path/to/save/features/metric}

Evaluating ReID module

The functions below take a features dictionary as input. The dictionary includes the following keys:

qf: matrix of vector features for each query
ql: matrix of query labels
gf: matrix of vector features for each gallery
gl: matrix of gallery labels
query_paths: list of paths for all query images 	
gallery_paths: list of paths for all gallery images	

1. Evaluating on the whole gallery for all queries

In deep_sort_pytorch/deep_sort/deep, run:

python evaluate.py --predict-path {path/to/saved/features/metric} --p_k {k in P@k evaluation} --mAP_n {n in mAP@n evaluation}
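To make the two metrics concrete, here is a minimal sketch of P@k and AP@n for a single query, using plain lists in place of the real feature matrices. It assumes dot-product similarity and normalizes AP by the number of hits within the top n; evaluate.py may use a different similarity or normalization. mAP@n is then the mean of AP@n over all queries.

```python
def rank_gallery(query_feature, gallery_features):
    """Return gallery indices sorted by descending dot-product similarity."""
    scores = [sum(q * g for q, g in zip(query_feature, gf)) for gf in gallery_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def precision_at_k(query_label, ranked_labels, k):
    """Fraction of the top-k ranked gallery items that share the query's id."""
    return sum(1 for label in ranked_labels[:k] if label == query_label) / k


def average_precision_at_n(query_label, ranked_labels, n):
    """Mean of the precision values at each correct rank within the top n."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:n], start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy features dictionary with one query and four gallery images.
features = {
    "qf": [[1.0, 0.0]],
    "ql": [1],
    "gf": [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9]],
    "gl": [1, 2, 1, 2],
}
order = rank_gallery(features["qf"][0], features["gf"])
ranked = [features["gl"][i] for i in order]
p_at_2 = precision_at_k(features["ql"][0], ranked, k=2)
ap_at_4 = average_precision_at_n(features["ql"][0], ranked, n=4)
print(order, p_at_2, round(ap_at_4, 4))
```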

2. Evaluating each query on a gallery based on the frame id of each query

Since the ReID module in DeepSort mainly focuses on solving the ID switch problem during tracking, it is unnecessary to search the whole gallery for a query. Instead, we only need to evaluate a query within a certain frame window.

For example, a query instance that appears in frame x only needs to be evaluated on a gallery with all instances from frame x - range to frame x + range, where range is a predefined number (we set range = 100 by default).

In deep_sort_pytorch/deep_sort/deep, run:

python evaluate_frame_base.py --predict-path {path/to/saved/features/metric} --p_k {k in P@k evaluation} --mAP_n {n in mAP@n evaluation}
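The frame-window restriction can be sketched as follows. It assumes the frame number can be read from file names of the form frame_xxxxxx.jpg, as in the data trees above; the function names are hypothetical and this is not the code of evaluate_frame_base.py.

```python
import re


def frame_id(path):
    """Extract the integer frame number from a name like frame_000123.jpg."""
    match = re.search(r"frame_(\d+)", path)
    if match is None:
        raise ValueError(f"no frame id in {path!r}")
    return int(match.group(1))


def gallery_in_window(query_path, gallery_paths, frame_range=100):
    """Indices of gallery images within frame_range frames of the query."""
    center = frame_id(query_path)
    return [
        i for i, path in enumerate(gallery_paths)
        if abs(frame_id(path) - center) <= frame_range
    ]


gallery_paths = [
    "gallery/id_1/frame_000050.jpg",
    "gallery/id_1/frame_000260.jpg",
    "gallery/id_2/frame_000199.jpg",
]
kept = gallery_in_window("query/id_1/frame_000100.jpg", gallery_paths)
print(kept)
```

Only the gallery features at the returned indices would then be ranked against the query.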

3. Evaluating each query on a gallery based on the trajectory of each person id

In this algorithm, for each person id, we first find the first and the last frame in which the id appears, and define them as start_trajectory_frame and end_trajectory_frame of the id's trajectory.

We then evaluate a query on a gallery with all instances from frame start_trajectory_frame - range to frame end_trajectory_frame + range, where range is a predefined number (we set range = 100 by default).

However, this algorithm is almost the same as the 2nd algorithm (based on query frame id), but runs noticeably slower.

In deep_sort_pytorch/deep_sort/deep, run:

python evaluate_trajectory_base.py --predict-path {path/to/saved/features/metric} --p_k {k in P@k evaluation} --mAP_n {n in mAP@n evaluation}
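A minimal sketch of the trajectory-based window is given below. It assumes the first and last frame of each id are computed from the gallery labels and frame numbers, which are passed in as plain lists; the function names are hypothetical.

```python
def trajectory_bounds(labels, frames):
    """Map each person id to (start_trajectory_frame, end_trajectory_frame)."""
    bounds = {}
    for label, frame in zip(labels, frames):
        start, end = bounds.get(label, (frame, frame))
        bounds[label] = (min(start, frame), max(end, frame))
    return bounds


def gallery_in_trajectory(query_label, bounds, gallery_frames, frame_range=100):
    """Indices of gallery images inside the padded trajectory of the query id."""
    start, end = bounds[query_label]
    return [
        i for i, frame in enumerate(gallery_frames)
        if start - frame_range <= frame <= end + frame_range
    ]


gallery_labels = [1, 1, 2, 2]
gallery_frames = [10, 300, 380, 900]
bounds = trajectory_bounds(gallery_labels, gallery_frames)
selected = gallery_in_trajectory(1, bounds, gallery_frames)
print(bounds, selected)
```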

Note that since each gallery covers only a limited number of frames, the number of instances can sometimes be smaller than k and n in the P@k and mAP@n evaluation. This is rare because k and n are small (5 or 10). We can handle it simply by ignoring that gallery and its query.

There is also an optional parameter for showing the top k matched images for each query.


Additional tracking source

Tracking with video (default):

python track_video.py --source {path/to/mp4/video} 

Tracking with ensemble predicted result instead:

python track.py --frame_dir {path/to/frame/dir} --det_pred_dir {path/to/ensemble/predict/dir} --gt_path {path/to/gt/file} --output {path/to/output/dir} --save-txt 

Where det_pred_dir is in mmdetection prediction format, which is <class_name> <confidence> <left> <top> <right> <bottom> for each txt file. The gt file is in MOT format.

The output is in MOT format, which is <frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>.
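The relation between the two formats can be sketched as below: a detection line is parsed into corner coordinates, and a tracked box is written as a MOT line with left/top/width/height. The sketch assumes the standard MOTChallenge column order (bb_left before bb_top) and writes -1 for the unused x, y, z fields; the function names and the track id are illustrative.

```python
def parse_detection_line(line):
    """Parse '<class_name> <confidence> <left> <top> <right> <bottom>'."""
    parts = line.split()
    class_name = " ".join(parts[:-5])  # tolerate class names containing spaces
    confidence, left, top, right, bottom = map(float, parts[-5:])
    return class_name, confidence, left, top, right, bottom


def to_mot_line(frame, track_id, left, top, right, bottom, conf=-1):
    """Format one tracked box as a MOT line (corner box -> left/top/width/height)."""
    width, height = right - left, bottom - top
    # x, y, z are world coordinates, unused for 2D tracking, so write -1.
    return f"{frame},{track_id},{left:g},{top:g},{width:g},{height:g},{conf:g},-1,-1,-1"


det = parse_detection_line("person 0.91 100 50 180 250")
mot = to_mot_line(1, 7, *det[2:], conf=det[1])
print(mot)
```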

Tracking Evaluation

Follow the tutorial from this repo

Yolov5 + Deep Sort with PyTorch


This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of object detection architectures and models pretrained on the COCO dataset, are passed to a Deep Sort algorithm which tracks the objects. It can track any object that your Yolov5 model was trained to detect.


Before you run the tracker

  1. Clone the repository recursively:

git clone --recurse-submodules https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch.git

If you already cloned and forgot to use --recurse-submodules you can run git submodule update --init

  2. Make sure that you fulfill all the requirements: Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install, run:

pip install -r requirements.txt

Tracking sources

Tracking can be run on most video formats

python3 track.py --source ... --show-vid  # show live inference results as well
  • Video: --source file.mp4
  • Webcam: --source 0
  • RTSP stream: --source rtsp://
  • HTTP stream: --source http://wmccpinetop.axiscam.net/mjpg/video.mjpg

Select a Yolov5 family model

There is a clear trade-off between model inference speed and accuracy. To meet your inference speed/accuracy needs, you can select a Yolov5 family model for automatic download:

python3 track.py --source 0 --yolo_weights yolov5s.pt --img 640  # smallest yolov5 family model
python3 track.py --source 0 --yolo_weights yolov5x6.pt --img 1280  # largest yolov5 family model

Filter tracked classes

By default the tracker tracks all MS COCO classes.

If you only want to track persons, I recommend using these weights for increased performance:

python3 track.py --source 0 --yolo_weights yolov5/weights/crowdhuman_yolov5m.pt --classes 0  # tracks persons, only

If you want to track a subset of the MS COCO classes, add their corresponding index after the classes flag

python3 track.py --source 0 --yolo_weights yolov5s.pt --classes 16 17  # tracks cats and dogs, only

Here is a list of all the possible objects that a Yolov5 model trained on MS COCO can detect. Notice that the indexing for the classes in this repo starts at zero.

MOT compliant results

Can be saved to inference/output by

python3 track.py --source ... --save-txt


If you find this project useful in your research, please consider citing it:

    title={Real-time multi-object tracker using YOLOv5 and deep sort},
    author={Mikel Broström},
    howpublished = {\url{https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch}},

Other information

For more detailed information about the algorithms used in this project and their corresponding licenses, see their official GitHub implementations.

Draw predicted and ground-truth boxes:

File track.py:

  • Comment out line 298
  • Remove opt.mode in lines 216 and 234


