
People Tracking in Crowded Locations Using Multiple Object Tracking (MOT)

This project focuses on Multiple Object Tracking (MOT), an important topic in computer vision with applications such as autonomous driving, traffic monitoring, and the analysis of people in settings such as sports. The project applies an MOT algorithm to track people in crowded areas with the help of a detection system. The model follows the tracking-by-detection strategy (also called detection-based tracking, DBT): YOLOv4 performs detection and DeepSORT performs tracking. A Kalman filter predicts the location of each existing track in the current frame from its state in the previous frame. Evaluated on the MOT20 dataset, the YOLOv4 detector achieved a mAP of 97.7%.

Getting Started

This section describes the methods used for object detection and tracking, including the experiments run on the training data. It first introduces the dataset and how it was organized for training the detection system, and then covers the implementation, the evaluation metrics, and a discussion of how effective the methods were.

MOTChallenge, which offers carefully annotated datasets and well-defined metrics for evaluating tracking algorithms and pedestrian detectors, was chosen as the dataset for training the model. YOLOv4 and YOLOv4-tiny models with the Darknet-53 backbone were trained on the MOT20 training sequences. Particular care was taken to align the MOT20 annotation format with the data format the model expects, so that the detector could learn effectively. The labeling process highlights the key features needed for pattern analysis, which improves target prediction.

Each line of an MOT20 ground-truth file has the following format:

<frame>, <id>, <bbleft>, <bbtop>, <width>, <height>, <conf>, <x>, <y>, <z>

Field	Description
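`<frame>`	Frame number
`<id>`	Identity number (track ID)
`<bbleft>`	Bounding box left (x of the top-left corner, in pixels)
`<bbtop>`	Bounding box top (y of the top-left corner, in pixels)
`<width>`	Bounding box width
`<height>`	Bounding box height
`<conf>`	Confidence score; in ground-truth files it acts as a flag (0 = entry ignored in evaluation)
`<x>`	Object class
`<y>`	Visibility ratio of the object (0 to 1)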
`<z>`

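The YOLO format describes each object with its class, the coordinates of the box center, and the box width and height. In Darknet label files the values are separated by spaces, and the coordinates and sizes are normalized by the image width and height:

<object-class> <x_center> <y_center> <width> <height>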
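To make the format alignment concrete, the following is a minimal sketch of converting one MOT20 ground-truth line into one YOLO label line. It is an illustration under stated assumptions, not the exact script used in this project: the function name `mot_to_yolo` is hypothetical, the image size is passed in by the caller (it can be read from each sequence's `seqinfo.ini`), only active entries of class 1 (pedestrian) are kept, and every kept box is mapped to YOLO class 0.

```python
from typing import Optional

PEDESTRIAN_CLASS = 1  # MOTChallenge class ID for "pedestrian"


def mot_to_yolo(line: str, img_w: int, img_h: int,
                yolo_class: int = 0) -> Optional[str]:
    """Convert one MOT ground-truth line to a YOLO label line.

    Returns None when the entry should be skipped (assumption: skip
    ignored entries and every class other than pedestrian).
    """
    fields = line.strip().split(",")
    left, top, width, height = (float(v) for v in fields[2:6])
    active = int(float(fields[6]))  # 0 means "ignore this entry"
    cls = int(float(fields[7]))

    if active == 0 or cls != PEDESTRIAN_CLASS:
        return None

    # Clip the box to the image, since annotations can extend past the border.
    x1, y1 = max(left, 0.0), max(top, 0.0)
    x2, y2 = min(left + width, float(img_w)), min(top + height, float(img_h))
    if x2 <= x1 or y2 <= y1:
        return None

    # YOLO stores the box center and size, normalized to the range [0, 1].
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    box_w = (x2 - x1) / img_w
    box_h = (y2 - y1) / img_h
    return f"{yolo_class} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


if __name__ == "__main__":
    # Made-up example line: frame 1, id 1, box (100, 200, 50, 100), active, pedestrian.
    print(mot_to_yolo("1,1,100,200,50,100,1,1,1.0", img_w=1000, img_h=1000))
```

Because Darknet expects one label file per image, a full conversion would group the ground-truth lines by `<frame>` and write the converted lines for each frame into a text file named after that frame's image.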

Joseph Redmon (https://github.com/pjreddie) originally wrote Darknet; however, his "pjreddie" repository has been abandoned for several years, has had no substantial updates since 2016, and is not recommended for use. Instead, our implementation relies on the current version of Darknet, modified by AlexeyAB (https://github.com/AlexeyAB/darknet). The following outlines the procedure used to train both the YOLOv4 and YOLOv4-tiny models.

Prerequisites

Built With

Google Colab
Google Colab, short for Colaboratory, is a free, cloud-based service from Google for writing and executing Python code in a Jupyter Notebook-like environment.
It is a popular choice for data science, machine learning, and deep learning tasks.
Colab comes with many libraries commonly used in data science and machine learning pre-installed, such as TensorFlow, PyTorch, Keras, and OpenCV.
It also provides free access to Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), which can significantly accelerate computation, especially for machine learning tasks.
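Since training depends on the GPU, it can help to confirm that the Colab runtime actually has one attached before building or training anything. The snippet below is a small sketch of such a check; the helper name `gpu_available` is hypothetical, and it assumes an NVIDIA GPU whose driver provides the `nvidia-smi` command-line tool.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # "nvidia-smi -L" prints one line per detected GPU.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU available:", gpu_available())
```

If this reports no GPU in Colab, the runtime type is probably set to CPU; it can be changed under Runtime > Change runtime type.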

YOLOv4 Training