Multitarget (multiple objects) tracker

1. An object detector can be created with the function CreateDetector using different values of detectorType:

1.1. Based on background subtraction: built-in Vibe (tracking::Motion_VIBE), SuBSENSE (tracking::Motion_SuBSENSE) and LOBSTER (tracking::Motion_LOBSTER); MOG2 (tracking::Motion_MOG2) from opencv; MOG (tracking::Motion_MOG), GMG (tracking::Motion_GMG) and CNT (tracking::Motion_CNT) from opencv_contrib. Foreground segmentation uses contours from OpenCV and returns each result as a cv::RotatedRect

1.2. Haar face detector from OpenCV (tracking::Face_HAAR)

1.3. HOG pedestrian detector from OpenCV (tracking::Pedestrian_HOG) and C4 pedestrian detector from sturkmen72 (tracking::Pedestrian_C4)

1.4. MobileNet SSD detector (tracking::SSD_MobileNet) with opencv_dnn inference and pretrained models from chuanqi305

1.5. YOLO detector (tracking::Yolo_OCV) with opencv_dnn inference and pretrained models from pjreddie

1.6. YOLO detector (tracking::Yolo_Darknet) with darknet inference from AlexeyAB and pretrained models from pjreddie

1.7. You can use a custom detector that outputs bounding or rotated rectangles.

2. Matching, or solving the assignment problem:

2.1. Hungarian algorithm (tracking::MatchHungrian) with cubic time complexity O(N^3), where N is the number of objects

2.2. Algorithm based on weighted bipartite graphs (tracking::MatchBipart) from rdmpage with time complexity O(M * N^2), where N is the number of objects and M is the number of connections between detections on the frame and tracked objects. It can be faster than the Hungarian algorithm

2.3. Distance between detections and objects: Euclidean distance in pixels between centers (tracking::DistCenters), Euclidean distance in pixels between rectangles (tracking::DistRects), Jaccard or IoU distance from 0 to 1 (tracking::DistJaccard)
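
As a rough illustration of how a cost for the assignment step could be built, the sketch below computes the Jaccard distance (1 - IoU) between axis-aligned boxes and finds the cheapest one-to-one assignment by exhaustive search. The `Box` struct and both function names are hypothetical and not part of the tracker's API. The exhaustive search is only a stand-in for tracking::MatchHungrian and is practical for a handful of objects at most.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical axis-aligned box (x, y = top-left corner), not the tracker's own type.
struct Box { double x, y, w, h; };

// Jaccard distance: 0 for identical boxes, 1 for boxes that do not overlap.
double jaccardDistance(const Box& a, const Box& b) {
    const double ix = std::max(0.0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const double iy = std::max(0.0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const double inter = ix * iy;
    const double uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0 ? 1.0 - inter / uni : 1.0;
}

// Exhaustive assignment for small, equal-sized sets: result[i] is the index of
// the detection assigned to track i. This takes O(N!) time and only illustrates
// the problem that the Hungarian algorithm solves in O(N^3).
std::vector<int> assignBruteForce(const std::vector<Box>& tracks,
                                  const std::vector<Box>& detections) {
    assert(tracks.size() == detections.size());
    std::vector<int> perm(tracks.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::vector<int> best = perm;
    double bestCost = std::numeric_limits<double>::max();
    do {
        double cost = 0.0;
        for (std::size_t i = 0; i < tracks.size(); ++i)
            cost += jaccardDistance(tracks[i], detections[perm[i]]);
        if (cost < bestCost) { bestCost = cost; best = perm; }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}
```

For two tracks and two detections listed in swapped order, `assignBruteForce` returns the permutation that pairs each track with its overlapping detection, because that pairing has the lowest summed Jaccard distance.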

3. Smoothing trajectories and predicting missed objects:

3.1. Linear Kalman filter from OpenCV (tracking::KalmanLinear)

3.2. Unscented Kalman filter from OpenCV (tracking::KalmanUnscented)

3.3. The Kalman filter can track only the center coordinates (tracking::FilterCenter) or the coordinates and size (tracking::FilterRect)

3.4. Simple Abandoned detector

3.5. Line intersection counting
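
To show how a Kalman filter (3.1 to 3.3) can both smooth a trajectory and predict a position when a detection is missed, here is a minimal sketch of a 1D constant-velocity filter. The tracker itself uses the OpenCV filters. The `Kalman1D` struct, its noise values and the diagonal process noise are assumptions made only for this illustration.

```cpp
#include <cassert>

// Hypothetical 1D constant-velocity Kalman filter; state = (position, velocity).
struct Kalman1D {
    double p = 0.0, v = 0.0;                           // state estimate
    double P[2][2] = {{1000.0, 0.0}, {0.0, 1000.0}};   // estimate covariance
    double q = 1e-3;   // process noise added to the diagonal (assumed value)
    double r = 1.0;    // measurement noise variance (assumed value)

    // Predict dt time units ahead: x = F x, P = F P F^T + Q, F = [[1, dt], [0, 1]].
    void predict(double dt) {
        assert(dt > 0.0);
        p += dt * v;
        const double p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q;
        const double p01 = P[0][1] + dt * P[1][1];
        const double p10 = P[1][0] + dt * P[1][1];
        P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] += q;
    }

    // Correct the state with a measured position z (measurement matrix H = [1 0]).
    void update(double z) {
        const double s = P[0][0] + r;                       // innovation variance
        const double k0 = P[0][0] / s, k1 = P[1][0] / s;    // Kalman gain
        const double y = z - p;                             // innovation
        p += k0 * y;
        v += k1 * y;
        // P = (I - K H) P
        const double p00 = (1.0 - k0) * P[0][0], p01 = (1.0 - k0) * P[0][1];
        const double p10 = P[1][0] - k1 * P[0][0], p11 = P[1][1] - k1 * P[0][1];
        P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] = p11;
    }
};
```

On frames with a detection, call `predict` and then `update`. On a frame where the object was missed, call only `predict` and use `p` as the estimated position.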

4. Advanced visual search for objects if they have not been detected:

4.1. No search (tracking::TrackNone)

4.2. Built-in DAT (tracking::TrackDAT) from foolwood, STAPLE (tracking::TrackSTAPLE) from xuduo35 or LDES (tracking::TrackLDES) from yfji; KCF (tracking::TrackKCF), MIL (tracking::TrackMIL), MedianFlow (tracking::TrackMedianFlow), GOTURN (tracking::TrackGOTURN), MOSSE (tracking::TrackMOSSE) or CSRT (tracking::TrackCSRT) from opencv_contrib

With this option, tracking can be much slower but more accurate.

5. Pipeline

5.1. Synchronous pipeline - SyncProcess:

get frame from capture device;
decoding;
objects detection (1);
tracking (2-4);
show result.

This pipeline works well if all algorithms are fast and together take less than the time between two frames (40 ms for a device with 25 fps). It can also be used if only 1 core is available for everything (no parallelization).
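
The frame budget check for the synchronous pipeline can be written out as a small helper. This is only a sketch: the function name and the split into three stage timings are assumptions, not part of the tracker.

```cpp
#include <cassert>

// Hypothetical helper: true if the summed per-frame stage times fit into the
// interval between two frames, e.g. 1000 / 25 = 40 ms at 25 fps.
bool fitsSyncBudget(double captureMs, double detectMs, double trackMs, double fps) {
    assert(fps > 0.0);
    const double frameIntervalMs = 1000.0 / fps;
    return captureMs + detectMs + trackMs <= frameIntervalMs;
}
```

With 5 ms for capture and decoding, 20 ms for detection and 10 ms for tracking, the total of 35 ms fits the 40 ms budget at 25 fps. If detection takes 30 ms instead, the total of 45 ms does not fit.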

5.2. Pipeline with 2 threads - AsyncProcess:

1st thread takes frame t and does capture, decoding and object detection;
2nd thread takes frame t-1 and the results from the first thread, then does tracking and result presentation (this is the main thread).

This adds a latency of 1 frame, but with two free CPU cores throughput can increase by up to 2 times.

5.3. Fully asynchronous pipeline can be used if the object detector works at a low fps and we have 2 free CPU cores. In this case we use 4 threads:

1st (main) thread is not busy and is used for the GUI and result presentation;
2nd thread does capture and decoding and puts frames into a thread-safe queue;
3rd thread is used for object detection on the newest frame from the queue;
4th thread is used for object tracking: it waits for a frame with detections from the 3rd thread and uses advanced visual search (4) on intermediate frames from the queue until it gets a frame with detections.

This pipeline can be used with a slow but accurate DNN and tracks objects in intermediate frames in real time without latency.
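
The thread-safe queue shared by the capture, detection and tracking threads could look roughly like the sketch below. `FrameQueue`, its method names and the drop-oldest policy when the queue is full are assumptions for illustration and do not describe the project's actual class.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded frame queue, safe to share between threads.
template <typename Frame>
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Capture thread: append a frame, dropping the oldest one when full.
    void push(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (frames_.size() == capacity_) frames_.pop_front();
            frames_.push_back(std::move(frame));
        }
        ready_.notify_all();
    }

    // Detection thread: wait until a frame exists and return a copy of the newest.
    Frame newest() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !frames_.empty(); });
        return frames_.back();
    }

    // Tracking thread: wait until a frame exists, then remove and return the oldest.
    Frame popOldest() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !frames_.empty(); });
        Frame frame = std::move(frames_.front());
        frames_.pop_front();
        return frame;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Frame> frames_;
};
```

In this sketch the detector always sees the most recent frame through `newest`, while the tracker consumes frames in order through `popOldest`, which mirrors the roles of the 3rd and 4th threads described above.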

Demo Videos

MobileNet SSD and tracking for low resolution and low quality videos from car DVR:

Mouse tracking:

Motion Detection and tracking:

Multiple Faces tracking:

Simple Abandoned detector:

Tested Platforms

Ubuntu Linux 18.04 with x86 processors
Ubuntu Linux 18.04 with Nvidia Jetson Nano (YOLO + darknet on GPU works!)
Windows 10 (x64 and x32 builds)

Build

Download project sources
Install CMake
Install OpenCV (https://github.com/opencv/opencv) and OpenCV contrib (https://github.com/opencv/opencv_contrib) repositories
Configure the project's CMakeLists.txt and set OpenCV_DIR.
If opencv_contrib is not installed then disable the options: USE_OCV_BGFG=OFF, USE_OCV_KCF=OFF and USE_OCV_UKF=OFF
If you want to use the native darknet YOLO detector with CUDA + cuDNN then set BUILD_YOLO_LIB=ON
To build the example with a low-fps detector (currently the native darknet YOLO detector) and a tracker that works on each frame: BUILD_ASYNC_DETECTOR=ON
To build the example with line crossing detection (car counting): BUILD_CARS_COUNTING=ON
Go to the build directory and run make

Usage:

       Usage:
         ./MultitargetTracker <path to movie file> [--example]=<number of example 0..6> [--start_frame]=<start a video from this position> [--end_frame]=<play a video to this position> [--end_delay]=<delay in milliseconds after video ending> [--out]=<name of result video file> [--show_logs]=<show logs> [--gpu]=<use OpenCL> [--async]=<async pipeline>
         ./MultitargetTracker ../data/atrium.avi -e=1 -o=../data/atrium_motion.avi
       Press:
       * 'm' key to change mode: play|pause. When the video is paused you can press any key to get the next frame.
       * Esc key to exit from the video

       Params: 
       1. Movie file, for example ../data/atrium.avi
       2. [Optional] Number of example: 0 - MouseTracking, 1 - MotionDetector, 2 - FaceDetector, 3 - PedestrianDetector, 4 - MobileNet SSD detector, 5 - YOLO OpenCV detector, 6 - Yolo Darknet detector
          -e=0 or --example=1
       3. [Optional] Frame number to start a video from this position
          -sf=0 or --start_frame=1500
       4. [Optional] Play a video to this position (if 0 then played to the end of file)
          -ef=0 or --end_frame=200
       5. [Optional] Delay in milliseconds after video ending
          -ed=0 or --end_delay=1000
       6. [Optional] Name of result video file
          -o=out.avi or --out=result.mp4
       7. [Optional] Show Trackers logs in terminal
          -sl=1 or --show_logs=0
       8. [Optional] Use built-in OpenCL
          -g=1 or --gpu=0
       9. [Optional] Use 2 threads for processing pipeline
          -a=1 or --async=0

Thirdparty libraries

OpenCV (and contrib): https://github.com/opencv/opencv and https://github.com/opencv/opencv_contrib
Vibe: https://github.com/BelBES/VIBE
SuBSENSE and LOBSTER: https://github.com/ethereon/subsense
GTL: https://github.com/rdmpage/graph-template-library
MWBM: https://github.com/rdmpage/maximum-weighted-bipartite-matching
Pedestrians detector: https://github.com/sturkmen72/C4-Real-time-pedestrian-detection
Non Maximum Suppression: https://github.com/Nuzhny007/Non-Maximum-Suppression
MobileNet SSD models: https://github.com/chuanqi305/MobileNet-SSD
YOLO models: https://pjreddie.com/darknet/yolo/
Darknet inference: https://github.com/AlexeyAB/darknet
GOTURN models: https://github.com/opencv/opencv_extra/tree/c4219d5eb3105ed8e634278fad312a1a8d2c182d/testdata/tracking
DAT tracker: https://github.com/foolwood/DAT
STAPLE tracker: https://github.com/xuduo35/STAPLE
LDES tracker: https://github.com/yfji/LDESCpp

License

GNU GPLv3: http://www.gnu.org/licenses/gpl-3.0.txt
