
Multi-Object Tracking Using FCOS + DeepSORT

An idiot admires complexity, a genius admires simplicity...

― Terry Davis

As the term suggests, multi-object tracking in computer vision is the task of following multiple detected objects through a sequence of frames. It therefore involves two subproblems: detection and tracking. In this project, detection is handled by FCOS pretrained on the COCO dataset, an anchor-free, proposal-free, single-stage object detection architecture. Tracking is handled by the DeepSORT algorithm.

To track each object, DeepSORT combines a Kalman filter with a re-identification model. The Kalman filter is widely used to predict the state of a system. Here, the state of each object consists of its pixel position (cx, cy), its bounding box's aspect ratio and height (w/h, h), and the velocities of cx, cy, w/h, and h. This project uses a simplified version of the DeepSORT algorithm. The re-identification model helps recognize the same object in different frames based on its appearance by generating a descriptor vector for each object in a frame. MobileNetV3-Small pretrained on the ImageNet-1K dataset serves as the backbone of the re-identification model. FAISS is used to match an object's appearance and pixel location across consecutive frames.

The re-identification model is fine-tuned on Market-1501, and the tracking is evaluated on MOT15. The train set of MOT15 is used for testing (producing the quantitative result) and the test set of MOT15 is used for inference (producing the qualitative result). The only object class tracked in this project is the person.
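To make the Kalman filter state described above more concrete, here is a minimal sketch of a constant-velocity predict and update step over the eight-dimensional state (cx, cy, w/h, h and their velocities). It is not the project's code: the function names, the noise matrices `Q` and `R`, and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np

def constant_velocity_model(dt=1.0):
    """Build (F, H) for the state [cx, cy, a, h, vcx, vcy, va, vh], a = w/h."""
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)  # each position component advances by velocity * dt
    H = np.eye(4, 8)            # a detection observes only (cx, cy, a, h)
    return F, H

def predict(x, P, F, Q):
    """Propagate the state mean and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Correct the predicted state with a measured box z = (cx, cy, a, h)."""
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

F, H = constant_velocity_model()
# Illustrative track: box centred at (100, 50), moving 2 px right and 1 px up per frame.
x = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.0])
P = np.eye(8) * 10.0
Q = np.eye(8) * 0.01   # assumed process noise
R = np.eye(4) * 1.0    # assumed measurement noise

x_pred, P_pred = predict(x, P, F, Q)
z = np.array([103.0, 49.0, 0.5, 81.0])  # hypothetical detection in the next frame
x_new, P_new = update(x_pred, P_pred, z, H, R)
```

The predicted centre moves along the velocity, and the update pulls it toward the detection in proportion to how uncertain the prediction is relative to the measurement.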

Experiment

See this notebook for the re-identification and tracking experiments.

Result

Re-identification

Quantitative Result

The loss and accuracy of the re-identification model on the test set are shown in this table:

Test Metric	Score
Loss	0.258
Accuracy	95.69%

Accuracy and Loss Curve

The loss curve on the train and validation sets of the re-identification model.

The accuracy curve on the train and validation sets of the re-identification model.

Qualitative Result

The collated images below show several sample instances alongside their look-alikes:

Sample instances from the Market-1501 dataset and their most similar instances.
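Retrieving look-alikes like these comes down to a nearest-neighbour search over the descriptor vectors produced by the re-identification model. The project uses FAISS for matching; the sketch below swaps in a brute-force NumPy cosine-similarity search as a stand-in so that it stays self-contained. The function names and the toy 3-D descriptors are hypothetical, and real descriptors would have many more dimensions.

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)

def nearest_gallery(queries, gallery):
    """For each query descriptor, return the index and cosine score of its best gallery match."""
    sims = l2_normalize(queries) @ l2_normalize(gallery).T  # (num_queries, num_gallery)
    best = sims.argmax(axis=1)
    return best, sims[np.arange(len(queries)), best]

# Toy descriptors standing in for re-identification model outputs.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
queries = np.array([[0.9, 0.1, 0.0],
                    [0.0, 0.2, 0.8]])

idx, score = nearest_gallery(queries, gallery)
```

With unit-length vectors, an inner-product search returns the same ranking as cosine similarity, which is why normalizing first is a common design choice.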

Tracking

Quantitative Result

Here is the quantitative result of the tracking algorithm on the MOT15 train set:

                IDF1   IDP   IDR  Rcll  Prcn  GT  MT  PT  ML    FP    FN IDs   FM   MOTA  MOTP IDt IDa IDm
PETS09-S2L1    56.6% 55.3% 57.8% 89.1% 85.2%  19  14   5   0   693   489  36  127  72.8% 0.238   6  29   4
KITTI-13       42.3% 36.0% 51.2% 52.8% 37.1%  42   7  25  10   681   360   8   22 -37.7% 0.259   2   7   1
KITTI-17       70.1% 67.4% 73.1% 79.6% 73.5%   9   6   3   0   196   139  10   18  49.5% 0.228   4   5   0
ADL-Rundle-8   42.8% 41.6% 44.1% 66.4% 62.7%  28  14  12   2  2681  2280  74  194  25.8% 0.269  11  61   2
ADL-Rundle-6   52.1% 56.4% 48.4% 62.2% 72.5%  24   6  16   2  1181  1891  62   94  37.4% 0.218   8  51   1
ETH-Sunnyday   68.1% 58.1% 82.3% 85.8% 60.5%  30  14  15   1  1040   263  12   46  29.2% 0.200   1  11   1
TUD-Campus     55.6% 48.2% 65.7% 70.8% 51.8%   8   3   5   0   236   105   7   14   3.1% 0.240   1   5   0
Venice-2       48.0% 44.7% 51.8% 69.6% 59.9%  26  10  16   0  3320  2172  64  140  22.2% 0.234   4  56   1
ETH-Pedcross2  53.1% 67.2% 43.9% 48.9% 74.7% 133  14  62  57  1034  3203  97  157  30.8% 0.240  22  84  13
ETH-Bahnhof    48.8% 40.8% 60.8% 71.7% 48.1% 171  74  67  30  4195  1533  56  133  -6.8% 0.236  45  42  38
TUD-Stadtmitte 75.3% 81.2% 70.2% 80.1% 92.6%  10   6   4   0    74   230  13   18  72.6% 0.224   1  12   0
OVERALL        51.5% 49.8% 53.2% 68.3% 64.0% 500 168 230 102 15331 12665 439  963  28.7% 0.237 105 363  61
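As a reading aid for the table above, MOTA combines false positives, false negatives, and identity switches into one score: MOTA = 1 - (FP + FN + IDs) / (number of ground-truth boxes). The table does not list the ground-truth box count (the GT column appears to count trajectories, not boxes), so the sketch below estimates it from recall, using Rcll = 1 - FN / boxes. The helper names are hypothetical, and because the recall in the table is rounded, the result only approximates the reported MOTA.

```python
def estimate_gt_boxes(fn, recall):
    """Estimate the ground-truth box count from misses and recall (recall = 1 - fn / boxes)."""
    return fn / (1.0 - recall)

def mota(fp, fn, id_switches, num_gt_boxes):
    """Multi-object tracking accuracy; it goes negative when errors outnumber ground-truth boxes."""
    return 1.0 - (fp + fn + id_switches) / num_gt_boxes

# Values taken from the PETS09-S2L1 row of the table.
num_gt = estimate_gt_boxes(fn=489, recall=0.891)
score = mota(fp=693, fn=489, id_switches=36, num_gt_boxes=num_gt)
print(f"estimated ground-truth boxes: {num_gt:.0f}, MOTA: {score:.3f}")
```

The same formula explains the negative scores for KITTI-13 and ETH-Bahnhof: in those sequences the sum of FP, FN, and IDs exceeds the number of ground-truth boxes.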