
UAVDetectionTrackingBenchmark

This repository contains the code, configuration files and dataset statistics used for the paper
*Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark*, submitted to IROS 2021.

The repository is organized as follows:

  • datasets (dir): the COCO annotation files used for each dataset.
  • detection (dir): the detection configuration files, log files and scripts used to set up the detection datasets.

Prerequisites

  • PyTorch 1.7.1
  • OpenCV 4.5
  • MMCV 1.2.4
  • MMDet 2.8.0
  • MMTrack 0.5.1

Installation

Detection and tracking were carried out using the OpenMMLab frameworks for each task. This section summarises how to set up the framework for each task.

Detection

  1. Install MMDetection using the Getting Started guide.

  2. Create a directory under the configs folder (e.g., configs/uavbenchmark) and copy the config files.

  3. Create the data directory in the root folder of the project and create the dataset folders (you can also use symbolic links):

    • anti-uav
    • anti-uav/images
    • drone-vs-bird
    • drone-vs-bird/images
    • mav-vid
    • mav-vid/images
  4. Copy the annotation files for each dataset into its corresponding folder.

  5. Copy all the images for each dataset to <dataset-folder>/images (see the dataset details below).

  6. Create a checkpoints folder under the root of the project and download the weight files (see below) to this folder.

  7. Run the evaluation script: `python tools/test.py <PATH TO CONFIG FILE> <PATH TO WEIGHT FILE>`
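Steps 3–6 amount to creating a fixed directory layout, which can also be scripted. A minimal sketch using only the Python standard library (the folder names match those listed above; run it from the MMDetection project root):

```python
from pathlib import Path

# Dataset folders expected under data/ (see step 3)
DATASETS = ["anti-uav", "drone-vs-bird", "mav-vid"]

root = Path(".")
for name in DATASETS:
    # data/<dataset>/images holds the frames; data/<dataset> holds the COCO JSON files
    (root / "data" / name / "images").mkdir(parents=True, exist_ok=True)

# checkpoints/ holds the downloaded weight files (see step 6)
(root / "checkpoints").mkdir(exist_ok=True)
```

Symbolic links to existing dataset locations can be used instead of copying, as noted in step 3.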

Datasets

Three datasets were used in our benchmark. An example from each is shown next: (a) MAV-VID, (b) Drone-vs-Bird, (c) Anti-UAV Visual and (d) Anti-UAV Infrared.

Dataset examples

Multirotor Aerial Vehicle VID (MAV-VID)

This dataset consists of videos of a single UAV in different settings, captured from other drones, ground-based surveillance cameras and handheld mobile devices. It can be downloaded from its Kaggle site.

The dataset is composed of images with YOLO annotations, divided into two directories: train and val. To use it in this benchmark kit, create the COCO annotation files for each data partition using convert_mav_vid_to_coco.py, rename them to train.json and val.json, and move them to the data/mav-vid directory created in the installation steps. Then copy all images of both partitions to the data/mav-vid/images directory.
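The core of such a YOLO-to-COCO conversion is remapping normalised, centre-based boxes to absolute top-left boxes. A minimal sketch (the helper name and exact I/O are illustrative, not the actual API of convert_mav_vid_to_coco.py):

```python
def yolo_to_coco_bbox(line, img_w, img_h):
    """Convert one YOLO annotation line to a COCO [x, y, width, height] box.

    YOLO stores (class, x_center, y_center, width, height), all normalised
    to the image size; COCO stores the absolute top-left corner plus size.
    """
    cls, xc, yc, w, h = (float(v) for v in line.split())
    bw, bh = w * img_w, h * img_h          # absolute box size
    x = xc * img_w - bw / 2                # centre -> top-left corner
    y = yc * img_h - bh / 2
    return int(cls), [x, y, bw, bh]
```

For example, `yolo_to_coco_bbox("0 0.5 0.5 0.25 0.5", 1920, 1080)` yields `(0, [720.0, 270.0, 480.0, 540.0])`.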

Drone-vs-Bird

This dataset originates from the challenge of the International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques at IEEE AVSS 2020, whose main goal is to reduce the high false-positive rates from which vision-based methods usually suffer. It comprises videos of UAVs captured at long distances, often surrounded by small objects such as birds.

The videos can be downloaded upon request, and the annotations are available via the challenge's GitHub site. The annotations follow a custom format: one .txt file per video, with one line per video frame in the format <Frame number> <Number of Objects> <x> <y> <width> <height> [<x> <y> ...]. To use this dataset in this benchmark, first convert the videos to images with video_to_images.py, then create the COCO annotations with the convert_drone_vs_bird_to_coco.py script. As with MAV-VID, copy the images to the data/drone-vs-bird/images directory and the annotations to data/drone-vs-bird.
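The custom annotation format described above can be parsed in a few lines; a sketch (the helper name is illustrative, and boxes are assumed to be four integers each, per the format string):

```python
def parse_annotation_line(line):
    """Parse one Drone-vs-Bird annotation line of the form
    <frame> <num_objects> <x> <y> <width> <height> [<x> <y> <width> <height> ...]
    and return (frame_number, [[x, y, w, h], ...])."""
    tokens = [int(t) for t in line.split()]
    frame, n_objects = tokens[0], tokens[1]
    # Each object contributes four consecutive values after the first two tokens
    boxes = [tokens[2 + 4 * i : 6 + 4 * i] for i in range(n_objects)]
    return frame, boxes
```

For example, the line `"12 2 100 50 30 20 400 300 25 18"` parses to frame 12 with boxes `[100, 50, 30, 20]` and `[400, 300, 25, 18]`.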

Anti-UAV

This multi-modal dataset comprises fully annotated, unaligned RGB and infrared (IR) videos. Anti-UAV is intended to provide a real-world benchmark for evaluating object tracking algorithms in the context of UAVs. It contains recordings of six UAV models flying under different lighting and background conditions, and can be downloaded from its website.

This dataset also consists of videos with custom annotations. Once downloaded and extracted, the videos are organised in folders containing the RGB and IR versions together with their corresponding JSON annotations. To convert the dataset to images with COCO annotations, use the convert_anti_uav_to_coco.py script, then copy the generated annotations to data/anti-uav and the images to data/anti-uav/images. The images folder will contain the images for both modalities, and three annotation sets will be generated: full (both modalities), RGB and IR.
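The full annotation set simply pools both modalities. A minimal sketch of merging two COCO-style annotation dicts while keeping ids unique (illustrative only; the repository script may do this differently, and it assumes both modalities share the same category list):

```python
def merge_coco(rgb, ir):
    """Merge two COCO-style annotation dicts, offsetting the image and
    annotation ids of the second so they remain unique in the result."""
    img_off = max((im["id"] for im in rgb["images"]), default=0) + 1
    ann_off = max((a["id"] for a in rgb["annotations"]), default=0) + 1
    return {
        "images": rgb["images"]
        + [dict(im, id=im["id"] + img_off) for im in ir["images"]],
        "annotations": rgb["annotations"]
        + [dict(a, id=a["id"] + ann_off, image_id=a["image_id"] + img_off)
           for a in ir["annotations"]],
        # Assumes both modalities use the same single 'drone' category
        "categories": rgb["categories"],
    }
```

Offsetting the IR ids by one more than the largest RGB id guarantees no collisions, since every shifted id is strictly larger than any RGB id.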

Dataset Statistics

Dataset object size

| Dataset | Size | Average Object Size |
| --- | --- | --- |
| MAV-VID | Training: 53 videos (29,500 images)<br>Validation: 11 videos (10,732 images) | 215 × 128 px (3.28% of image size) |
| Drone-vs-Bird | Training: 61 videos (85,904 images)<br>Validation: 16 videos (18,856 images) | 34 × 23 px (0.10% of image size) |
| Anti-UAV | Training: 60 videos (149,478 images)<br>Validation: 40 videos (37,016 images) | RGB: 125 × 59 px (0.40% of image size)<br>IR: 52 × 29 px (0.50% of image size) |

Location, size and image composition statistics

Dataset examples

Detection Results

Four detection architectures were used for our analysis: Faster RCNN, SSD512, YOLOv3 and DETR. For the implementation details, refer to our paper. The results are as follows:

| Dataset | Model | AP | AP0.5 | AP0.75 | APS | APM | APL | AR | ARS | ARM | ARL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MAV-VID | Faster RCNN (log, weights) | 0.592 | 0.978 | 0.672 | 0.154 | 0.541 | 0.656 | 0.659 | 0.369 | 0.621 | 0.721 |
| MAV-VID | SSD512 (log, weights) | 0.535 | 0.967 | 0.536 | 0.083 | 0.499 | 0.587 | 0.612 | 0.377 | 0.578 | 0.666 |
| MAV-VID | YOLOv3 (log, weights) | 0.537 | 0.963 | 0.542 | 0.066 | 0.471 | 0.636 | 0.612 | 0.208 | 0.559 | 0.696 |
| MAV-VID | DETR (log, weights) | 0.545 | 0.971 | 0.560 | 0.044 | 0.490 | 0.612 | 0.692 | 0.346 | 0.661 | 0.742 |
| Drone-vs-Bird | Faster RCNN (log, weights) | 0.283 | 0.632 | 0.197 | 0.218 | 0.473 | 0.506 | 0.356 | 0.298 | 0.546 | 0.512 |
| Drone-vs-Bird | SSD512 (log, weights) | | 0.629 | 0.134 | 0.199 | 0.422 | 0.052 | 0.379 | 0.327 | 0.549 | 0.556 |
| Drone-vs-Bird | YOLOv3 (log, weights) | 0.210 | 0.546 | 0.105 | 0.158 | 0.395 | 0.356 | 0.302 | 0.238 | 0.512 | 0.637 |
| Drone-vs-Bird | DETR (log, weights) | 0.251 | 0.667 | 0.123 | 0.190 | 0.444 | 0.533 | 0.473 | 0.425 | 0.631 | 0.550 |
| Anti-UAV-Full | Faster RCNN (log, weights) | 0.612 | 0.974 | 0.701 | 0.517 | 0.619 | 0.737 | 0.666 | 0.601 | 0.670 | 0.778 |
| Anti-UAV-Full | SSD512 (log, weights) | 0.613 | 0.982 | 0.697 | 0.527 | 0.619 | 0.712 | 0.678 | 0.616 | 0.682 | 0.780 |
| Anti-UAV-Full | YOLOv3 (log, weights) | 0.604 | 0.977 | 0.676 | 0.529 | 0.619 | 0.708 | 0.667 | 0.618 | 0.668 | 0.760 |
| Anti-UAV-Full | DETR (log, weights) | 0.586 | 0.977 | 0.648 | 0.509 | 0.589 | 0.692 | 0.649 | 0.598 | 0.649 | 0.752 |
| Anti-UAV-RGB | Faster RCNN (log, weights) | 0.642 | 0.982 | 0.770 | 0.134 | 0.615 | 0.718 | 0.694 | 0.135 | 0.677 | 0.760 |
| Anti-UAV-RGB | SSD512 (log, weights) | 0.627 | 0.979 | 0.747 | 0.124 | 0.593 | 0.718 | 0.703 | 0.156 | 0.682 | 0.785 |
| Anti-UAV-RGB | YOLOv3 (log, weights) | 0.617 | 0.986 | 0.717 | 0.143 | 0.595 | 0.702 | 0.684 | 0.181 | 0.664 | 0.758 |
| Anti-UAV-RGB | DETR (log, weights) | 0.628 | 0.978 | 0.740 | 0.129 | 0.590 | 0.734 | 0.700 | 0.144 | 0.675 | 0.794 |
| Anti-UAV-IR | Faster RCNN (log, weights) | 0.581 | 0.977 | 0.641 | 0.523 | 0.623 | - | 0.636 | 0.602 | 0.663 | - |
| Anti-UAV-IR | SSD512 (log, weights) | 0.590 | 0.975 | 0.639 | 0.518 | 0.636 | - | 0.649 | 0.609 | 0.681 | - |
| Anti-UAV-IR | YOLOv3 (log, weights) | 0.591 | 0.976 | 0.643 | 0.533 | 0.638 | - | 0.651 | 0.620 | 0.675 | - |
| Anti-UAV-IR | DETR (log, weights) | 0.599 | 0.980 | 0.655 | 0.525 | 0.642 | - | 0.671 | 0.633 | 0.701 | - |

Citation

@article{uavbenchmark,
  title={Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark},
  author={Isaac-Medina, Brian K. S. and Poyser, Matt and Organisciak, Daniel and Willcocks, Chris G. and Breckon, Toby P. and Shum, Hubert P. H.},
  journal = {arXiv},
  year={2021}
}
