lxyyang / unsup-hard-negative-mining-mscoco

Repository for experiments on MSCOCO for Unsupervised Hard Example Mining from Videos for Improved Object Detection (https://arxiv.org/abs/1808.04285)

unsup-hard-negative-mining-mscoco

This is the repository for the experiments on MSCOCO classes described in Section 5 (Discussion) of the paper Unsupervised Hard Example Mining from Videos for Improved Object Detection.

We used the original version of py-faster-rcnn-ft to fine-tune a VGG16 network pretrained on ImageNet, converting it into a binary classifier for a single MSCOCO category. With this classifier as the backbone of the Faster RCNN, we labelled every frame of each video for the presence of that category. From the labelled frames, our algorithm identified the frames containing hard negatives. Finally, we fine-tuned the network again with the hard-negative frames included and evaluated it for improvements on held-out validation and test sets.

For our research, we carried out experiments on two MSCOCO categories: Dog and Train.

Steps:

1. Preparing a Faster RCNN object detector on an MSCOCO category

Follow the steps mentioned in the py-faster-rcnn-ft repository to prepare a VGG16 Faster RCNN network trained on an MSCOCO category of your choice.
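The category-specific fine-tuning starts from the subset of COCO images that contain the chosen class. A minimal sketch of that selection step, using a plain dict that mimics the COCO annotation JSON layout (the function name and data are illustrative, not the repo's actual code):

```python
# Hypothetical sketch: collect the ids of all COCO images that contain a
# given category, so the detector can be fine-tuned for that single class.
def images_with_category(coco, category_name):
    """Return sorted image ids whose annotations include the category."""
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    return sorted({a["image_id"] for a in coco["annotations"]
                   if a["category_id"] in cat_ids})

# Toy annotation dict following the COCO JSON schema (ids are made up).
coco = {
    "categories": [{"id": 18, "name": "dog"}, {"id": 7, "name": "train"}],
    "annotations": [
        {"image_id": 1, "category_id": 18},
        {"image_id": 2, "category_id": 7},
        {"image_id": 3, "category_id": 18},
    ],
}
print(images_with_category(coco, "dog"))  # → [1, 3]
```

In practice py-faster-rcnn-ft works against the real annotation files via the COCO API rather than raw dicts; the sketch only shows the shape of the selection.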

2. Label the videos with detections

Scrape the web and download videos likely to contain many instances of your chosen category. Helper code to download YouTube videos can be found here. Once the videos have been downloaded, run the detections code to label each frame of every video with bounding boxes and confidence scores for that category. See Usage.

The videos we used are listed below:

  1. Dog videos
  2. Train videos
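The per-frame labeling step can be sketched as follows. `detect_frame` stands in for the Faster RCNN forward pass, and the output line format (frame index, confidence, box corners) is an assumption about the detections txt file, not taken from the repo:

```python
# Illustrative sketch (not the repo's actual script): run the detector on
# every frame of a video and write one line per detected box.
import io

def detect_frame(frame_idx):
    # Stub detector: pretend the network fires on even-numbered frames only.
    return [(0.92, 10, 20, 110, 220)] if frame_idx % 2 == 0 else []

def label_video(num_frames, out):
    """Write 'frame_idx score x1 y1 x2 y2' for every detection."""
    for i in range(num_frames):
        for score, x1, y1, x2, y2 in detect_frame(i):
            out.write(f"{i} {score:.2f} {x1} {y1} {x2} {y2}\n")

buf = io.StringIO()
label_video(4, buf)
print(buf.getvalue())
# → 0 0.92 10 20 110 220
#   2 0.92 10 20 110 220
```

In the real pipeline the stub would be replaced by the fine-tuned network's forward pass over decoded video frames.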

3. Hard negative mining

The detections code outputs a txt file containing frame-wise labels and bounding-box information. Run the hard negative mining code on this detections txt file to output the frames containing hard negatives, along with a txt file of the bounding-box information for those frames. See Usage.
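The mining step rests on the paper's flicker heuristic: a high-confidence detection that appears in one frame but is absent in the frames immediately before and after it is unlikely to be a real object, so that frame is kept as a hard-negative candidate. A hedged sketch, where the confidence threshold and the per-frame data layout are assumptions:

```python
# Sketch of the flicker heuristic for hard negative mining: keep frames
# whose detection "flickers" (fires with no detection in the adjacent
# frames). Not the repo's actual implementation.
def flicker_frames(detections, thresh=0.8):
    """detections: dict mapping frame index -> list of confidence scores."""
    hard = []
    for t in sorted(detections):
        fired = any(s >= thresh for s in detections.get(t, []))
        prev = any(s >= thresh for s in detections.get(t - 1, []))
        nxt = any(s >= thresh for s in detections.get(t + 1, []))
        if fired and not prev and not nxt:
            hard.append(t)
    return hard

# Frame 1 fires in isolation (a flicker); frames 3-4 fire persistently,
# so they look like a real object and are not mined.
dets = {0: [], 1: [0.91], 2: [], 3: [0.85], 4: [0.88], 5: []}
print(flicker_frames(dets))  # → [1]
```

The repo's version additionally tracks the bounding boxes of the flickering detections so they can be written out as background regions.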

4. Include the video frames containing hard negatives in the COCO dataset and fine-tune

Use the COCO annotations editor located inside utils to add the frames containing hard negatives to the MSCOCO dataset. Once the frames have been included, fine-tune to get an improved network. See Usage.
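Conceptually, the annotations editor appends the hard-negative frames as new images carrying no positive annotations, so they contribute only background during fine-tuning. A minimal sketch under that assumption (field names follow the COCO JSON schema; dimensions and file names are illustrative):

```python
# Hypothetical sketch of adding hard-negative frames to a COCO-style
# annotation dict as annotation-free (pure background) images.
def add_hard_negatives(coco, frame_files, width=640, height=480):
    next_id = max((img["id"] for img in coco["images"]), default=0) + 1
    for fname in frame_files:
        coco["images"].append({
            "id": next_id,       # fresh id so existing images are untouched
            "file_name": fname,
            "width": width,
            "height": height,
        })
        next_id += 1
    return coco

coco = {"images": [{"id": 1, "file_name": "dog1.jpg",
                    "width": 640, "height": 480}],
        "annotations": []}
add_hard_negatives(coco, ["hn_000123.jpg", "hn_000456.jpg"])
print([img["id"] for img in coco["images"]])  # → [1, 2, 3]
```

The real editor works on the instances JSON on disk; the sketch only shows why no `annotations` entries are needed for background-only frames.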

Results:

A summary of the results is given below:

| Category | Model | Training Iterations | Training Hyperparams | Validation set AP | Test set AP |
| --- | --- | --- | --- | --- | --- |
| Dog | Baseline | 29000 | LR: 1e-3 for 10k, 1e-4 for 10k-20k, 1e-5 for 20k-29k | 26.9 | 25.3 |
| Dog | Flickers as HN | 22000 | LR: 1e-4 for 15k, 1e-5 for 15k-22k | 28.1 | 26.4 |
| Train | Baseline | 26000 | LR: 1e-3, stepsize: 10k, lr decay: 0.1 | 33.9 | 33.2 |
| Train | Flickers as HN | 24000 | LR: 1e-3, stepsize: 10k, lr decay: 0.1 | 35.4 | 33.7 |

A few examples of the reduction in false positives achieved for the 'Dog' category are shown below:

(Side-by-side example frames: Baseline vs. Flickers as HN)
