Improved Localization Accuracy by LocNet for Faster R-CNN

1. Introduction

This project is a Simplified Faster R-CNN improved by LocNet (Loc-Faster-RCNN for short) implementation based on Faster R-CNN by chenyuntc. It aims in:

Improve the localization accuracy of Faster R-CNN by using LocNet in the Fast R-CNN part.
The first public implementation of the original paper. The author of the paper didn't release their version.
Match the performance reported in original paper.

And it has the following features:

It can be run as pure Python code, no more build affair. (cuda code moves to cupy, Cython acceleration are optional)

This implementation is slightlly different from the original paper:

Skip pooling is not used here. Informations from conv5_3 layer(the feature map of original Faster R-CNN) is enough for my task, so skip pooling is droped in this repo. What's more, with the advent of new methods like Feature Pyramid Networks, skip pooling seems to be obsolete :)
The RPN net is exactly same as Faster R-CNN, which means only 3X3 conv is applied, rather than 3X3 and 5X5 conv nets in the original paper.
Training strategy. The original paper train the RPN and LocNet alternately, but losses of RPN and LocNet are backproped at the same time in this repo.

prob_thre :

Hyperparameters in Loc-Faster-RCNN are mostly like Faster R-CNN except for prob_thre.
prob_thre is the threshold of probability used when predicting the bounding box, if px or py is greater than prob_thre, this row or column is considered to be part of some object.
Different detection tasks may have different appropriate prob_thre to achive best performance. If most objects in the detection task are dense blocks, a higher prob_thre may achive better performance.
You can choose your own prob_thre according to your task characteristics. Use eval_prob_thre function in train.py to find out the best prob_thre for your task. Remember to set load_path variable in the utils/config.py to your best model before calling this function.

2. Performance

2.1 Pascal VOC

Training and test set of Pascal VOC 2007 are used in this repo.

2.1.1 mAP

The best prob_thre for Pascal VOC is 0.5. When using prob_thre=0.5, the performance of Loc-Faster-RCNN is listed as follows. So with dataset like Pascal VOC, Loc-Faster-RCNN can not achieve better result than Faster R-CNN. However, when apppied to dataset with lots of small and dense objects, Loc-Faster-RCNN is likely to achieve better performance.

Implementation	mAP
Loc-Faster-RCNN	0.6527
Faster R-CNN	0.7097

2.1.2 Differences between models predictions in Pascal VOC

LocNet part improves localization accuracy of Loc-Faster-RCNN by predicting the probability rather than locations. This helps when models is used to detection small objects or objects that are not so obvious. Like shown in the first 2 rows below, Loc-Faster-RCNN detected a person(row 1) and plant(row 2) even the objects are too small and not obvious.
However, LocNet part also hinders the model from identifying small parts of objects,which are more densely connected with the background rather than the main part of that object, like tail of a cat or wings of a bird ,as shown in the 3th~5th rows bellow.
What's more, if objects are overlaping or densely connected with each other in the same image, Loc-Faster-RCNN also have difficulty in drawing accurate bounding boxes around objects, as shown in the last row bellow.

Ground Truth	Loc-Faster-RCNN	Faster R-CNN

2.2 Text detection dataset

ICDAR-2011 and ICDAR-2013 are used in training and eveluating.

TBD.

3. Install dependencies

This repo is built basically on Faster R-CNN. You can check this repo to see dependencies.

4. Train

Compared with Faster R-CNN, Loc-Faster-RCNN is a little bit harder to train. If same initinal learning rate of 1e-3 is applied, the model may not converge after several epoches because px pr py would be nan. So if you encounter the same problem when using Loc-Faster-RCNN on your own dataset, maybe a smaller learning rate of 1e-4 or 1e-5 should work.

Troubleshooting

model structure
maybe : skip pooling
Maybe : conv 3X3 and conv 5X5 in RPN
High likely : Feature Pyramid Network as backbone
High likely : RoI Align rather than RoI Pooling

Acknowledgement

This work builds on many excellent works, which include:

Faster R-CNN by chenyuntc, on which this repo is built on. The best implementation of Faster R-CNN in Pytorch I've ever seen.
LocNet by the paper author.

Licensed under MIT, see the LICENSE for more detail.

Contribution Welcome.

If you encounter any problem, feel free to open an issue.

Correct me if anything is wrong or unclear.

txytju / Faster-RCNN-LocNet