Replication of Faster R-CNN

This is a replication exercise of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks".

I used the work of bubbliiiing as a reference for the overall structure of the project, but the implementation differs in many ways.

Progress

Writing the test file.

Usage

Put the VOC dataset under the data/voc folder.
Put the resnet18-5c106cde.pth file under the data/resnet folder.
Run python train.py

Backbone

I wrote a ResNet-18 myself and trained it on the CIFAR-10 dataset, reaching a classification accuracy of about 0.93. For the detector, however, I used the official PyTorch pre-trained ResNet-18 as the backbone.

Reminders

Backbone

Call optimizer.step() in every iteration, but scheduler.step() only once per epoch.
Be careful with the naming of the layers and variables in the forward method of the model. In-place renaming (x=y; y=x) may cause errors.
The image loaders load images as [N, C, H, W].
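The optimizer/scheduler reminder above can be sketched as a minimal training loop. The toy nn.Linear model, the random batches, and the choice of StepLR are assumptions made only for illustration; they are not taken from this repository's train.py.

```python
import torch
from torch import nn

# Toy model and data; all sizes are arbitrary placeholders.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# StepLR is an assumption; any epoch-based scheduler follows the same pattern.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

batches = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(3)]
lrs = []  # learning rate used in each epoch

for epoch in range(2):
    for x, y in batches:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()   # once per iteration (batch)
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()       # once per epoch, after the optimizer steps
```

Here the learning rate stays fixed within an epoch and is halved between epochs.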

Anchors

First make the K anchors for a single location (as offsets relative to the center point).
Then generate the center points of the anchors for all locations.
np.meshgrid(x1, x2): x1 increases along the horizontal direction (the second array axis), x2 along the vertical direction (the first array axis).
Finally, put the K anchors on each center point.
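The anchor steps above could look roughly like the following NumPy sketch. The function names make_base_anchors and make_all_anchors are hypothetical, and the defaults (base size 16, three ratios, three scales, stride 16) are assumptions borrowed from the common Faster R-CNN setup, not necessarily the values used in this repository.

```python
import numpy as np

def make_base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return K = len(ratios) * len(scales) anchors as (x1, y1, x2, y2) offsets around (0, 0)."""
    anchors = []
    for r in ratios:
        for s in scales:
            h = base_size * s * np.sqrt(r)
            w = base_size * s / np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors, dtype=np.float32)

def make_all_anchors(base_anchors, feat_h, feat_w, stride=16):
    """Place the K base anchors on every feature-map cell center, in image coordinates."""
    shift_x = np.arange(feat_w) * stride + stride / 2
    shift_y = np.arange(feat_h) * stride + stride / 2
    # xx varies along the horizontal axis, yy along the vertical axis;
    # both have shape (feat_h, feat_w).
    xx, yy = np.meshgrid(shift_x, shift_y)
    centers = np.stack([xx.ravel(), yy.ravel(), xx.ravel(), yy.ravel()], axis=1)
    # Broadcast (H*W, 1, 4) + (1, K, 4) -> (H*W, K, 4)
    all_anchors = centers[:, None, :] + base_anchors[None, :, :]
    return all_anchors.reshape(-1, 4).astype(np.float32)

base = make_base_anchors()
anchors = make_all_anchors(base, feat_h=2, feat_w=3)  # shape (2 * 3 * 9, 4)
```

The anchors are ordered location by location, so the first K rows all share the center of the top-left feature-map cell.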

RPN

A function is needed to select RoIs from the region proposals.
Contiguous tensors: make tensors contiguous (for efficient memory access) before functions like view().
Use torch.clamp() to clip the bounding boxes to the image.
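The clipping and contiguity notes above can be illustrated with a short sketch. The helper name clip_boxes, the (x1, y1, x2, y2) box layout, and the [N, K*4, H, W] output shape of the RPN regression branch are assumptions made for this example.

```python
import torch

def clip_boxes(boxes, img_h, img_w):
    """Clip (x1, y1, x2, y2) boxes to the image boundary."""
    clipped = boxes.clone()
    clipped[:, 0::2] = torch.clamp(boxes[:, 0::2], min=0, max=img_w)  # x1, x2
    clipped[:, 1::2] = torch.clamp(boxes[:, 1::2], min=0, max=img_h)  # y1, y2
    return clipped

boxes = torch.tensor([[-5.0, 10.0, 50.0, 700.0]])
clipped = clip_boxes(boxes, img_h=600, img_w=800)

# Assumed RPN regression output: [N, K*4, H, W] -> [N, H*W*K, 4]
rpn_out = torch.randn(2, 9 * 4, 3, 5)
permuted = rpn_out.permute(0, 2, 3, 1)         # only changes strides, not memory
deltas = permuted.contiguous().view(2, -1, 4)  # copy into contiguous memory, then view
```

After permute() the tensor is no longer contiguous, which is why contiguous() is called before view().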

Head

RoI pooling needs the boxes rescaled from image coordinates to feature-map coordinates.
The classifier is trained separately from the backbone and the RPN.
Add the ground-truth boxes to the RoIs for RoI pooling, for better training of the heads.
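As a rough sketch of the two RoI reminders above, the ground-truth boxes can be concatenated to the proposals and everything rescaled to feature-map coordinates before pooling. The function name prepare_rois and its arguments are hypothetical, and the pooling step itself is left out.

```python
import torch

def prepare_rois(rois, gt_boxes, img_size, feat_size):
    """Append ground-truth boxes to the proposals and rescale to feature-map coordinates.

    rois, gt_boxes: [R, 4] and [G, 4] tensors of (x1, y1, x2, y2) in image pixels.
    img_size, feat_size: (height, width) of the image and of the feature map.
    """
    rois = torch.cat([rois, gt_boxes], dim=0)  # [R + G, 4], done during training only
    scale_y = feat_size[0] / img_size[0]
    scale_x = feat_size[1] / img_size[1]
    scale = rois.new_tensor([scale_x, scale_y, scale_x, scale_y])
    return rois * scale

rois = torch.tensor([[0.0, 0.0, 160.0, 320.0]])
gt_boxes = torch.tensor([[32.0, 64.0, 96.0, 128.0]])
feat_rois = prepare_rois(rois, gt_boxes, img_size=(640, 800), feat_size=(40, 50))
```

With a stride-16 backbone, both scale factors are 1/16, so a 160-pixel-wide box covers 10 feature-map cells.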

Data loader

A custom data loader is needed.
The box coordinates in the XML files are measured from the top-left corner.
When making the ground-truth boxes and labels, the number of boxes must be the same for all images in a batch. I made 32 box slots for each image; empty boxes are labeled -1.
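The fixed-length padding described above might be implemented along these lines. The helper name pad_targets is hypothetical, and filling the empty box slots with zeros is an assumption; only the 32 slots and the -1 label come from the notes.

```python
import torch

MAX_BOXES = 32  # fixed number of box slots per image, as described above

def pad_targets(boxes, labels, max_boxes=MAX_BOXES):
    """Pad (or truncate) boxes [G, 4] and labels [G] to a fixed length.

    Empty slots get zero boxes (an assumption) and label -1.
    """
    padded_boxes = torch.zeros((max_boxes, 4), dtype=torch.float32)
    padded_labels = torch.full((max_boxes,), -1, dtype=torch.long)
    n = min(len(boxes), max_boxes)
    padded_boxes[:n] = boxes[:n]
    padded_labels[:n] = labels[:n]
    return padded_boxes, padded_labels

# Two images with different numbers of objects can now be stacked into one batch.
b1, l1 = pad_targets(torch.rand(2, 4), torch.tensor([3, 7]))
b2, l2 = pad_targets(torch.rand(5, 4), torch.tensor([1, 1, 2, 9, 4]))
batch_boxes = torch.stack([b1, b2])    # [2, 32, 4]
batch_labels = torch.stack([l1, l2])   # [2, 32]
```

Downstream code can then ignore the padded slots with a mask such as batch_labels >= 0.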

Loss function

The ground-truth labels and boxes need to be mapped to the RPN and the heads.
Regression uses the smooth L1 loss, which is less sensitive to large errors.
Use torch.gather to select the regression boxes, because the regression head output has shape [N, n_sample, n_class*4].
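The torch.gather selection mentioned above can be sketched as follows. The function names select_class_boxes and reg_loss are hypothetical, and treating label 0 as background and negative labels as ignored is an assumption for this example.

```python
import torch
import torch.nn.functional as F

def select_class_boxes(reg_out, labels, n_class):
    """Pick, for every sampled RoI, the 4 regression values of its assigned class.

    reg_out: [N, n_sample, n_class * 4]; labels: [N, n_sample], int64 in [0, n_class).
    """
    n, n_sample, _ = reg_out.shape
    reg = reg_out.reshape(n, n_sample, n_class, 4)
    index = labels.view(n, n_sample, 1, 1).expand(n, n_sample, 1, 4)
    return torch.gather(reg, 2, index).squeeze(2)  # [N, n_sample, 4]

def reg_loss(reg_out, labels, targets, n_class):
    """Smooth L1 loss over positive RoIs only."""
    # Clamp so that ignored (-1) labels still give a valid gather index.
    selected = select_class_boxes(reg_out, labels.clamp(min=0), n_class)
    pos = labels > 0  # assumption: 0 is background, negative labels are ignored
    if pos.sum() == 0:
        return reg_out.sum() * 0  # keeps the graph connected when there are no positives
    return F.smooth_l1_loss(selected[pos], targets[pos])

reg_out = torch.arange(2 * 3 * 5 * 4, dtype=torch.float32).view(2, 3, 20)
labels = torch.tensor([[0, 1, 4], [2, 3, 0]])
selected = select_class_boxes(reg_out, labels, n_class=5)  # [2, 3, 4]
```

Only the 4 outputs belonging to each RoI's assigned class contribute to the regression loss.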

juniorliu95 / replication_faster_rcnn