This is a replication exercise of Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
I used the work of bubbliiiing as a reference for the overall structure of the project, but the implementation differs in many ways.
Writing the test file.
- Put the VOC dataset under the `data/voc` folder.
- Put the `resnet18-5c106cde.pth` file under the `data/resnet` folder.
- Run `python train.py`.
I wrote and trained a resnet18 myself on the CIFAR10 dataset and got a classification accuracy of about 0.93. For the backbone, however, I used the official PyTorch pre-trained resnet18.
- Call optimizer.step() in every iteration, but scheduler.step() once per epoch.
- Be careful with variable names in the forward method of the model. Reusing names in place (x=y; y=x) may cause errors.
- The image loaders load images as: [N,C,H,W]
- First make the K anchors for each location (as offsets relative to a center point)
- Then generate all the center points of the anchors
- np.meshgrid(x1, x2) with the default indexing returns arrays of shape (len(x2), len(x1)): x1 varies along the horizontal axis (columns), x2 along the vertical axis (rows)
- Put the K anchors on each center point
- Need a function to select RoIs from the region proposals
- Contiguous tensors: make tensors contiguous for efficient access before calling functions like view()
- Use torch.clamp() to clip the bounding boxes to the image
- RoI pooling needs to rescale the boxes from image coordinates to feature-map coordinates
- the classifier is trained separately from the backbone and the rpn.
- add the true boxes to the rois for roi pooling, for better training of the heads
- need to make a custom data loader
- The box coordinates in the xml files are measured from the top-left corner of the image
- When making true boxes and labels, the number of boxes must be the same for all images in a batch. I made 32 true-box slots for each image; empty slots are labeled -1
- need to map the true labels and boxes to the rpn and heads.
- Regression uses smooth L1 loss, which is less sensitive to large errors than L2 loss
- Use torch.gather for selection of the reg boxes because the reg head output is of shape [N,n_sample, n_class*4].
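
The note about stepping the optimizer per iteration and the scheduler per epoch can be sketched roughly as follows. This is only a minimal illustration: the `nn.Linear` model, the random data, the `StepLR` scheduler and all hyperparameters are placeholders I picked for the example, not what `train.py` actually uses.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in model, NOT the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Assumed scheduler: halve the learning rate after every epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

lrs = []  # learning rate used during each epoch
for epoch in range(3):
    for _ in range(5):  # iterations (mini-batches) within one epoch
        x, y = torch.randn(8, 4), torch.randn(8, 2)  # dummy batch
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()  # update the weights every iteration
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()  # update the learning rate once per epoch
```

Calling `scheduler.step()` inside the inner loop instead would decay the learning rate once per batch, which is usually far faster than intended.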
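
The anchor notes (make K base anchors, generate the center points with np.meshgrid, then put the anchors on the centers) can be sketched like this. The function names `make_base_anchors` and `place_anchors`, the base size, ratios, scales and the stride of 16 are assumptions for illustration and may differ from the values in this repo.

```python
import numpy as np

def make_base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """K = len(ratios) * len(scales) anchors as (x1, y1, x2, y2) offsets around (0, 0)."""
    anchors = []
    for ratio in ratios:          # ratio is taken as height / width here
        for scale in scales:
            h = base_size * scale * np.sqrt(ratio)
            w = base_size * scale / np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors, dtype=np.float32)

def place_anchors(base_anchors, feat_h, feat_w, stride=16):
    """Put the K base anchors on every feature-map cell; returns [feat_h * feat_w * K, 4]."""
    # Center of each feature-map cell, in image pixels.
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    # Default indexing="xy": both outputs have shape (feat_h, feat_w);
    # cx changes along columns, cy changes along rows.
    cx, cy = np.meshgrid(xs, ys)
    centers = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    # Broadcast (H*W, 1, 4) + (1, K, 4) -> (H*W, K, 4).
    anchors = centers[:, None, :] + base_anchors[None, :, :]
    return anchors.reshape(-1, 4)

base = make_base_anchors()
all_anchors = place_anchors(base, feat_h=2, feat_w=3)
```

The anchors are ordered cell by cell in row-major order, so the K anchors of one location sit next to each other in the output.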
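
For the notes on torch.clamp() and on rescaling boxes for RoI pooling, a small sketch could look like the following. The helper names `clip_boxes` and `to_feature_scale` are hypothetical, and I assume boxes are stored as (x1, y1, x2, y2) in image pixels.

```python
import torch

def clip_boxes(boxes, img_h, img_w):
    """Clip boxes [..., 4] given as (x1, y1, x2, y2) to the image borders."""
    boxes = boxes.clone()  # keep the caller's tensor unchanged
    boxes[..., 0::2] = boxes[..., 0::2].clamp(min=0, max=img_w)  # x1, x2
    boxes[..., 1::2] = boxes[..., 1::2].clamp(min=0, max=img_h)  # y1, y2
    return boxes

def to_feature_scale(boxes, img_h, img_w, feat_h, feat_w):
    """Rescale boxes from image coordinates to feature-map coordinates."""
    sx, sy = feat_w / img_w, feat_h / img_h
    scale = boxes.new_tensor([sx, sy, sx, sy])
    return boxes * scale

boxes = torch.tensor([[-10.0, 20.0, 700.0, 500.0]])
clipped = clip_boxes(boxes, img_h=480, img_w=640)
on_features = to_feature_scale(clipped, 480, 640, feat_h=30, feat_w=40)
```

Here the feature map is 16 times smaller than the image in both directions, so every coordinate is divided by 16 after clipping.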
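
The fixed number of true boxes per image (32 slots, empty ones labeled -1) could be built along these lines. `pad_targets` is a hypothetical helper, and filling the empty boxes with zeros and dropping boxes beyond the 32nd are my assumptions, not necessarily what the data loader here does.

```python
import numpy as np

MAX_BOXES = 32  # fixed number of box slots per image, as in the note above

def pad_targets(boxes, labels, max_boxes=MAX_BOXES):
    """Pad (or truncate) boxes [n, 4] and labels [n] to a fixed length."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    labels = np.asarray(labels, dtype=np.int64).reshape(-1)
    n = min(len(boxes), max_boxes)  # assumption: extra boxes are dropped
    padded_boxes = np.zeros((max_boxes, 4), dtype=np.float32)  # empty slots are zeros
    padded_labels = np.full((max_boxes,), -1, dtype=np.int64)  # empty slots are -1
    padded_boxes[:n] = boxes[:n]
    padded_labels[:n] = labels[:n]
    return padded_boxes, padded_labels

padded_boxes, padded_labels = pad_targets([[10, 20, 50, 80]], [3])
```

With equal shapes for every image, the default batching of the data loader can stack the targets, and the -1 labels let later code mask out the empty slots.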
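
The torch.gather selection from the last note can be sketched as below. `select_class_boxes` is a hypothetical name, and I assume the labels are valid class indices in [0, n_class), so padded labels such as -1 would have to be masked or replaced first.

```python
import torch

def select_class_boxes(reg_out, labels):
    """Pick the 4 regression values that belong to each sample's class.

    reg_out: [N, n_sample, n_class * 4] output of the reg head
    labels:  [N, n_sample] int64 class indices in [0, n_class)
    returns: [N, n_sample, 4]
    """
    n, n_sample, _ = reg_out.shape
    per_class = reg_out.reshape(n, n_sample, -1, 4)  # [N, n_sample, n_class, 4]
    # The index needs as many dimensions as the input, with size 1 on the gathered dim.
    index = labels.reshape(n, n_sample, 1, 1).expand(n, n_sample, 1, 4)
    return per_class.gather(2, index).squeeze(2)

reg_out = torch.arange(2 * 3 * 5 * 4, dtype=torch.float32).reshape(2, 3, 20)
labels = torch.tensor([[0, 1, 4], [2, 2, 3]])
selected = select_class_boxes(reg_out, labels)
```

For sample (0, 1) with label 1, the result is the slice `reg_out[0, 1, 4:8]`, i.e. the second group of four values.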