HKUST-Aerial-Robotics / Stereo-RCNN

Code for 'Stereo R-CNN based 3D Object Detection for Autonomous Driving' (CVPR 2019)


Doubt about the inconsistency of RPN cls score computation during inference and training

chenchr opened this issue · comments

Hi peiliang, I just read the code of the mono branch. In here, the softmax seems to be applied to a tensor of shape (b, 2, 3h, w) to get the probability; the result is then permuted to shape (b, h, w, 6), reshaped to (b, h, w, 3, 2), and finally viewed as (b, h\*w\*3, 2), with the last dimension treated as the background/foreground probability. That pairing along the last dimension is inconsistent with the axis the softmax was originally applied over.
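The mismatch can be checked numerically. Below is a minimal numpy sketch (not code from the repo) that mirrors the two reshape paths described above, assuming 3 anchor ratios: the "inconsistent" path pairs adjacent channels after the permute, so the pairs generally do not sum to 1, while splitting the channels into (2, 3) before moving the class axis last recovers the true (bg, fg) pairs.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

b, A, h, w = 1, 3, 2, 2          # batch, anchor ratios, feature map size
rng = np.random.default_rng(0)
scores = rng.standard_normal((b, 2 * A, h, w))

# Softmax the way the RPN applies it: view (b, 2A, h, w) as (b, 2, A*h, w),
# which pairs channel a (bg) with channel a + A (fg), then restore the shape.
probs = softmax(scores.reshape(b, 2, A * h, w), axis=1).reshape(b, 2 * A, h, w)

# Inconsistent path: permute to (b, h, w, 2A), then pair adjacent channels.
# The pairs become (bg_0, bg_1), (bg_2, fg_0), ... and do not sum to 1.
inconsistent = probs.transpose(0, 2, 3, 1).reshape(b, h * w * A, 2)
print(np.allclose(inconsistent.sum(-1), 1.0))   # False

# Consistent path: split channels as (2, A) first, then move the class axis
# last, so each pair is (bg_a, fg_a) and sums to 1.
consistent = (probs.reshape(b, 2, A, h, w)
                   .transpose(0, 3, 4, 2, 1)
                   .reshape(b, h * w * A, 2))
print(np.allclose(consistent.sum(-1), 1.0))     # True
```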

@chenchr
First solution:

            # keep the 2-way class axis explicit: (b, 2, A, h, w) -> (b, 2, h, w, A)
            rpn_cls_score_reshape = rpn_cls_score.view(
                                                       rpn_cls_score.size(0),
                                                       2,
                                                       rpn_cls_score.size(1) // 2,  # integer division
                                                       rpn_cls_score.size(2),
                                                       rpn_cls_score.size(3)).permute(0,1,3,4,2).contiguous()
            rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
            ##rpn_cls_prob = self.reshape(rpn_cls_prob_reshape, self.nc_score_out)

            # get rpn offsets to the anchor boxes
            rpn_bbox_pred_left_right = self.RPN_bbox_pred_left_right(rpn_conv1)

            rpn_shapes.append([rpn_cls_score.size()[2], rpn_cls_score.size()[3]])
            # use the 5-D reshaped tensors so the last dimension is a true (bg, fg) pair
            rpn_cls_scores.append(rpn_cls_score_reshape.permute(0, 2, 3, 4, 1).contiguous().view(batch_size, -1, 2))
            rpn_cls_probs.append(rpn_cls_prob_reshape.permute(0, 2, 3, 4, 1).contiguous().view(batch_size, -1, 2))

Best solution:
change

# define bg/fg classification score layer
        self.nc_score_out = 1 * len(self.anchor_ratios) * 2 # 2(bg/fg) * 3 (anchor ratios) * 1 (anchor scale)
        self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)

to

# define bg/fg classification score layer
        self.nc_score_out = 1 * len(self.anchor_ratios) # 1(sigmoid(bg/fg)) * 3 (anchor ratios) * 1 (anchor scale)
        self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)

then change the softmax to a sigmoid, change the cross-entropy loss to a binary cross-entropy loss, and adjust the related shape processing accordingly.
The shape processing becomes:

            rpn_cls_score_reshape = rpn_cls_score.view(rpn_cls_score.size(0),-1)
            rpn_cls_prob_reshape = F.sigmoid(rpn_cls_score_reshape)

            # get rpn offsets to the anchor boxes
            rpn_bbox_pred_left_right = self.RPN_bbox_pred_left_right(rpn_conv1)

            rpn_shapes.append([rpn_cls_score.size()[2], rpn_cls_score.size()[3]])
            rpn_cls_scores.append(rpn_cls_score_reshape)
            rpn_cls_probs.append(rpn_cls_prob_reshape)
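The softmax-to-sigmoid swap is safe because, for two classes, the softmax foreground probability equals sigmoid(s_fg - s_bg): a 1-channel sigmoid head is exactly as expressive as a 2-channel softmax head, it just learns the logit difference directly. A minimal numpy check (illustrative, not repo code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

s_bg, s_fg = 0.3, 1.7  # arbitrary example logits for one anchor

# 2-channel softmax head: P(fg) = e^{s_fg} / (e^{s_bg} + e^{s_fg})
p_fg_softmax = softmax(np.array([s_bg, s_fg]))[1]

# 1-channel sigmoid head on the logit difference gives the same value
p_fg_sigmoid = sigmoid(s_fg - s_bg)

print(np.isclose(p_fg_softmax, p_fg_sigmoid))  # True
```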

@xmyqsh Thanks. Do you have any experimental results comparing the different forms of the rpn_cls_score computation? It seems that the change does not affect performance either way, which is weird.