facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.


The detection accuracy of the R-50-FPN Faster R-CNN is lower than your report, confusing...

chenjoya opened this issue · comments

❓ Questions and Help

Hi @fmassa, thanks for your elegant implementation.
However, it is confusing that the detection AP is only 32.8 when I re-train the R-50-FPN Faster R-CNN, whereas it should be 36.8 according to your report: https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md

2019-04-14 07:12:12,977 maskrcnn_benchmark.inference INFO: Start evaluation on coco_2017_val dataset(5000 images).
2019-04-14 07:15:06,105 maskrcnn_benchmark.inference INFO: Total run time: 0:02:53.127008 (0.06925080318450928 s / img per device, on 2 devices)
2019-04-14 07:15:06,105 maskrcnn_benchmark.inference INFO: Model inference time: 0:02:32.530358 (0.061012143325805665 s / img per device, on 2 devices)
2019-04-14 07:15:07,906 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-04-14 07:15:07,906 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-04-14 07:15:09,584 maskrcnn_benchmark.inference INFO: Evaluating predictions
2019-04-14 07:16:17,912 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.3275950734831557), ('AP50', 0.5054028517973591), ('AP75', 0.36449119818971715), ('APs', 0.1492328236066365), ('APm', 0.3439931485309256), ('APl', 0.48224050452315087)]))])

The config is unchanged, but I only have 2 V100 GPUs, so 8 images are placed on each device.
Other information:

OS: Ubuntu 18.04.1 LTS
GCC version: (GCC) 5.5.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB
GPU 2: Tesla P100-PCIE-16GB
GPU 3: Tesla V100-PCIE-16GB
GPU 4: Tesla V100-PCIE-16GB

Nvidia driver version: 418.43
cuDNN version: Probably one of the following:
/usr/local/cuda-9.0/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a
/usr/local/cuda-9.2/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.2/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch                   1.0.1           py3.7_cuda9.0.176_cudnn7.4.2_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision               0.2.2                      py_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
        Pillow (5.4.1)
2019-04-13 08:24:36,398 maskrcnn_benchmark INFO: Loaded configuration file configs/e2e_faster_rcnn_R_50_FPN_1x.yaml
2019-04-13 08:24:36,398 maskrcnn_benchmark INFO:

Thanks for your attention! ^^

@chenjoya this is probably due to this part

# different behavior during training and during testing:
# during training, post_nms_top_n is over *all* the proposals combined, while
# during testing, it is over the proposals for each image
# TODO resolve this difference and make it consistent. It should be per image,
# and not per batch
if self.training:
    objectness = torch.cat(
        [boxlist.get_field("objectness") for boxlist in boxlists], dim=0
    )
    box_sizes = [len(boxlist) for boxlist in boxlists]
    post_nms_top_n = min(self.fpn_post_nms_top_n, len(objectness))
    _, inds_sorted = torch.topk(objectness, post_nms_top_n, dim=0, sorted=True)
    inds_mask = torch.zeros_like(objectness, dtype=torch.uint8)
    inds_mask[inds_sorted] = 1
    inds_mask = inds_mask.split(box_sizes)
    for i in range(num_images):
        boxlists[i] = boxlists[i][inds_mask[i]]

In fact, the behavior is not exactly the same if you have a batch size of 2 per GPU versus a batch size of 8 per GPU. This is a behavioral quirk inherited from Detectron, which has been kept in maskrcnn-benchmark for consistency.

In order to obtain the same (or similar) results as if you were running on 8 GPUs with a batch size of 2 on each GPU, I believe you should increase RPN.FPN_POST_NMS_TOP_N_TRAIN by a factor of 4, since that value sets the budget used in

post_nms_top_n = min(self.fpn_post_nms_top_n, len(objectness))

What is probably happening is that the output of your RPN, which is fed to the classification head afterwards, sees 4x fewer proposals per image because of that.
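
As a rough illustration (a toy sketch with assumed numbers, not repo code, apart from the 2000 default budget), here is how the per-image share of the budget shrinks when the single top-k is taken over the whole batch:

import torch

# Toy numbers (assumptions): only the 2000 budget matches the repo default.
fpn_post_nms_top_n = 2000       # per-*batch* budget used during training
proposals_per_image = 4000      # assumed proposals entering this step per image

for images_per_gpu in (2, 8):
    # objectness for the whole batch, concatenated as in the snippet above
    objectness = torch.rand(images_per_gpu * proposals_per_image)
    post_nms_top_n = min(fpn_post_nms_top_n, len(objectness))
    _, inds_sorted = torch.topk(objectness, post_nms_top_n, dim=0, sorted=True)
    print(images_per_gpu, "imgs/GPU ->", post_nms_top_n // images_per_gpu,
          "proposals kept per image on average")
# 2 imgs/GPU keeps ~1000 proposals per image, 8 imgs/GPU only ~250,
# i.e. the box head sees roughly 4x fewer proposals per image.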

Can you try changing

_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 2000
to 8000 and report back?
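
For reference, rather than editing defaults.py in place, the same override can be applied through the yacs config; a minimal sketch, assuming the repo's standard cfg object and the config file shown in the log above:

# Minimal sketch, assuming the standard yacs-based config shipped with the repo.
from maskrcnn_benchmark.config import cfg

cfg.merge_from_file("configs/e2e_faster_rcnn_R_50_FPN_1x.yaml")
# Scale the per-batch budget with images per GPU (roughly 1000 per image):
# 2 imgs/GPU -> 2000 (default), 8 imgs/GPU -> 8000.
cfg.merge_from_list(["MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN", 8000])
cfg.freeze()

I believe the same KEY VALUE pair can also be appended to the tools/train_net.py command line, which merges such overrides into the config.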

If this indeed works (which I expect will be the case), could you maybe send a PR improving the documentation for this part a bit?

Thanks!

Thanks for your reply. Following your advice, I changed the number of proposals kept after NMS to 8k:

...
_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 8000
_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TEST = 2000
# Custom rpn head, empty to use default conv or separable conv
_C.MODEL.RPN.RPN_HEAD = "SingleConvRPNHead"
...

The training will last about 24 hours. I will reply here and report the results after training.
Thank you 👍

Hi @fmassa, you are so great!!!
After changing the proposals to 8k (_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 8000), the R-50-FPN Faster R-CNN model achieves 36.8 AP:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.586
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.209
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.303
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.480
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.504
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.313
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.540
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635

Moreover, I also re-implemented the select_over_all_levels function to select proposals per image rather than over the whole mini-batch.
The original version:

# different behavior during training and during testing:
# during training, post_nms_top_n is over *all* the proposals combined, while
# during testing, it is over the proposals for each image
# TODO resolve this difference and make it consistent. It should be per image,
# and not per batch
if self.training:
    objectness = torch.cat(
        [boxlist.get_field("objectness") for boxlist in boxlists], dim=0
    )
    box_sizes = [len(boxlist) for boxlist in boxlists]
    post_nms_top_n = min(self.fpn_post_nms_top_n, len(objectness))
    _, inds_sorted = torch.topk(objectness, post_nms_top_n, dim=0, sorted=True)
    inds_mask = torch.zeros_like(objectness, dtype=torch.uint8)
    inds_mask[inds_sorted] = 1
    inds_mask = inds_mask.split(box_sizes)
    for i in range(num_images):
        boxlists[i] = boxlists[i][inds_mask[i]]

New version:

        num_images = len(boxlists)
        if self.training:
            for i in range(num_images):
                boxlist = boxlists[i]
                box_size = len(boxlist)
                objectness = boxlist.get_field("objectness")
                inds_mask = torch.zeros_like(objectness, dtype=torch.uint8)
                post_nms_top_n = min(self.fpn_post_nms_top_n, box_size)
                _, inds_sorted = torch.topk(objectness, post_nms_top_n, dim=0, sorted=True)
                inds_mask[inds_sorted] = 1
                boxlists[i] = boxlists[i][inds_mask]

It also achieves 36.8 AP:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.586
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.396
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.398
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.307
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.482
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.321
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.634

Could you please check whether this implementation is correct and efficient? Thank you @fmassa! ^ ^

Yes, this looks right. Basically, there should not be any difference in behaviour between training and testing.

Can you send a PR improving the README for the single-GPU case?

@chenjoya @fmassa
Hi, how do I run on 2 GPUs with a batch size of 2 on each GPU?
Printing num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1 gives 1.
my nvidia-smi information:
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K4000 Off | 00000000:01:00.0 On | N/A |
| 30% 35C P8 10W / 87W | 215MiB / 3016MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c Off | 00000000:82:00.0 Off | 0 |
| 35% 73C P0 126W / 235W | 2619MiB / 11439MiB | 79% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40c Off | 00000000:83:00.0 Off | 0 |
| 23% 33C P8 23W / 235W | 11MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

thanks
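
For what it is worth, WORLD_SIZE is only set by the PyTorch distributed launcher, so checking it from a script that was started directly will always report 1. A minimal sketch of that assumption (the launch command in the comment follows the README-style invocation and is only illustrative):

import os

# The launcher sets WORLD_SIZE, one process per GPU; launched roughly as in the README:
#   python -m torch.distributed.launch --nproc_per_node=2 tools/train_net.py \
#       --config-file configs/e2e_faster_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 4
num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
print(num_gpus)  # 2 under the launcher above; 1 when the script is run directly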


Hi @fmassa, @chenjoya, so this behavior is only related to FPN, right?
And we can solve the problem by setting these configs to either
FPN_POST_NMS_TOP_N_TRAIN: NumImgsPerGPU*1000 with FPN_POST_NMS_PER_BATCH: True
or
FPN_POST_NMS_TOP_N_TRAIN: 1000 with FPN_POST_NMS_PER_BATCH: False
Right?

@rxqy exactly.
Given that this issue has already been addressed in #695, I'm closing this

@fmassa
If I make sure that the batch size at test time is larger than at training time, would it be better to use the per-batch strategy rather than the per-image strategy in both training and testing? It seems more robust to the variance in the number of instances per image.
What do you think?

Hi, do I still need to consider these settings if I use a plain Faster R-CNN without FPN?