The detection accuracy of the R-50-FPN Faster R-CNN is lower than your report, confusing...
chenjoya opened this issue · comments
❓ Questions and Help
Hi @fmassa , thanks for your elegant implementation.
However, when I re-train R-50-FPN Faster R-CNN, the detection AP is only 32.8, whereas your report lists 36.8: https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md
2019-04-14 07:12:12,977 maskrcnn_benchmark.inference INFO: Start evaluation on coco_2017_val dataset(5000 images).
2019-04-14 07:15:06,105 maskrcnn_benchmark.inference INFO: Total run time: 0:02:53.127008 (0.06925080318450928 s / img per device, on 2 devices)
2019-04-14 07:15:06,105 maskrcnn_benchmark.inference INFO: Model inference time: 0:02:32.530358 (0.061012143325805665 s / img per device, on 2 devices)
2019-04-14 07:15:07,906 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-04-14 07:15:07,906 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-04-14 07:15:09,584 maskrcnn_benchmark.inference INFO: Evaluating predictions
2019-04-14 07:16:17,912 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.3275950734831557), ('AP50', 0.5054028517973591), ('AP75', 0.36449119818971715), ('APs', 0.1492328236066365), ('APm', 0.3439931485309256), ('APl', 0.48224050452315087)]))])
The config is unchanged, but I only have 2 V100 GPUs, so each device processes 8 images.
Other information:
OS: Ubuntu 18.04.1 LTS
GCC version: (GCC) 5.5.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB
GPU 2: Tesla P100-PCIE-16GB
GPU 3: Tesla V100-PCIE-16GB
GPU 4: Tesla V100-PCIE-16GB
Nvidia driver version: 418.43
cuDNN version: Probably one of the following:
/usr/local/cuda-9.0/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a
/usr/local/cuda-9.2/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.2/lib64/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch 1.0.1 py3.7_cuda9.0.176_cudnn7.4.2_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision 0.2.2 py_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
Pillow (5.4.1)
2019-04-13 08:24:36,398 maskrcnn_benchmark INFO: Loaded configuration file configs/e2e_faster_rcnn_R_50_FPN_1x.yaml
2019-04-13 08:24:36,398 maskrcnn_benchmark INFO:
Thanks for your attention! ^^
@chenjoya this is probably due to this part
maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py
Lines 154 to 170 in c5c4d52
In fact, the behavior is not exactly the same with a batch size of 2 per GPU as with a batch size of 8 per GPU. This is a bug in Detectron's behavior that has been kept in maskrcnn-benchmark for consistency.
In order to obtain the same (or similar) results as if you were running on 8 GPUs with a batch size of 2 on each GPU, I believe you should increase RPN.FPN_POST_NMS_TOP_N_TRAIN by a factor of 4.
What is probably happening is that the classification head, which is fed the output of your RPN, is seeing 4x fewer examples because of that.
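The arithmetic behind the factor of 4 can be sketched as follows. This is only an illustration: it assumes the training limit is 2000 (inferred from "8000" being "a factor of 4" larger) and that the limit is shared evenly by all images on one GPU, which is a simplification. The helper name `proposals_per_image` is hypothetical and not part of maskrcnn-benchmark.

```python
def proposals_per_image(post_nms_top_n_train: int, images_per_gpu: int) -> float:
    """Average number of RPN proposals kept per image when one top-N cut
    is applied over all images on a GPU (assumed per-batch behavior)."""
    return post_nms_top_n_train / images_per_gpu


# Reference setup: 8 GPUs with 2 images each, assumed limit of 2000.
reference = proposals_per_image(2000, images_per_gpu=2)
# Setup from this issue: 2 GPUs with 8 images each, same limit.
two_gpus = proposals_per_image(2000, images_per_gpu=8)
# Same 2-GPU setup with the limit raised by a factor of 4.
rescaled = proposals_per_image(8000, images_per_gpu=8)

print(reference, two_gpus, rescaled)  # 1000.0 250.0 1000.0
```

Under these assumptions, each image contributes on average 4x fewer proposals in the 2-GPU setup, and raising the limit to 8000 restores the reference average.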
Can you try changing RPN.FPN_POST_NMS_TOP_N_TRAIN to 8000 and report back? If this indeed works (which I expect will be the case), can you maybe send a PR improving the documentation for this part?
Thanks!
Thanks for your reply. Following your advice, I changed the number of proposals kept after NMS to 8k:
...
_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 8000
_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TEST = 2000
# Custom rpn head, empty to use default conv or separable conv
_C.MODEL.RPN.RPN_HEAD = "SingleConvRPNHead"
...
The training will last about 24 hours. I will reply here and report the results after training.
Thank you 👍
Hi @fmassa, you are so great!!!
After changing the number of proposals to 8k (_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 8000), the R-50-FPN Faster R-CNN model achieves 36.8 AP:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.368
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.586
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.397
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.303
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.480
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.504
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.313
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.540
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
Moreover, I also reimplemented the select_over_all_levels function so that it selects proposals per image rather than over the whole mini-batch.
The original version:
maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py
Lines 154 to 170 in c5c4d52
New version:
num_images = len(boxlists)
if self.training:
    # Apply the post-NMS top-N limit to each image separately
    for i in range(num_images):
        boxlist = boxlists[i]
        box_size = len(boxlist)
        objectness = boxlist.get_field("objectness")
        inds_mask = torch.zeros_like(objectness, dtype=torch.uint8)
        # Never request more proposals than this image has
        post_nms_top_n = min(self.fpn_post_nms_top_n, box_size)
        _, inds_sorted = torch.topk(objectness, post_nms_top_n, dim=0, sorted=True)
        # Mark the highest-scoring proposals and keep only those
        inds_mask[inds_sorted] = 1
        boxlists[i] = boxlists[i][inds_mask]
It also achieves 36.8 AP:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.368
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.586
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.396
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.398
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.307
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.482
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.506
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.321
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.634
Please help me check whether this implementation is correct and efficient. Thank you, fmassa! ^ ^
Yes, this looks right. Basically, there should not be any difference in behaviour between training and testing.
Can you send a PR improving the README in the single-GPU case?
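To illustrate the difference between the two selection strategies discussed here, the following is a minimal sketch that uses plain Python lists of objectness scores in place of BoxList objects and torch tensors. The function names `select_per_batch` and `select_per_image` and the toy scores are hypothetical; the per-batch variant only approximates the training behaviour described in this thread.

```python
from typing import List


def select_per_batch(scores: List[List[float]], top_n: int) -> List[List[float]]:
    """Keep the top_n scores over all images combined (per-batch strategy)."""
    # Flatten to (score, image_index) pairs and rank them together.
    flat = [(s, img) for img, img_scores in enumerate(scores) for s in img_scores]
    flat.sort(key=lambda pair: pair[0], reverse=True)
    kept: List[List[float]] = [[] for _ in scores]
    for s, img in flat[:top_n]:
        kept[img].append(s)
    return kept


def select_per_image(scores: List[List[float]], top_n: int) -> List[List[float]]:
    """Keep the top_n scores of each image independently (per-image strategy)."""
    return [sorted(img_scores, reverse=True)[:top_n] for img_scores in scores]


batch = [
    [0.9, 0.8, 0.7, 0.6],  # image with confident proposals
    [0.5, 0.4, 0.3, 0.2],  # image with weaker proposals
]
per_batch = select_per_batch(batch, top_n=4)
per_image = select_per_image(batch, top_n=4)

print([len(k) for k in per_batch])  # [4, 0]
print([len(k) for k in per_image])  # [4, 4]
```

In this toy batch the per-batch cut spends the whole budget on the first image, while the per-image cut gives every image the same budget regardless of how many images share a GPU.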
@chenjoya @fmassa
Hi, how do I run on 2 GPUs with a batch size of 2 on each GPU?
When I print num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1, I get 1.
my nvidia-smi information:
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K4000        Off  | 00000000:01:00.0  On |                  N/A |
| 30%   35C    P8    10W /  87W |    215MiB /  3016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 00000000:82:00.0 Off |                    0 |
| 35%   73C    P0   126W / 235W |   2619MiB / 11439MiB |     79%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40c          Off  | 00000000:83:00.0 Off |                    0 |
| 23%   33C    P8    23W / 235W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
thanks
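For context, a minimal sketch of how that expression behaves. It relies on general PyTorch behaviour rather than anything confirmed in this thread: the WORLD_SIZE environment variable is set by a distributed launcher such as torch.distributed.launch (for example with --nproc_per_node=2), so it is absent when the script is started directly with python. The helper name `get_num_gpus` is hypothetical.

```python
import os
from typing import Mapping


def get_num_gpus(environ: Mapping[str, str] = os.environ) -> int:
    """Return the number of processes reported by the launcher, or 1 if unset."""
    return int(environ["WORLD_SIZE"]) if "WORLD_SIZE" in environ else 1


# Script started directly: the variable is not set.
print(get_num_gpus({}))  # 1
# Script started by a launcher that spawned 2 processes.
print(get_num_gpus({"WORLD_SIZE": "2"}))  # 2
```

A result of 1 therefore suggests, under this assumption, that the script was not started through a distributed launcher.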
@fmassa
If I make sure that the batch size is larger when testing than when training, would it be better to use the over-batch strategy rather than the over-image strategy, in both training and testing? It is more robust to variance in the number of instances per image.
What do you think?
Hi, do I still need to consider this setting if I use plain Faster R-CNN without FPN?