Is it right to evaluate performance on PASCAL VOC with the COCO metric?

machengcheng2016 opened this issue 4 years ago

The title states my concern.
Judging from dataset.py, TIDE seems to use the COCO metric to compute mAP on the PASCAL VOC dataset.
However, I've compared the official VOC evaluation code with TIDE (which is exactly the COCO evaluation code), and the protocols for assigning TP / FP labels to predicted boxes differ. Given the same scores and bboxes, VOC and COCO output different mAPs.
I think that will be a problem. What do you think? @dbolya

Daniel Bolya · Answer 1 · Wed Feb 10 2021 23:28:44 GMT+0800 (China Standard Time)

Yeah, I agree that it would be best if we were able to use the VOC version of mAP for PASCAL VOC, but I'm not well versed in the differences so I wouldn't be able to implement that.

Out of curiosity, what's the difference in mAP that you observe? If it's not a huge difference, then I think it's fine. And TIDE is meant as a way to find places you can improve your model, not necessarily to replace the official evaluation numbers. So as long as the numbers correlate, there shouldn't be any issue.

Bruno Ma · Answer 2 · Mon Feb 15 2021 01:07:07 GMT+0800 (China Standard Time)

Well, I got a ~10% higher mAP with the PASCAL VOC metric than with the COCO metric on VOC2007 test. Some people say that PASCAL VOC greedily finds the best match (by IoU) for the current predicted box, and if that ground-truth box is already matched, it marks the current predicted box as a false positive. In COCO, by contrast, the search continues among the remaining ground-truth boxes if the current best match is already taken.
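
The two assignment rules described above can be sketched as follows. This is a minimal illustration in plain Python, not the official VOC devkit or pycocotools code: the function names `match_voc_style` and `match_coco_style` are made up for this example, and it ignores "difficult" and crowd annotations, per-class handling, and the different AP interpolation schemes. It only shows the assignment rule and is not meant to account for the size of the gap reported above.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_voc_style(preds, gts, thresh=0.5):
    """VOC-like rule: look only at the single best-overlapping GT box.

    `preds` must already be sorted by descending score.
    Returns one bool per prediction (True = true positive).
    """
    taken = [False] * len(gts)
    labels = []
    for p in preds:
        best = max(range(len(gts)), key=lambda j: iou(p, gts[j]), default=None)
        if best is not None and iou(p, gts[best]) > thresh and not taken[best]:
            taken[best] = True
            labels.append(True)
        else:
            # Best match is below threshold or already claimed: false positive.
            labels.append(False)
    return labels


def match_coco_style(preds, gts, thresh=0.5):
    """COCO-like rule: pick the best GT box among those not yet matched."""
    taken = [False] * len(gts)
    labels = []
    for p in preds:
        best, best_iou = None, thresh
        for j, g in enumerate(gts):
            if taken[j]:
                continue  # skip claimed boxes and keep searching
            overlap = iou(p, g)
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            taken[best] = True
        labels.append(best is not None)
    return labels


# Toy case: two overlapping GT boxes, two predictions sorted by score.
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
preds = [(0, 0, 10, 10), (2, 0, 12, 10)]
print(match_voc_style(preds, gts))   # [True, False]
print(match_coco_style(preds, gts))  # [True, True]
```

In this toy case the second prediction overlaps the already-matched box most, so the VOC-style rule marks it as a false positive, while the COCO-style rule assigns it to the other ground-truth box.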

Anyways, I agree with you that the role of TIDE is to find how one can improve the OD model, regardless of how mAP is evaluated. After all, the COCO metric is a reasonable metric.

Liu Feng · Answer 3 · Mon May 17 2021 18:32:24 GMT+0800 (China Standard Time)

Hi, @machengcheng2016, could you share the code to convert PASCAL VOC detection results to COCO JSON style? My conversion gets a much lower mAP in TIDE than in mmdetection.
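
For reference, a conversion of this kind might look roughly like the sketch below. It assumes the detections come as VOC-style text lines of the form `image_name score xmin ymin xmax ymax` (one file per class) and writes the COCO results format, where each entry has `image_id`, `category_id`, `bbox` as `[x, y, width, height]`, and `score`. The function name `voc_dets_to_coco` and the `image_ids` lookup are hypothetical. Whether a 1-pixel offset should be applied to the coordinates depends on how the ground-truth annotations were converted, so this sketch applies none.

```python
import json


def voc_dets_to_coco(lines, category_id, image_ids):
    """Convert VOC-style detection lines for one class to COCO result dicts.

    lines:       iterable of "image_name score xmin ymin xmax ymax" strings
    category_id: the id this class has in the ground-truth COCO JSON
    image_ids:   dict mapping image_name to the image id used in that JSON
                 (hypothetical helper mapping, built by the caller)
    """
    results = []
    for line in lines:
        name, score, xmin, ymin, xmax, ymax = line.split()
        xmin, ymin, xmax, ymax = map(float, (xmin, ymin, xmax, ymax))
        results.append({
            "image_id": image_ids[name],
            "category_id": category_id,
            # COCO boxes are [x, y, width, height], not corner coordinates.
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "score": float(score),
        })
    return results


# Example with a single made-up detection line.
example = voc_dets_to_coco(["000001 0.9 10 20 110 220"], 1, {"000001": 1})
print(json.dumps(example))
```

The image ids and category ids have to match the ones in the ground-truth JSON, and the box convention has to match the one used when converting the annotations.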

Liu Feng · Answer 4 · Mon May 17 2021 18:37:42 GMT+0800 (China Standard Time)

Details of my conversion are in this issue

Liu Feng · Answer 5 · Thu May 27 2021 16:55:34 GMT+0800 (China Standard Time)

I've solved it.