A question about the evaluation criterion on ILSVRC
qqqqxxyy opened this issue
Hello author, I have read your paper and find it very inspiring! Here is my question:
When I tested the code you use to generate gt_bbox:

```python
import os
import scipy.io as sio

def load_val_bbox(label_dict, all_imgs, gt_location):
    # gt_location = '/data/zhangcl/DDT-code/ImageNet_gt'
    gt_label = sio.loadmat(os.path.join(gt_location, 'cache_groundtruth.mat'))
    locs = [(x[0].split('/')[-1], x[0], x[1]) for x in all_imgs]
    locs.sort()
    final_bbox_dict = {}
    for i in range(len(locs)):
        # gt_label['rec'][:,i][0][0][0]: if multi-label, get the length, for the final eval
        final_bbox_dict[locs[i][1]] = gt_label['rec'][:, i][0][0][0][0][1][0]
    return final_bbox_dict  # fixed typo: was `final_bbox_dic`
```
I find that if there is more than one target object in an image, it chooses the first object's location as the gt_bbox, for example, images No. 2 and No. 23 in the ILSVRC validation set.
However, this seems unreasonable and will decrease the reported accuracy. So, what is your evaluation criterion on ILSVRC?
Hi, thank you for your interest in our code and paper.
Actually, load_val_bbox is only used in the training stage, i.e., PSOL_training.py, and the evaluation metric in the training stage is just a hint (not an accurate number) for the model. In the evaluation stage, we use a different strategy to get the gt_boxes from the original XML files (please see PSOL_inference.py for details).
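For reference, ILSVRC localization annotations are PASCAL-VOC-style XML, with one `<object>`/`<bndbox>` entry per object. A minimal sketch of extracting every ground-truth box from such a file (the function name and the sample XML are illustrative, not the actual PSOL_inference.py code):

```python
import xml.etree.ElementTree as ET

def parse_gt_boxes(xml_string):
    """Return all [xmin, ymin, xmax, ymax] boxes in a VOC-style annotation."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        boxes.append([int(bb.find(tag).text)
                      for tag in ('xmin', 'ymin', 'xmax', 'ymax')])
    return boxes

# Toy annotation with two objects:
sample = """
<annotation>
  <object><name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>260</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""
print(parse_gt_boxes(sample))  # [[10, 20, 110, 120], [200, 30, 260, 90]]
```

Collecting all boxes, rather than only the first, is what allows the multi-object evaluation described below.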
A basic assumption in current WSOL tasks is that there is only one object in the corresponding image. If an image contains more than one object, we calculate the IoU between the prediction and every ground-truth object and take the maximum as the final score.
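The max-over-objects scoring described above can be sketched as follows (a minimal illustration with boxes in `[xmin, ymin, xmax, ymax]` format; the helper names are hypothetical, and the exact IoU convention, e.g. whether a +1 pixel offset is used, may differ from the repository's code):

```python
def iou(box_a, box_b):
    # Intersection-over-union of two [xmin, ymin, xmax, ymax] boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def max_iou(pred_box, gt_boxes):
    # Score the prediction against the best-matching ground-truth object.
    return max(iou(pred_box, gt) for gt in gt_boxes)

pred = [0, 0, 10, 10]
gts = [[5, 5, 15, 15], [50, 50, 60, 60]]
print(max_iou(pred, gts))  # 25 / 175 ≈ 0.1429: the first object overlaps, the second does not
```

Under the usual CorLoc-style criterion, the image would then count as correctly localized when this maximum IoU exceeds 0.5.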
Regards,
Chenlin
Your answer completely solved my question! Thank you!