A question about the evaluation criterion on ILSVRC
qqqqxxyy opened this issue
Hello author, I have read your paper and find it very inspiring! Here is my question:
When I tested the code you use to generate gt_bbox:

```python
import os
import scipy.io as sio

def load_val_bbox(label_dict, all_imgs, gt_location):
    # gt_location = '/data/zhangcl/DDT-code/ImageNet_gt'
    gt_label = sio.loadmat(os.path.join(gt_location, 'cache_groundtruth.mat'))
    locs = [(x[0].split('/')[-1], x[0], x[1]) for x in all_imgs]
    locs.sort()
    final_bbox_dict = {}
    for i in range(len(locs)):
        # gt_label['rec'][:,i][0][0][0]: if multi-label, get the length, for the final eval
        final_bbox_dict[locs[i][1]] = gt_label['rec'][:, i][0][0][0][0][1][0]
    return final_bbox_dict  # fixed typo: was `final_bbox_dic`
```
I find that if there is more than one target object in an image, it chooses the first object's location as the gt_bbox, for example, images No. 2 and No. 23 in the ILSVRC validation set.
However, this seems unreasonable and will decrease the reported accuracy. So, what is your evaluation criterion on ILSVRC?
Hi, thank you for your interest in our code and paper.
Actually, load_val_bbox is only used in the training stage, i.e., PSOL_training.py, and the evaluation metric in the training stage is just a hint (not an accurate number) for the model. In the evaluation stage, we use a different strategy to get the gt_boxes from the original XML files (please see PSOL_inference.py for details).
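For reference, ILSVRC localization annotations are PASCAL-VOC-style XML, with one `<object>`/`<bndbox>` entry per object. A minimal sketch of extracting every ground-truth box from such a file (the function name and the sample XML are illustrative, not the actual PSOL_inference.py code):

```python
import xml.etree.ElementTree as ET

def parse_gt_boxes(xml_string):
    """Return all [xmin, ymin, xmax, ymax] boxes in a VOC-style annotation."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        boxes.append([int(bb.find(tag).text)
                      for tag in ('xmin', 'ymin', 'xmax', 'ymax')])
    return boxes

# Toy annotation with two objects:
sample = """
<annotation>
  <object><name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>260</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""
print(parse_gt_boxes(sample))  # [[10, 20, 110, 120], [200, 30, 260, 90]]
```

Collecting all boxes, rather than only the first, is what allows the multi-object evaluation described below.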
A basic assumption in current WSOL tasks is that there is only one object in the corresponding image. If an image contains more than one object, we calculate the IoU between the prediction and every ground-truth object and take the maximum as the final score.
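The max-over-objects scoring described above can be sketched as follows (a minimal illustration with boxes in `[xmin, ymin, xmax, ymax]` format; the helper names are hypothetical, and the exact IoU convention, e.g. whether a +1 pixel offset is used, may differ from the repository's code):

```python
def iou(box_a, box_b):
    # Intersection-over-union of two [xmin, ymin, xmax, ymax] boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def max_iou(pred_box, gt_boxes):
    # Score the prediction against the best-matching ground-truth object.
    return max(iou(pred_box, gt) for gt in gt_boxes)

pred = [0, 0, 10, 10]
gts = [[5, 5, 15, 15], [50, 50, 60, 60]]
print(max_iou(pred, gts))  # 25 / 175 ≈ 0.1429: the first object overlaps, the second does not
```

Under the usual CorLoc-style criterion, the image would then count as correctly localized when this maximum IoU exceeds 0.5.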
Regards,
Chenlin
Your answer completely solved my question! Thank you!