tzzcl / PSOL

Code repository of "Rethinking the Route Towards Weakly Supervised Object Localization" (CVPR 2020)

A question about the evaluation criterion on ILSVRC

qqqqxxyy opened this issue · comments

Hello author, I have read your paper and find it very inspiring! Here is my question:
When I test the code you use to generate gt_bbox:

import os
import scipy.io as sio

def load_val_bbox(label_dict, all_imgs, gt_location):
    # gt_location = '/data/zhangcl/DDT-code/ImageNet_gt'
    gt_label = sio.loadmat(os.path.join(gt_location, 'cache_groundtruth.mat'))
    locs = [(x[0].split('/')[-1], x[0], x[1]) for x in all_imgs]
    locs.sort()
    final_bbox_dict = {}
    for i in range(len(locs)):
        # gt_label['rec'][:, i][0][0][0]; if multi-label, get the length for the final eval
        final_bbox_dict[locs[i][1]] = gt_label['rec'][:, i][0][0][0][0][1][0]
    return final_bbox_dict

I find that if there is more than one target object in an image, it chooses the first object's localization as the gt_bbox, for example the No. 2 and No. 23 images in the ILSVRC validation set.
However, this seems unreasonable and will lower the reported accuracy. So, what is your evaluation criterion on ILSVRC?

Hi, thank you for your interest in our code and paper.

Actually, load_val_bbox is only used in the training stage, i.e., PSOL_training.py, and the evaluation metric reported during training is just a hint (not an accurate number) for the model. In the evaluation stage, we use a different strategy to get the gt_boxes from the original XML annotation files (please see PSOL_inference.py for details).
A basic assumption in current WSOL tasks is that there is only one object in the corresponding image. If there is more than one object in the image, we calculate the IoU of the prediction against every object and take the maximum as the final score.
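
For reference, a minimal sketch of reading ground-truth boxes from an ILSVRC validation XML annotation (the function name load_gt_boxes_from_xml is hypothetical; the actual logic is in PSOL_inference.py, and this only assumes the standard object/bndbox layout of ImageNet annotations):

import xml.etree.ElementTree as ET

def load_gt_boxes_from_xml(xml_path):
    # Parse one annotation file and keep every object's box,
    # so evaluation can later compare a prediction against all of them.
    tree = ET.parse(xml_path)
    boxes = []
    for obj in tree.findall('object'):
        bbox = obj.find('bndbox')
        boxes.append([float(bbox.find('xmin').text),
                      float(bbox.find('ymin').text),
                      float(bbox.find('xmax').text),
                      float(bbox.find('ymax').text)])
    return boxes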
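
A rough sketch of that scoring rule (not the exact code from PSOL_inference.py; iou and localization_score are illustrative names, and boxes are assumed to be [xmin, ymin, xmax, ymax]):

def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def localization_score(pred_box, gt_boxes):
    # Take the best IoU over all ground-truth objects in the image.
    return max(iou(pred_box, gt) for gt in gt_boxes)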

Regards,
Chenlin

Your answer completely solved my question! Thank you!