Is there any evaluation randomness?
mingtan2 opened this issue · comments
Hi team,
I tried to evaluate the model commonpool_l_clip_s1b_b8k here using evaluate.py. The ImageNet acc1 is 0.57772, which matches the 0.578 reported here, but the average result is 0.52936, which differs from the 0.520 reported in the large/CLIP score (L/14 30%) row of Table 3 in your paper. Is this difference normal?
Thanks!
Hi, thanks for the question! Our Flickr evaluation changed between the arXiv version and the current code. See this PR for the exact change: https://github.com/mlfoundations/datacomp/pull/12/files
We are in the process of updating the average numbers throughout the paper so they are consistent with the updated evaluation code.
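For intuition on how a single changed metric can move the overall number, here is a minimal sketch. It assumes the reported average is an unweighted mean over the evaluation tasks (the DataComp suite uses 38 downstream tasks); the numbers are taken from this thread and the implied per-task shift is illustrative, not an exact account of the Flickr change:

```python
# Sketch: how a change in one task's score shifts a uniform average.
# Assumption: the reported "average" is an unweighted mean over N tasks
# (N = 38 for the DataComp evaluation suite).

def average_shift(num_tasks: int, old_score: float, new_score: float) -> float:
    """Change in the overall average when a single task's score changes."""
    return (new_score - old_score) / num_tasks

# Gap between the paper's average (0.520) and the rerun here (0.52936):
gap = 0.52936 - 0.520  # ≈ 0.00936

# If one task alone explained the gap, its score would have moved by gap * N:
implied_single_task_change = gap * 38
print(f"{implied_single_task_change:.3f}")  # ≈ 0.356
```

So a difference of ~0.009 in the average is consistent with a sizable change to one retrieval metric, which is what the Flickr PR above amounts to.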