Is there any evaluation randomness?
mingtan2 opened this issue · comments
Hi team,
I tried to evaluate the model commonpool_l_clip_s1b_b8k here using evaluate.py. The ImageNet acc1 is 0.57772, which matches the 0.578 reported here, but the average result is 0.52936, which differs from the 0.520 reported in the large/CLIP score (L/14 30%) row of Table 3 in your paper. Is this difference normal?
Thanks!
Hi, thanks for the question! Our Flickr evaluation changed between the arXiv version and the current code. See this PR for the exact change: https://github.com/mlfoundations/datacomp/pull/12/files
We are in the process of updating the average numbers throughout the paper so they are consistent with the updated evaluation code.
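For intuition on how a single changed metric can move the overall number, here is a minimal sketch. It assumes the reported average is an unweighted mean over the evaluation tasks (the DataComp suite uses 38 downstream tasks); the numbers are taken from this thread and the implied per-task shift is illustrative, not an exact account of the Flickr change:

```python
# Sketch: how a change in one task's score shifts a uniform average.
# Assumption: the reported "average" is an unweighted mean over N tasks
# (N = 38 for the DataComp evaluation suite).

def average_shift(num_tasks: int, old_score: float, new_score: float) -> float:
    """Change in the overall average when a single task's score changes."""
    return (new_score - old_score) / num_tasks

# Gap between the paper's average (0.520) and the rerun here (0.52936):
gap = 0.52936 - 0.520  # ≈ 0.00936

# If one task alone explained the gap, its score would have moved by gap * N:
implied_single_task_change = gap * 38
print(f"{implied_single_task_change:.3f}")  # ≈ 0.356
```

So a difference of ~0.009 in the average is consistent with a sizable change to one retrieval metric, which is what the Flickr PR above amounts to.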