mlfoundations / datacomp

DataComp: In search of the next generation of multimodal datasets

Home Page: http://datacomp.ai/


Is there any evaluation randomness?

mingtan2 opened this issue · comments

Hi team,

I tried to evaluate the model commonpool_l_clip_s1b_b8k here using evaluate.py. The ImageNet acc1 I get is 0.57772, which matches the 0.578 reported here, but my average result is 0.52936, which differs from the 0.520 reported in the large/CLIP score (L/14 30%) row of Table 3 in your paper. Is this difference expected?

Thanks!
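For reference, this is roughly how I recomputed the average from the per-task results. The file name and field names below are assumptions for illustration, not the repo's confirmed output schema:

```python
import json

# Hypothetical layout: one JSON object per line, each holding a task name
# and its primary metric. File name and keys are assumed, not confirmed.
results_path = "eval_results.jsonl"

scores = []
with open(results_path) as f:
    for line in f:
        record = json.loads(line)
        # e.g. {"dataset": "ImageNet 1k", "main_metric": 0.57772}
        scores.append(record["main_metric"])

average = sum(scores) / len(scores)
print(f"Average over {len(scores)} tasks: {average:.5f}")
```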

Hi, thanks for the question! Our Flickr evaluation changed between the arXiv version of the paper and the current code. See this PR for the exact change: https://github.com/mlfoundations/datacomp/pull/12/files

We are in the process of updating the average numbers throughout the paper to be consistent with the updated evaluation code.
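For intuition: if the reported average is a uniform mean over the evaluation suite (38 tasks in the paper), a change to a single task's metric moves the average by that change divided by the number of tasks. A quick sketch of that arithmetic, with an illustrative (made-up) per-task delta:

```python
# Illustrative arithmetic only: how much a single task's metric change
# shifts a uniform average over N tasks. The 38-task count follows the
# DataComp paper; the Flickr delta below is hypothetical.
num_tasks = 38
flickr_delta = 0.35  # hypothetical change in the Flickr retrieval metric

average_shift = flickr_delta / num_tasks
print(f"Shift in the overall average: {average_shift:.5f}")  # ~0.00921
```

A shift of that order is consistent with the ~0.009 gap between 0.520 and 0.52936 described above.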