mlfoundations / datacomp

DataComp: In search of the next generation of multimodal datasets

Home Page: http://datacomp.ai/

Consistency between Table 23 and Fig 3

mingtan2 opened this issue

Hi Team,

I have a question about the consistency between the Table 23 CLIP-L14 results and the Fig 3 average CLIP-L14 results. In Table 23, the CLIP-L14 top 30% and top 40% settings both have the same maximum average of 0.520, but in Fig 3 the large-scale CLIP-L14 30% setting performs worse than the 40% setting. Please correct me if I am misreading the table or the figure. Thanks!

In Figure 3, the orange dashed line that goes from 20% to ~40% is random subsetting at the large scale; CLIP-score thresholding at the large scale is represented by the orange solid lines. The 30% setting corresponds to the x-axis, and the matching point for CLIP-L14 in the right-hand plot of Figure 3 has a y-axis value of over 50%, which is consistent with the table.

@afang-story Thanks for your reply.

I am not sure we are talking about the same numbers. To be clearer, the points I raised above are:

  1. In Table 23, "CLIP L14 score top 30%" and "CLIP L14 score top 40%" have the same average performance of 0.520.
  2. In the right panel of Fig 3, the line marked with the red solid box shows that 30% is worse than 40%. See the figure below.

[Screenshot: Figure 3, right panel, with the relevant line highlighted]

Thanks for catching this - the numbers in the table were correct. We have updated versions of both the table and the figure that will appear in the next revision, which also reflect the changes mentioned in #18 (comment).

@afang-story One more question: is it allowed to use non-face-blurred images by setting skip_bbox_blurring=True in download_upstream.py? Thanks.

You should do face blurring at some point in the pipeline.
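
If blurring is skipped at download time, it still has to be applied before the images are used. Below is a minimal sketch of one way to blur face regions given per-image bounding boxes (e.g. from dataset metadata), using OpenCV; the function name `blur_faces` and the assumption of pixel-coordinate boxes are illustrative and not part of the DataComp codebase.

```python
import cv2
import numpy as np


def blur_faces(image: np.ndarray, face_bboxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Gaussian-blur each face region given as (x_min, y_min, x_max, y_max) pixel boxes.

    Illustrative sketch only, not the DataComp implementation.
    """
    out = image.copy()
    h, w = out.shape[:2]
    for x0, y0, x1, y1 in face_bboxes:
        # Clip the box to the image bounds.
        x0, y0 = max(0, int(x0)), max(0, int(y0))
        x1, y1 = min(w, int(x1)), min(h, int(y1))
        if x1 <= x0 or y1 <= y0:
            continue
        region = out[y0:y1, x0:x1]
        # Kernel size must be odd; scale it with the box so larger faces are fully obscured.
        k = max(3, (min(x1 - x0, y1 - y0) // 2) * 2 + 1)
        out[y0:y1, x0:x1] = cv2.GaussianBlur(region, (k, k), 0)
    return out
```

The blurred output can then replace the original image before it enters the training pipeline.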