mlfoundations / datacomp

DataComp: In search of the next generation of multimodal datasets

Home Page: http://datacomp.ai/

Consistency between Table 23 and Fig 3

mingtan2 opened this issue

Hi Team,

I have a question about the consistency between the Table 23 CLIP-L14 results and the Fig 3 average CLIP-L14 results. In Table 23, the CLIP-L14 top 30% and top 40% settings both have the same maximum average of 0.520, but in Fig 3 the large-scale CLIP-L14 30% setting performs worse than the 40% setting. Please correct me if I am misreading the table or the figure. Thanks!

In Figure 3, the orange dashed line that goes from 20% to ~40% is random subsetting at the large scale; CLIP-score thresholding at the large scale is represented by the orange solid lines. The 30% setting corresponds to the x-axis, and the matching point for CLIP-L14 in the right-hand plot of Figure 3 has a y-axis value of over 50%, which is consistent with the table.

@afang-story Thanks for your reply.

I am not sure we are talking about the same numbers. To be clearer, the points I raised above are:

  1. In Table 23, "CLIP L14 score top 30%" and "CLIP L14 score top 40%" have the same average performance of 0.520.
  2. In the right panel of Fig 3, the line marked with the red solid box shows that 30% is worse than 40%. See the figure below.

[Screenshot: Figure 3, right panel, with the relevant line highlighted]

Thanks for catching this - the numbers in the table were correct. We have updated versions of both the table and the figure that will appear in the next revision, which also reflect the changes mentioned in #18 (comment).

@afang-story One more question: is it allowed to use non-face-blurred images by setting skip_bbox_blurring=True in download_upstream.py? Thanks.

You should do face blurring at some point in the pipeline.
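
If blurring is skipped at download time, it still has to be applied before the images are used. Below is a minimal sketch of one way to blur face regions given per-image bounding boxes (e.g. from dataset metadata), using OpenCV; the function name `blur_faces` and the assumption of pixel-coordinate boxes are illustrative and not part of the DataComp codebase.

```python
import cv2
import numpy as np


def blur_faces(image: np.ndarray, face_bboxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Gaussian-blur each face region given as (x_min, y_min, x_max, y_max) pixel boxes.

    Illustrative sketch only, not the DataComp implementation.
    """
    out = image.copy()
    h, w = out.shape[:2]
    for x0, y0, x1, y1 in face_bboxes:
        # Clip the box to the image bounds.
        x0, y0 = max(0, int(x0)), max(0, int(y0))
        x1, y1 = min(w, int(x1)), min(h, int(y1))
        if x1 <= x0 or y1 <= y0:
            continue
        region = out[y0:y1, x0:x1]
        # Kernel size must be odd; scale it with the box so larger faces are fully obscured.
        k = max(3, (min(x1 - x0, y1 - y0) // 2) * 2 + 1)
        out[y0:y1, x0:x1] = cv2.GaussianBlur(region, (k, k), 0)
    return out
```

The blurred output can then replace the original image before it enters the training pipeline.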