Consistency between Table 23 and Fig 3
mingtan2 opened this issue · comments
Hi Team,
I have a question about the consistency between the Table 23 CLIP-L14 results and the Fig 3 avg CLIP-L14 results. In Table 23, both the CLIP-L14 30% and 40% settings share the same maximum average of 0.520, but in Fig 3 the large-CLIP-L14 30% setting performs worse than the 40% setting. Please correct me if I am misinterpreting the table or figure. Thanks!
In Figure 3, the orange dashed line running from 20% to ~40% is random subsetting at the large scale; CLIP thresholding at the large scale is shown by the orange solid lines. The 30% setting is read off the x-axis, and at the corresponding point for CLIP-L14, the right-side plot in Figure 3 has a y-axis value of over 50%, which is consistent with the table.
@afang-story Thanks for your reply.
I am not sure we are talking about the same numbers. To be clearer, the numbers I mentioned above are:
- In Table 23, "CLIP L14 score top 30%" and "CLIP L14 score top 40%" have the same avg performance of 0.520.
- In Fig 3 (right), the line marked with the red solid box shows 30% performing worse than 40%. See the figure below.
Thanks for catching this - the numbers in the table are correct. We have updated versions of both that will appear in the next revision, which will also reflect the changes mentioned here: #18 (comment).
@afang-story One more question: is it allowed to use non-face-blurred images by setting `skip_bbox_blurring=True` in `download_upstream.py`? Thanks.
You should do face-blurring at some point in the pipeline.
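For anyone who skips blurring at download time and needs to apply it later: below is a minimal, numpy-only sketch of blurring face bounding boxes in an image. This is *not* the project's actual blurring code; the function name, box format `(x0, y0, x1, y1)`, and the simple box-blur kernel are all illustrative assumptions, and a real pipeline would likely use a stronger blur (e.g. a Gaussian) on the stored bbox metadata.

```python
import numpy as np

def blur_bboxes(img: np.ndarray, bboxes, ksize: int = 7) -> np.ndarray:
    """Apply a simple box blur to each (x0, y0, x1, y1) region of a
    single-channel image. Hypothetical helper, not the repo's API."""
    out = img.astype(np.float64)
    pad = ksize // 2
    for x0, y0, x1, y1 in bboxes:
        region = out[y0:y1, x0:x1]
        # Pad with edge values so the blur is defined at region borders.
        padded = np.pad(region, pad, mode="edge")
        blurred = np.zeros_like(region)
        for dy in range(ksize):
            for dx in range(ksize):
                blurred += padded[dy:dy + region.shape[0],
                                  dx:dx + region.shape[1]]
        out[y0:y1, x0:x1] = blurred / (ksize * ksize)
    return out.astype(img.dtype)
```

Only the pixels inside the listed boxes are modified, so this can be run as a post-processing pass over already-downloaded shards if the face bounding boxes were saved.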