KaiyangZhou / CoOp

Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)


Important changes made to Dassl's transforms.py

KaiyangZhou opened this issue

You might find that OpenAI's code produces around 59% accuracy for zero-shot CLIP (vision_model=RN50) on ImageNet with prompt ensembling, while CoOp's code gives only 57.81% for the same model (see Table 7 in the paper).

This difference is caused by different transforms: OpenAI's code applies Resize(224) to an image while CoOp's code (the previous version) uses Resize((224, 224)); the former keeps the image's aspect ratio while the latter does not. To make the results produced by CoOp's code comparable to OpenAI's, we have made our transforms consistent with theirs, so the transforms in the config files have been changed from ["random_flip", "random_translation", "center_crop", "normalize"] to ["random_resized_crop", "random_flip", "normalize"].
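
To see the difference concretely, here is a minimal sketch (assuming torchvision's transforms API; the input image is hypothetical) of the two resize behaviors on a non-square image: Resize((224, 224)) forces a 224x224 output and distorts the aspect ratio, whereas Resize(224) scales the shorter side and needs a subsequent CenterCrop(224) to yield a square input.

```python
# Minimal sketch (assumes torchvision is installed) of the two resize variants.
from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (640, 480))  # hypothetical 4:3 input

# Previous CoOp preprocessing: forces a 224x224 output, distorting the aspect ratio.
old_resize = transforms.Resize((224, 224))
print(old_resize(img).size)  # (224, 224)

# OpenAI-style preprocessing: scales the shorter side to 224, keeping the aspect
# ratio, then center-crops to obtain a square input.
new_resize = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
])
print(new_resize(img).size)  # (224, 224), without distortion
```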

If you are using our Dassl-based CoOp code, please update the code to the latest version. If you want to use your own code, you can simply copy CoOp's model code (i.e. CustomCLIP) and make the comparison on equal footing with whatever pipelines you are using.
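
If you go the own-pipeline route, the sketch below shows a test-time transform intended to match OpenAI's CLIP preprocessing (aspect-ratio-preserving resize plus center crop); the normalization statistics are the ones published in OpenAI's CLIP repository, and the torchvision usage is our own illustration rather than a snippet taken from CoOp's code.

```python
# Hedged sketch of a test-time pipeline mirroring OpenAI's CLIP preprocessing.
from torchvision import transforms
from torchvision.transforms import InterpolationMode

clip_eval_transform = transforms.Compose([
    transforms.Resize(224, interpolation=InterpolationMode.BICUBIC),  # keep aspect ratio
    transforms.CenterCrop(224),                                       # square 224x224 crop
    transforms.ToTensor(),
    transforms.Normalize(                                             # CLIP's published stats
        mean=(0.48145466, 0.4578275, 0.40821073),
        std=(0.26862954, 0.26130258, 0.27577711),
    ),
])
```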

For your reference, we have rerun the experiments using the new config files and show below the comparison with Table 7's results.

Previous version

| Method | RN50 | RN101 | ViT-B/32 | ViT-B/16 |
| --- | --- | --- | --- | --- |
| Prompt engineering | 55.41 | 58.72 | 59.88 | 64.71 |
| Prompt ensembling | 57.81 | 60.49 | 62.01 | 67.31 |
| CoOp | 60.46 | 64.39 | 64.92 | 70.13 |

Current version

| Method | RN50 | RN101 | ViT-B/32 | ViT-B/16 |
| --- | --- | --- | --- | --- |
| Prompt engineering | 58.18 | 61.26 | 62.05 | 66.73 |
| Prompt ensembling | 60.41 | 62.54 | 63.71 | 68.74 |
| CoOp | 62.95 | 66.60 | 66.85 | 71.92 |