Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Performance of Declip-88M checkpoint

Hcyang-NULL opened this issue:

Hi, I want to reproduce the zero-shot result of DeCLIP-88M with ResNet-50 on ImageNet-1K (reported as 62.5 in the table), but the evaluation result I get is 7.264, which is far too low. The ViT-B32 result, however, is correct. I also found a problem while loading the ResNet-50 checkpoint:

size mismatch for module.logit_scale: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).

I didn't change any of the model code.
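For reference, here is a minimal sketch for checking the stored shape of logit_scale directly in the checkpoint, independent of the model code (the checkpoint filename is hypothetical, and the possibility that the weights are nested under a "model" key is an assumption, not something confirmed by the repo):

```python
import torch

# Load the checkpoint on CPU and inspect the raw stored tensor shape.
ckpt = torch.load("declip_88m_r50.pth.tar", map_location="cpu")  # hypothetical filename
state_dict = ckpt.get("model", ckpt)  # assumption: weights may be nested under a "model" key
print(state_dict["module.logit_scale"].shape)  # e.g. torch.Size([]) vs torch.Size([1])
```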

Another question: why does the run.sh of declip-88m-resnet50 use clip_solver while the other run.sh files use declip_solver? I used declip_solver to evaluate DeCLIP-88M-ResNet50 by replacing the yaml file. The following figure shows the results reproduced on my own compute resources:
[image: reproduced evaluation results]

Do you have any ideas? Thanks!

Hi @Hcyang-NULL , were you able to figure out the issue?
cc: @zlccccc

size mismatch for module.logit_scale: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).

This problem occurs because the saved models come from different PyTorch versions. You can forcibly reshape logit_scale to torch.Size([]) or torch.Size([1]) when loading the model; this will not affect the accuracy.
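In case it helps, here is a minimal sketch of that workaround (the helper name load_with_logit_scale_fix and the assumption that the weights may be nested under a "model" key are mine, not the repo's loading code):

```python
import torch

def load_with_logit_scale_fix(model, ckpt_path):
    """Load a checkpoint, reshaping logit_scale so it matches the current model."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)  # assumption: weights may be nested under "model"
    key = "module.logit_scale"            # key name taken from the error message above
    model_state = model.state_dict()
    if key in state_dict and key in model_state:
        # Reshape torch.Size([]) <-> torch.Size([1]); the scalar value itself is unchanged.
        state_dict[key] = state_dict[key].reshape(model_state[key].shape)
    model.load_state_dict(state_dict)
    return model
```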

Thanks for your reply!

I tried this method before (forcibly reshaping the logit_scale), but it doesn't work; the performance is still 7.264. The ViT result is indeed correct, so maybe the ResNet-50 checkpoint is inconsistent with the code version? (That's my guess.)

Excuse me, could you tell me where I can find the file named 'val_official.json'? @Hcyang-NULL