xmed-lab / CLIPN

ICCV 2023: CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No

Question about baseline results in Tab 2

Z-ZHHH opened this issue

I appreciate your impressive work.
In Table 2 of the main paper, are the MSP and MaxLogit results reproduced on CLIP or on CLIPN? I tested MaxLogit with CLIP (ViT-B/32), using CIFAR-100 as the ID dataset and CIFAR-10 as the OOD dataset, but only got 74.8% AUROC.
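For readers who want to reproduce this kind of number, here is a minimal sketch of a MaxLogit score and an AUROC computed from ID and OOD scores. It is written in dependency-free Python, the logits are toy values invented for illustration, and the function names are hypothetical; it is not the evaluation code of CLIPN or any other repo.

```python
def max_logit_score(logits):
    """MaxLogit confidence: the largest class logit (higher = more ID-like)."""
    return max(logits)

def auroc(id_scores, ood_scores):
    """AUROC with ID as the positive class: the fraction of (ID, OOD) pairs
    in which the ID sample scores higher; ties count as half a win."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Toy logits, invented for illustration (one row per image, one column per class).
id_logits = [[2.0, 9.0, 1.0], [7.5, 0.5, 1.0], [3.0, 2.5, 4.0]]
ood_logits = [[3.0, 3.5, 2.5], [5.0, 4.0, 4.5], [1.0, 2.0, 1.5]]

id_scores = [max_logit_score(row) for row in id_logits]
ood_scores = [max_logit_score(row) for row in ood_logits]
print(auroc(id_scores, ood_scores))
```

The pairwise loop is quadratic in the number of samples, so for full test sets a rank-based AUROC routine from a standard library would be the practical choice.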

Sorry for the late reply. The MSP and MaxLogit results in Table 2 are produced by the original CLIP model. You can check the newly uploaded "./handcrafted/src/zero_shot_infer.py" for the hand-crafted CLIPN. It can produce the expected results on CIFAR.

Thanks for your reply. I find that different implementations can strongly affect OOD detection performance, even when the ID classification performance is similar.

It seems that in these lines of the file ./handcrafted/src/zero_shot_infer.py, the model being loaded is the CLIPN model, since CLIP does not have checkpoints with an epoch number.

That's because there is a temperature hyperparameter. I simply use the final learned value, 100, instead of manually searching for the best one on the test set for each method and dataset. Whether L2 normalization is applied can also change the performance. Unfortunately, I could not reach a consistent conclusion about how to set these two factors across methods and datasets. As a result, a general and fair approach is to follow the original operations of CLIP.
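To make the two factors concrete, the sketch below computes MSP, MaxLogit, and an energy score (log-sum-exp of the logits) from the same toy features under different scale and normalization settings. The feature vectors are invented, the helper names are hypothetical, and the scale value 100 is taken from the comment above; this is an illustration under those assumptions, not the code used for Table 2.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def class_logits(image_feat, text_feats, scale=100.0, normalize=True):
    """Scaled dot products between one image feature and each class text feature."""
    if normalize:
        image_feat = l2_normalize(image_feat)
        text_feats = [l2_normalize(t) for t in text_feats]
    return [scale * sum(a * b for a, b in zip(image_feat, t)) for t in text_feats]

def msp(logits):
    """Maximum softmax probability, computed stably by subtracting the max logit."""
    m = max(logits)
    return 1.0 / sum(math.exp(z - m) for z in logits)

def max_logit(logits):
    """Largest class logit."""
    return max(logits)

def energy_score(logits):
    """Log-sum-exp of the logits (negative energy at unit temperature)."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

# Toy features, invented for illustration (not real CLIP embeddings).
image = [1.0, 2.0, 2.0]
texts = [[1.0, 2.0, 2.0], [2.0, 1.0, 2.0], [0.0, 0.0, 1.0]]

flat = class_logits(image, texts, scale=1.0)     # cosine similarities
sharp = class_logits(image, texts, scale=100.0)  # the same cosines times 100
raw = class_logits(image, texts, scale=1.0, normalize=False)  # plain dot products

for name, logits in [("scale=1", flat), ("scale=100", sharp), ("no L2 norm", raw)]:
    print(name, round(msp(logits), 4), round(max_logit(logits), 4),
          round(energy_score(logits), 4))
```

In this toy case MSP is far from saturated at scale 1 and close to 1 at scale 100, while MaxLogit is only multiplied by the scale. A fixed positive scale therefore leaves the ranking of MaxLogit scores across samples unchanged, whereas MSP and the energy score depend on it, and dropping L2 normalization changes all three.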

The image encoder and text encoder of CLIPN are the same as those of CLIP. We freeze them when training the "no" text encoder. That means you can recover the same CLIP model from the CLIPN checkpoints at any epoch.

Thanks for your quick and detailed reply!
I implemented the baseline methods (MSP, MaxLogit, Energy) with the MCM repo on CIFAR-100 and ImageNet, and the results differ somewhat from the results you report on CIFAR-100.
Specifically, these are the differences (averaged over the OOD datasets):
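Our implementation with the MCM repo:

| ViT-B/32 (MCM repo) | MaxLogit | MSP  | Energy |
|---------------------|----------|------|--------|
| CIFAR-100           | 79.5     | 72.5 | 77.3   |
| ImageNet-1k         | 84.9     | 78.9 | 82.6   |

Reported results in CLIPN:

| ViT-B/32 (CLIPN)    | MaxLogit | MSP  | Energy |
|---------------------|----------|------|--------|
| CIFAR-100           | 84.3     | 81.8 | 82.0   |
| ImageNet-1k         | 85.6     | 73.3 | 84.9   |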
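As a quick way to see where the two sets of numbers diverge most, the sketch below subtracts the MCM-repo values from the CLIPN-reported ones cell by cell. It assumes the three values in each row correspond to MaxLogit, MSP, and Energy in that order; the variable names are hypothetical.

```python
METHODS = ["MaxLogit", "MSP", "Energy"]

# Values copied from the two tables above; column order is an assumption.
mcm_repo = {
    "CIFAR-100": [79.5, 72.5, 77.3],
    "ImageNet-1k": [84.9, 78.9, 82.6],
}
clipn_reported = {
    "CIFAR-100": [84.3, 81.8, 82.0],
    "ImageNet-1k": [85.6, 73.3, 84.9],
}

# Gap = CLIPN-reported minus MCM-repo, rounded to one decimal place.
gaps = {
    dataset: {
        method: round(reported - ours, 1)
        for method, reported, ours in zip(METHODS, clipn_reported[dataset], mcm_repo[dataset])
    }
    for dataset in mcm_repo
}

for dataset, row in gaps.items():
    print(dataset, row)
```

Under that column assumption, the largest gaps are in MSP: the reported CIFAR-100 value is higher by 9.3 points, while the reported ImageNet-1k value is lower by 5.6 points.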

The potential reasons for these differences when using the CLIP model could be a question for future discussion.

Sure. I am also working on improving the robustness of CLIP-based OOD detection. If you have further questions, feel free to discuss them with me.