Question about baseline results in Tab 2

Z-ZHHH opened this issue 9 months ago · comments

Appreciate your impressive work.
In Table 2 of the main paper, are the MSP and MaxLogit results reproduced on CLIP or on CLIPN? I tested MaxLogit on CLIP (ViT-B/32) with CIFAR-100 as the ID dataset and CIFAR-10 as the OOD dataset, but only got 74.8% AUROC.
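For readers comparing numbers like the one above: AUROC for OOD detection can be read as the probability that a randomly chosen ID sample receives a higher score than a randomly chosen OOD sample. The sketch below is a minimal illustration under the assumption that a higher score means "more in-distribution"; the function name `auroc` and the toy scores are hypothetical and are not taken from the CLIPN or MCM code.

```python
def auroc(id_scores, ood_scores):
    """Fraction of (ID, OOD) pairs where the ID sample scores higher.

    Ties count as half a win. Assumes higher score = more in-distribution.
    This pairwise loop is O(N * M), so it is only meant for small examples.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Toy scores (made up for illustration, not results from the thread).
toy_id = [0.9, 0.7, 0.6]
toy_ood = [0.8, 0.3, 0.2]
toy_auroc = auroc(toy_id, toy_ood)
```

Because AUROC depends only on the ranking of the scores, any change that reorders samples (for example a different temperature inside a softmax-based score) can change it even when ID accuracy stays the same.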

SiLang · Answer 1 · Sat Dec 02 2023 22:57:53 GMT+0800 (China Standard Time)

Sorry for the late reply. The MSP and MaxLogit results in Table 2 are produced by the original CLIP model. You can check the newly uploaded "./handcrafted/src/zero_shot_infer.py" for hand-crafted CLIPN. It can produce the proper results on CIFAR.

Zhang ZiHan · Answer 2 · Thu Dec 07 2023 10:22:25 GMT+0800 (China Standard Time)

Thanks for your reply. I find that different implementations can strongly affect the OOD detection performance, while the ID classification performance stays similar.

Zhang ZiHan · Answer 3 · Thu Dec 07 2023 10:26:26 GMT+0800 (China Standard Time)

It seems that in these lines of the file ./handcrafted/src/zero_shot_infer.py, the model being loaded is the CLIPN model, since CLIP does not have checkpoints with an epoch number.

SiLang · Answer 4 · Thu Dec 07 2023 10:33:37 GMT+0800 (China Standard Time)

That’s because there is a temperature hyperparameter. I just use the final learned value, 100, instead of manually searching for the best one on the test set for each method and dataset. Besides, whether L2 normalization is applied can also change the performance. Unfortunately, I could not reach a consistent conclusion on how to set these two factors across different methods and datasets. As a result, a general and fair approach is to follow the original operation of CLIP.
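To make the two factors concrete, here is a minimal sketch of how temperature and L2 normalization enter the three baseline scores. It is an illustration only, not the code from either repo: the function names, the toy feature vectors, and the choice to report the energy score as the log-sum-exp of the scaled logits (negative free energy with an energy temperature of 1) are all assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_ood_scores(image_feat, text_feats, temperature=100.0, normalize=True):
    """MSP, MaxLogit and Energy scores for one image (higher = more ID).

    image_feat: image embedding; text_feats: one embedding per class prompt.
    """
    if normalize:
        image_feat = l2_normalize(image_feat)
        text_feats = [l2_normalize(t) for t in text_feats]
    # Logits are temperature-scaled dot products (cosine similarities if normalized).
    logits = [temperature * sum(a * b for a, b in zip(image_feat, t))
              for t in text_feats]
    max_logit = max(logits)
    # Numerically stable log-sum-exp of the logits.
    lse = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return {
        "msp": math.exp(max_logit - lse),  # max softmax probability
        "max_logit": max_logit,
        "energy": lse,  # assumed convention: negative free energy
    }


# Toy embeddings (hypothetical values, chosen only to show the effect).
image = [3.0, 4.0, 0.0]
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

scores_t100 = clip_ood_scores(image, prompts, temperature=100.0)
scores_t1 = clip_ood_scores(image, prompts, temperature=1.0)
scores_raw = clip_ood_scores(image, prompts, temperature=1.0, normalize=False)
```

In this sketch, MaxLogit only rescales with the temperature, so its ranking of samples is unchanged, whereas MSP and Energy combine all logits nonlinearly and can rank samples differently at different temperatures. Skipping normalization lets the embedding norm leak into every score, which is one way two implementations can agree on ID accuracy yet disagree on AUROC.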

SiLang · Answer 5 · Thu Dec 07 2023 10:37:52 GMT+0800 (China Standard Time)

The image encoder and text encoder of CLIPN are the same as those of CLIP. We freeze them while training the "no" text encoder. That means you can recover the same CLIP model from the CLIPN checkpoints at any epoch.
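One way to sanity-check this claim is to compare the frozen parameters of two checkpoints. The sketch below assumes a checkpoint can be viewed as a flat mapping from parameter names to values; the key names (`image_encoder.`, `text_encoder.`, `no_text_encoder.`) and the values are hypothetical and do not reflect the actual layout of the CLIPN checkpoints.

```python
def frozen_params_match(state_a, state_b, trainable_prefix):
    """True if all parameters outside `trainable_prefix` are identical.

    Plain lists are compared here; with real tensors an elementwise check
    such as torch.equal would be needed instead of ==.
    """
    frozen_a = {k: v for k, v in state_a.items()
                if not k.startswith(trainable_prefix)}
    frozen_b = {k: v for k, v in state_b.items()
                if not k.startswith(trainable_prefix)}
    return frozen_a == frozen_b


# Toy "checkpoints" (hypothetical names and values): only the trainable
# branch differs between the two epochs.
epoch_1 = {
    "image_encoder.w": [0.1, 0.2],
    "text_encoder.w": [0.3, 0.4],
    "no_text_encoder.w": [0.5, 0.6],
}
epoch_10 = {
    "image_encoder.w": [0.1, 0.2],
    "text_encoder.w": [0.3, 0.4],
    "no_text_encoder.w": [0.9, 0.1],
}
same_clip = frozen_params_match(epoch_1, epoch_10, "no_text_encoder.")
```

If the check passes for a pair of checkpoints, baselines computed from the frozen encoders should not depend on which epoch is loaded.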

Zhang ZiHan · Answer 6 · Thu Dec 07 2023 10:56:52 GMT+0800 (China Standard Time)

Thanks for your quick and detailed reply!
I implemented the baseline methods (MSP, MaxLogit, Energy) with the MCM repo on CIFAR-100 and ImageNet, and the results differ somewhat from the ones you report on CIFAR-100.
Specifically, these are the differences (averaged over the OOD datasets):
Our implementation with the MCM repo:

ViT-B/32 (MCM repo)	MaxLogit	MSP	Energy
CIFAR-100	79.5	72.5	77.3
ImageNet-1k	84.9	78.9	82.6

Reported results in CLIPN:

ViT-B/32 (CLIPN paper)	MaxLogit	MSP	Energy
CIFAR-100	84.3	81.8	82.0
ImageNet-1k	85.6	73.3	84.9

The potential reasons for these differences when using the CLIP model could be a question for future discussion.

SiLang · Answer 7 · Thu Dec 07 2023 11:03:29 GMT+0800 (China Standard Time)

Sure. I am also working on improving the robustness of CLIP-based OOD detection. If you have further questions, feel free to discuss them with me.