Normalization used for VitBase (LN)

Question

george-conrad opened this issue a year ago

Hi,

thanks for sharing your work! In the VitBase (LN) results, we noticed a mismatch between the normalization you use and the one used by timm: the timm vision transformer uses a mean of (0.5, 0.5, 0.5) and a standard deviation of (0.5, 0.5, 0.5). With the timm normalization, we got the following results for label_shifts, corresponding to Table 2 in your paper:

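| Acc@1 (%) | Gauss. | Shot | Impul. | Defoc. | Glass | Motion | Zoom | Snow | Frost | Fog | Bright. | Contr. | Elastic | Pixel | JPEG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VitBase (LN) no_adapt | 46.9 | 47.7 | 46.9 | 42.9 | 34.2 | 50.8 | 44.8 | 57.0 | 52.5 | 56.5 | 76.1 | 31.9 | 46.6 | 65.5 | 66.1 |
| TENT | 58.4 | 59.9 | 59.8 | 58.9 | 56.6 | 62.5 | 59.4 | 66.8 | 24.5 | 70.9 | 79.1 | 63.1 | 65.7 | 73.8 | 71.7 |
| SAR | 59.1 | 60.3 | 60.5 | 59.3 | 57.7 | 62.9 | 59.9 | 67.6 | 66.5 | 70.8 | 79.2 | 63.7 | 66.5 | 74.0 | 71.8 |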
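
To get a single summary number per method, the per-corruption accuracies in the table above can be averaged over the 15 corruptions. The snippet below is only a small sketch of that arithmetic: the values are copied from the table, and the names `RESULTS` and `mean_acc` are made up for illustration.

```python
# Acc@1 (%) per corruption, copied from the table above
# (timm normalization, mean = std = 0.5), in the table's column order.
RESULTS = {
    "no_adapt": [46.9, 47.7, 46.9, 42.9, 34.2, 50.8, 44.8, 57.0,
                 52.5, 56.5, 76.1, 31.9, 46.6, 65.5, 66.1],
    "TENT": [58.4, 59.9, 59.8, 58.9, 56.6, 62.5, 59.4, 66.8,
             24.5, 70.9, 79.1, 63.1, 65.7, 73.8, 71.7],
    "SAR": [59.1, 60.3, 60.5, 59.3, 57.7, 62.9, 59.9, 67.6,
            66.5, 70.8, 79.2, 63.7, 66.5, 74.0, 71.8],
}


def mean_acc(values):
    """Unweighted mean accuracy over the listed corruptions."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for method, accs in RESULTS.items():
        print(f"{method}: {mean_acc(accs):.1f}")
```

With these numbers the averages come out at roughly 51.1 (no_adapt), 62.1 (TENT), and 65.3 (SAR); the low TENT average is driven mostly by the Frost column.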

All the best,
George

mr-eggplant · Answer 1 · Wed May 03 2023 03:38:01 GMT+0800 (China Standard Time)

Hi George, thank you for bringing up this issue! We will look into it and get back to you later. Apologies for the delayed response, as we are currently traveling in Africa. Thanks for understanding.

mr-eggplant · Answer 2 · Tue May 09 2023 16:34:45 GMT+0800 (China Standard Time)

Hi George,

Thank you for your inquiry. We did not specifically set the normalization values and simply utilized the parameters provided by the ImageNet-C dataset.

Upon investigating the issue you raised, we found that using the [0.5, 0.5, 0.5] normalization values indeed results in higher accuracies for no_adapt, TENT, and our SAR. However, this does not affect our primary conclusions: 1) models with GN/LN are better suited for stable TTA than those with BN, and 2) GN/LN models do not always succeed and can still experience failure cases (e.g., TENT failing on Frost). In this failure case, SAR remains stable.

On the other hand, using the [0.485, 0.456, 0.406] normalization values can be considered a more severe distribution shift. Under this more challenging shift, the source accuracy (no_adapt, averaged over 15 corruptions, level 5) drops to 29.9%. SAR improves this accuracy to 56.3%, while TENT reaches only 47.7%, which showcases SAR's capability. We appreciate your insight, as it has prompted us to consider this matter more deeply in the context of stable TTA.
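
One way to see why the choice of normalization acts like an extra shift is to compare the input range the model receives under each setting. The sketch below is illustrative only. It assumes that the [0.485, 0.456, 0.406] mean is paired with the conventional ImageNet standard deviation (0.229, 0.224, 0.225), which is not stated in this thread, and the helper names are hypothetical.

```python
# Mean/std pairs under discussion. The 0.5 values are the ones quoted in this
# thread for the timm vision transformer. The ImageNet std is an ASSUMPTION:
# it is the std conventionally paired with the (0.485, 0.456, 0.406) mean.
TIMM_VIT = ((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
IMAGENET = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))


def normalize(pixel, stats):
    """Normalize one RGB pixel whose channel values lie in [0, 1]."""
    mean, std = stats
    return tuple((p - m) / s for p, m, s in zip(pixel, mean, std))


def input_range(stats):
    """Per-channel (min, max) seen by the model for inputs in [0, 1]."""
    lo = normalize((0.0, 0.0, 0.0), stats)
    hi = normalize((1.0, 1.0, 1.0), stats)
    return tuple(zip(lo, hi))


if __name__ == "__main__":
    for name, stats in (("timm ViT", TIMM_VIT), ("ImageNet", IMAGENET)):
        print(name, [(round(a, 3), round(b, 3)) for a, b in input_range(stats)])
```

Under the 0.5 setting every channel spans [-1, 1], whereas under the assumed ImageNet statistics the red channel spans roughly [-2.12, 2.25], so a model trained with one setting sees rescaled and offset inputs when evaluated with the other.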

Best regards,