Image normalization and ViT

Question

GhostLate opened this issue 8 months ago · comments

I noticed that the DataLoader applies only one transform: ToTensor().
Why don't you normalize the images (mean, std) before the first ViT layers?

Zhongang Cai · Answer 1 · Tue Dec 12 2023 19:57:21 GMT+0800 (China Standard Time)

Hi @GhostLate, we follow OSX in training the transformer backbones, and we didn't run extensive experiments on training details such as input normalization. That said, some tuning of these details may be useful.
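
For anyone who wants to try the tuning mentioned above, here is a minimal sketch of the difference between ToTensor() alone and ToTensor() followed by mean/std normalization. It is not code from this repo: it mimics the two transforms in NumPy so it runs without torchvision, the helper names `to_tensor` and `normalize` are hypothetical, and the ImageNet statistics are an assumption (a common default for pretrained ViT backbones, not something confirmed for this model).

```python
import numpy as np

# Assumption: ImageNet RGB statistics. The right values depend on how the
# backbone was pretrained; this repo currently applies no normalization.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_tensor(img_uint8):
    """Mimic ToTensor(): HWC uint8 in [0, 255] -> CHW float32 in [0, 1]."""
    return np.transpose(img_uint8, (2, 0, 1)).astype(np.float32) / 255.0


def normalize(chw, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Mimic Normalize(mean, std): per-channel (x - mean) / std."""
    return (chw - mean[:, None, None]) / std[:, None, None]


# Random stand-in for a 224x224 RGB crop.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

x = to_tensor(img)   # what the DataLoader currently feeds the model
y = normalize(x)     # what the model would see with normalization added

print("ToTensor only:  range [%.3f, %.3f]" % (x.min(), x.max()))
print("with Normalize: range [%.3f, %.3f]" % (y.min(), y.max()))
```

In torchvision the equivalent pipeline would be `transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])`. Note that a model trained without normalization should also be evaluated without it, so changing this would likely require retraining or fine-tuning.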