compare to CvT

Question

liyunsheng13 opened this issue 3 years ago

Hi

Thanks for sharing this good work. I'm curious why the proposed loss function can outperform CvT, which contains a depthwise convolution that is capable of learning local features.

Yahui Liu · Answer 1 · Tue Nov 30 2021 10:44:30 GMT+0800 (China Standard Time)

Hi @liyunsheng13,

Good question. We see a similar pattern in the ResNet results, where there are only convolutional layers. Our guess is that the proposed loss acts as a regularizer that helps both VTs and CNNs learn local features better, especially in the earlier epochs. You're right that convolutional layers are capable of learning local features. In our experiments, with longer training we see only a marginal gain or the same performance.
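As a rough illustration of what "acts as a regularizer" means here, the sketch below adds a weighted auxiliary term to a standard classification loss. This is a generic pattern and not the code from the paper: the function names, the scalar `aux_loss` input, and the default `aux_weight` of 0.1 are all assumptions made for the example.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single sample (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def total_loss(logits, target, aux_loss, aux_weight=0.1):
    """Classification loss plus a weighted auxiliary term.

    `aux_loss` stands in for whatever auxiliary objective is used;
    `aux_weight` is a hypothetical value, not one taken from the paper.
    """
    return cross_entropy(logits, target) + aux_weight * aux_loss


if __name__ == "__main__":
    # Two-class toy example with uniform logits and an auxiliary loss of 2.0.
    print(total_loss([0.0, 0.0], target=0, aux_loss=2.0, aux_weight=0.5))
```

Setting `aux_weight` to 0 recovers plain cross-entropy, which is one way to see the auxiliary term as an extra constraint on training instead of a change to the classification objective.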

Yahui Liu · Answer 2 · Tue Nov 30 2021 11:16:21 GMT+0800 (China Standard Time)

Besides, if you check our supplementary materials, we also include a figure showing training with different VTs on CIFAR-100.

It shows that the gain from our loss is clearly smaller for the architecture with depthwise convolution.

Yunsheng Li · Answer 3 · Tue Nov 30 2021 14:24:53 GMT+0800 (China Standard Time)

Got it. Thanks for your response.