clovaai / rexnet

Official Pytorch implementation of ReXNet (Rank eXpansion Network) with pretrained models

Improvements for ResNet

RudyChin opened this issue · comments

Hi,

Thanks for the great work. I have several questions.

First, do the numbers in Table 7 include the training techniques mentioned in Appendix B.2?
Second, I'm wondering why the improvements for ResNet50 and VGG16 are much smaller than those for the MobileNets (0.8% and 0.2% compared to 4%).

Thanks,
Rudy

Hello Rudy, thank you for your interest in our work!

The numbers in Table 7 are the accuracies of models trained with only the fundamental training settings described in Section 4.1, without the training techniques mentioned in Appendix B.2. This is to check that our training setup can reach the reported performance under a different training environment. Please refer to the training setup in Section 4.1 and Section B.1.

For the second question, we conjecture that the reason lies in the original layer configuration of those networks. In a bottleneck block of ResNet50, there is no ReLU after the final expand layer (i.e., the 1x1 convolution), so we cannot change the nonlinearity there and can only modify the channel size of the layer. VGG16 consists of multiple 3x3 convolution layers and has no 1x1 convolution that clearly plays the role of an expand layer, so it seems difficult to fully apply our design principles to each 3x3 convolution (though we could make each layer closer to an expand layer).
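For readers unfamiliar with the block layout, here is a minimal sketch of a ResNet-50-style bottleneck (patterned after torchvision's implementation; the downsample path is omitted and input/output channels are assumed equal), showing that the final 1x1 expand convolution has no ReLU of its own and that the only nonlinearity acting on the expanded channels comes after the residual addition:

```python
import torch
import torch.nn as nn

class BottleneckSketch(nn.Module):
    """Simplified ResNet-50 bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand.
    Assumes in_ch == out_ch so the identity shortcut needs no downsample."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        # Final 1x1 "expand" layer: note that no ReLU follows it here.
        self.expand = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.expand(self.conv3x3(self.reduce(x)))
        # The only nonlinearity touching the expanded channels is applied
        # after the residual addition, so the expand layer itself has no
        # activation that could be swapped without changing the block.
        return self.relu(out + x)

# Quick usage check with ImageNet-like stage dimensions.
y = BottleneckSketch(256, 64, 256)(torch.randn(1, 256, 56, 56))
```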

Thank you for the detailed reply!

I have a follow-up question. Do you have the results for MobileNetV2 trained with the techniques mentioned in Appendix B.2? I think this number would be informative to understand how much the proposed technique improves over the baseline design.

@RudyChin Yes, we trained MobileNetV2 with the training techniques mentioned in Appendix B.2. I agree that this result would clearly help show the effectiveness of the training techniques. We will consider including it in the next paper revision. Thank you for the suggestion.

@dyhan0920

In a bottleneck block of ResNet50, there is no ReLU after the final expand layer (i.e., the 1x1 convolution), so we cannot change the nonlinearity there and can only modify the channel size of the layer.

But there is a nonlinearity after the residual addition. Have you tried replacing it with Swish? It should definitely benefit performance.

@bonlime Thanks for your comment! I will try replacing those ReLUs with Swish.
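For anyone who wants to experiment along these lines, here is a minimal sketch of one way to swap ReLUs for SiLU (PyTorch's built-in Swish) in a torchvision ResNet-50. Note that torchvision's Bottleneck reuses a single ReLU module at every position in its forward pass, so this replaces all of them, not only the one after the residual addition:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def swap_relu_for_silu(module: nn.Module) -> None:
    """Recursively replace every nn.ReLU child module with nn.SiLU (Swish)."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.SiLU(inplace=True))
        else:
            swap_relu_for_silu(child)

model = resnet50()          # randomly initialized weights
swap_relu_for_silu(model)

# Sanity check: the forward pass still works and no nn.ReLU modules remain.
out = model(torch.randn(1, 3, 224, 224))
assert not any(isinstance(m, nn.ReLU) for m in model.modules())
```

Replacing only the post-residual activation would require subclassing or patching the Bottleneck forward itself, since the three ReLU call sites share one module attribute.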