clovaai / rexnet

Official Pytorch implementation of ReXNet (Rank eXpansion Network) with pretrained models

Improvements for ResNet

RudyChin opened this issue · comments

Hi,

Thanks for the great work. I have several questions.

First, do the numbers in Table 7 include the training techniques mentioned in Appendix B.2?
Second, I'm wondering why the improvements for ResNet50 and VGG16 are much smaller than those for the MobileNets (0.8% and 0.2% compared to 4%).

Thanks,
Rudy

Hello Rudy, thank you for your interest in our work!

The numbers in Table 7 are the accuracies of models trained with only the fundamental training settings described in Section 4.1, without the training techniques mentioned in Appendix B.2. This is to check that our training setup can reach the reported performance under a different training environment. Please refer to the training setup in Section 4.1 and Section B.1.

For the second question, we conjecture that the reason lies in the original layer configuration of those networks. In a bottleneck block of ResNet50, there is no ReLU after the final expand layer (i.e., the 1x1 convolution), so we cannot change the nonlinearity there and can only modify the channel size of the layer. VGG16 consists of multiple 3x3 convolution layers and has no 1x1 convolution that clearly plays the role of an expand layer, so it seems difficult to fully apply our design principles to each 3x3 convolution (though we could make each layer closer to an expand layer).
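For readers unfamiliar with the block layout, here is a minimal sketch of a ResNet-50-style bottleneck (patterned after torchvision's implementation; the downsample path is omitted and input/output channels are assumed equal), showing that the final 1x1 expand convolution has no ReLU of its own and that the only nonlinearity acting on the expanded channels comes after the residual addition:

```python
import torch
import torch.nn as nn

class BottleneckSketch(nn.Module):
    """Simplified ResNet-50 bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand.
    Assumes in_ch == out_ch so the identity shortcut needs no downsample."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        # Final 1x1 "expand" layer: note that no ReLU follows it here.
        self.expand = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.expand(self.conv3x3(self.reduce(x)))
        # The only nonlinearity touching the expanded channels is applied
        # after the residual addition, so the expand layer itself has no
        # activation that could be swapped without changing the block.
        return self.relu(out + x)

# Quick usage check with ImageNet-like stage dimensions.
y = BottleneckSketch(256, 64, 256)(torch.randn(1, 256, 56, 56))
```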

Thank you for the detailed reply!

I have a follow-up question. Do you have the results for MobileNetV2 trained with the techniques mentioned in Appendix B.2? I think this number would be informative to understand how much the proposed technique improves over the baseline design.

@RudyChin Yes, we trained MobileNetV2 with the training techniques mentioned in Appendix B.2. I agree that this result would clearly help show the effectiveness of the training techniques. We will consider including it in the next paper revision. Thank you for the suggestion.

@dyhan0920

In a bottleneck block of ResNet50, there is no ReLU after the final expand layer (i.e., the 1x1 convolution), so we cannot change the nonlinearity there and can only modify the channel size of the layer.

But there is a nonlinearity after the residual addition. Have you tried replacing it with Swish? It should definitely benefit performance.

@bonlime Thanks for your comment! I will try replacing those ReLUs with Swish.
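For anyone who wants to experiment along these lines, here is a minimal sketch of one way to swap ReLUs for SiLU (PyTorch's built-in Swish) in a torchvision ResNet-50. Note that torchvision's Bottleneck reuses a single ReLU module at every position in its forward pass, so this replaces all of them, not only the one after the residual addition:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def swap_relu_for_silu(module: nn.Module) -> None:
    """Recursively replace every nn.ReLU child module with nn.SiLU (Swish)."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.SiLU(inplace=True))
        else:
            swap_relu_for_silu(child)

model = resnet50()          # randomly initialized weights
swap_relu_for_silu(model)

# Sanity check: the forward pass still works and no nn.ReLU modules remain.
out = model(torch.randn(1, 3, 224, 224))
assert not any(isinstance(m, nn.ReLU) for m in model.modules())
```

Replacing only the post-residual activation would require subclassing or patching the Bottleneck forward itself, since the three ReLU call sites share one module attribute.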