clovaai / rexnet

Official Pytorch implementation of ReXNet (Rank eXpansion Network) with pretrained models

RandAug + EraseAug + SE Block + Swish ?

ildoonet opened this issue

In my work, I am in the process of evaluating ReXNet as a candidate for our in-house model tuning.

You trained the ReXNet models with RandAug, EraseAug, SE blocks, and the SiLU (Swish) activation.

None of those techniques were used in the original MobileNetV2 training.
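For reference, the augmentation side of this can be sketched with torchvision transforms (the magnitudes and probabilities below are illustrative placeholders, not necessarily the settings used for ReXNet):

```python
from torchvision import transforms

# Illustrative training pipeline combining RandAug and EraseAug.
# num_ops/magnitude/p values are placeholders, not the paper's settings.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(num_ops=2, magnitude=9),  # RandAug
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.2),                 # EraseAug (on tensors)
])
```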

Since you argue that adjusting the number of channels per layer in MobileNetV2 is an important factor in improving performance, I trained ReXNet without those techniques and got 72.9-73.2% top-1 accuracy. To check whether this was a training problem, I also trained ReXNet with those techniques enabled, and the result came out similar to the paper's.

https://github.com/ildoonet/pytorch-image-models

So my questions are:

  1. The paper's argument that adjusting the channel sizes diminishes the representational bottleneck seems uncertain to me. What do you think?

  2. Have you tried training MobileNetV2 with the above techniques but without adjusting the channel sizes?

Thanks. Looking forward to your response.

For the first question, adjusting the channel size is not the only way to diminish the representational bottleneck. We also replaced the ReLU6 activation with other activations such as Swish or ELU to expand the rank. Note that, following our claim, we replace the ReLU6s after the expand layers, i.e., the first 1x1 convs in each inverted bottleneck (IB). Additionally, using Swish is not the only solution.
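A minimal sketch of that activation placement in PyTorch: SiLU (Swish) after the 1x1 expand conv, while the depthwise stage keeps ReLU6 and the projection stays linear. The layer sizes and the choice of ReLU6 for the depthwise stage are illustrative assumptions, not the exact ReXNet configuration:

```python
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """Illustrative IB: ReLU6 after the expand layer replaced with SiLU."""
    def __init__(self, in_ch, out_ch, expand_ratio=6, stride=1):
        super().__init__()
        mid = in_ch * expand_ratio
        self.block = nn.Sequential(
            # 1x1 expand conv: ReLU6 swapped for SiLU (Swish) to expand rank
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(inplace=True),
            # 3x3 depthwise conv
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection (no activation, as in MobileNetV2)
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```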

For the second question: ReXNet shows a better result even without those techniques. We already checked this in our paper (see Table 5(c) in the discussion section). Our MobileNetV2 baseline got 73.1% accuracy, and only expanding the channel sizes (Exp. in the table) gave us 75.5%. Using those techniques would improve MobileNetV2's accuracy as well, so if you are curious about the score, I can train it.
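To make the channel expansion concrete, here is a rough sketch of growing per-block output channels linearly across the network instead of MobileNetV2's stage-wise steps. The endpoint values and rounding divisor are placeholders, not ReXNet's exact configuration:

```python
def linear_channel_schedule(num_blocks, c_start=16, c_end=180, divisor=8):
    """Illustrative linearly increasing channel schedule.

    c_start/c_end are placeholders, not the paper's numbers.
    """
    channels = []
    for i in range(num_blocks):
        c = c_start + (c_end - c_start) * i / (num_blocks - 1)
        # round to a hardware-friendly multiple of `divisor`
        channels.append(int(round(c / divisor) * divisor))
    return channels

print(linear_channel_schedule(16))  # smoothly rising channel counts
```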

Consider EfficientNets, which have SE blocks and Swish after every conv, and were trained with techniques such as AutoAug, DropBlock, and RandErase. Training them without these would clearly lower the accuracy.
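For reference, a typical squeeze-and-excitation block of the kind used in EfficientNet-style models; the reduction ratio here is just a common default, not a value taken from either paper:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Typical squeeze-and-excitation block; reduction=4 is a common choice."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: global avg pool
            nn.Conv2d(channels, channels // reduction, 1),  # bottleneck FC
            nn.SiLU(inplace=True),                          # Swish, EfficientNet-style
            nn.Conv2d(channels // reduction, channels, 1),  # restore channels
            nn.Sigmoid(),                                   # per-channel gates in [0, 1]
        )

    def forward(self, x):
        return x * self.gate(x)                             # excite: rescale channels
```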

Hope this reply is helpful to you. If you have any further concerns, please let me know.

@dyhan0920 Thanks for the quick reply, and sorry for my misunderstanding. Table 5 answers most of my questions.

I will look into why my implementation's results do not match Table 5. Thanks again.