clovaai / rexnet

Official PyTorch implementation of ReXNet (Rank eXpansion Network) with pretrained models



Training recipe

leoxiaobin opened this issue

Thanks for sharing your code.

I tried to train my own ReXNet using the recipe provided in the repo:

./distributed_train.sh 4 /imagenet/ --model rexnetv1 --rex-width-mult 1.0 --opt sgd --amp \
 --lr 0.5 --weight-decay 1e-5 \
 --batch-size 128 --epochs 400 --sched cosine \
 --remode pixel --reprob 0.2 --drop 0.2 --aa rand-m9-mstd0.5 

Your paper says ReXNet was trained with a stochastic depth rate of 0.2. However, the provided recipe does not use stochastic depth.

My question is: in order to reproduce the results, do I need to use stochastic depth?
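(For context: stochastic depth randomly skips a residual branch per sample during training and acts as the identity at test time. Below is a minimal PyTorch sketch of the idea, often called DropPath; the module name and the default rate of 0.2 are illustrative and are not taken from this repo's code.)

import torch
import torch.nn as nn

class DropPath(nn.Module):
    # Stochastic depth: randomly zero a residual branch per sample
    # during training; identity at eval time. Illustrative sketch only.
    def __init__(self, drop_prob=0.2):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if not self.training or self.drop_prob == 0.0:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dims
        mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = (torch.rand(mask_shape, device=x.device) < keep_prob).to(x.dtype)
        # Rescale kept branches so the expected output is unchanged
        return x * mask / keep_prob

In a residual block this would wrap only the residual branch, e.g. out = shortcut + drop_path(branch(x)).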

I have followed the suggested training recipe to train ReXNet and achieved a top-1 accuracy of 76.7 on the ImageNet validation set, which is worse than the result reported in the original paper.

Has anyone reproduced the reported performance in the paper using the suggested training recipe?

@leoxiaobin Hello Leo, sorry for the late reply. Please try another training run without stochastic depth for the 1.0x model. Stochastic depth is needed only for the larger models.
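(Side note: when stochastic depth is enabled for larger models, one common scheme, from the original stochastic depth paper and not necessarily the exact one used here, ramps the drop rate linearly with block index up to the target rate such as 0.2. A hypothetical sketch reusing the DropPath module above, with an illustrative block count:)

# Hypothetical linear ramp: block 0 is never dropped and the final
# block uses the full rate; num_blocks is illustrative.
num_blocks = 16
max_drop_rate = 0.2
drop_paths = [DropPath(max_drop_rate * i / (num_blocks - 1))
              for i in range(num_blocks)]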

Hi @dyhan0920, thanks for your reply.
The 76.7 was obtained with the 1.0x model, using the command below:

./distributed_train.sh 4 /imagenet/ --model rexnetv1 --rex-width-mult 1.0 --opt sgd --amp \
 --lr 0.5 --weight-decay 1e-5 \
 --batch-size 128 --epochs 400 --sched cosine \
 --remode pixel --reprob 0.2 --drop 0.2 --aa rand-m9-mstd0.5 

Are these the right settings for reproducing the result of the 1.0x model?

@leoxiaobin The settings you provided are identical to mine with the defaults applied. Please check the default settings in the training code, such as the warm-up parameters (1e-4), label smoothing (0.1), not using sync-bn, not using EMA, and so on. Sorry again for the late reply, and please let me know how your results turn out.
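(For anyone double-checking those defaults: label smoothing with factor 0.1 mixes the one-hot target with the uniform distribution over classes. Recent PyTorch exposes this directly as nn.CrossEntropyLoss(label_smoothing=0.1); the standalone function below is only an illustrative sketch of the same computation, not the repo's code.)

import torch.nn.functional as F

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    # Smoothed target: (1 - smoothing) * one_hot + smoothing * uniform(1/K)
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()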