We apply the recent Filter Response Normalization (FRN) method to a stronger, widely used training recipe for ResNet-50 on ImageNet, to understand how well FRN works under such a recipe.
We take the ImageNet training code in TensorFlow from ppwwyyxx/GroupNorm-reproduce. The training code follows a common training recipe that is used in the following two papers:
and reproduces the exact baseline numbers of the above two papers, i.e.:
Model | Top 1 Error |
---|---|
ResNet-50, BatchNorm | 23.6% |
ResNet-50, GroupNorm | 24.0% |
We apply a patch, FRN-cosineLR.diff, to the aforementioned code on top of commit e0d0b1, to implement Filter Response Normalization as well as a cosine LR schedule.
The updated code is included in this directory.
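For reference, the FRN+TLU operation can be sketched as follows. This is a NumPy illustration of the method as described in the FRN paper, not the TensorFlow implementation used by the patch: FRN divides each channel by the root mean squared activation over the spatial extent (with no mean subtraction, unlike BatchNorm), applies a learned affine transform, and the TLU then thresholds at a learned per-channel value `tau` instead of at zero.

```python
import numpy as np

def frn_tlu(x, gamma, beta, tau, eps=1e-6):
    """Filter Response Normalization followed by a Thresholded Linear Unit.

    x: activations of shape (N, H, W, C); gamma, beta, tau: per-channel
    parameters of shape (C,). A NumPy sketch for illustration only.
    """
    # nu^2: mean squared activation over the spatial extent, per sample and channel
    nu2 = np.mean(np.square(x), axis=(1, 2), keepdims=True)
    # Normalize by the root mean squared norm (no mean subtraction, unlike BN)
    x_hat = x / np.sqrt(nu2 + eps)
    # Learned affine transform, then threshold at a learned tau rather than 0 (TLU)
    return np.maximum(gamma * x_hat + beta, tau)
```

In the actual training code, `gamma`, `beta`, and `tau` are learned variables; `eps` may also be learned, as the paper suggests for 1x1 spatial sizes.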
This command trains a ResNet-50 with BatchNorm on ImageNet:

```
./imagenet-resnet.py --data /path/to/imagenet
```

To use FRN+TLU, add `--frn-trelu`. To use the cosine LR schedule, add `--cosine-lr`.
We train our models on machines with 8 V100s using TensorFlow 1.14.
Without cosine LR schedule:
Model | Top 1 Error |
---|---|
ResNet-50, BN | 23.6% |
ResNet-50, FRN+TLU | 24.0% |
With cosine LR schedule:
Model | Top 1 Error |
---|---|
ResNet-50, BN | 23.0% |
ResNet-50, FRN+TLU | 23.2% |
Each experiment was run only once. The typical variance of such training is roughly ±0.1% around the mean.
The results in the Filter Response Normalization paper use a different recipe. Potential differences include:
- Input image size is 299x299 vs. our 224x224.
- The use of "ResNet-v2" vs. our classic ResNet. In our ResNet, activations do not always come immediately after normalization, which may affect the use of TLU.
- Training length is unclear (it appears to be "300k steps" with batch size 256) vs. our 100 epochs.
- Input augmentation may be different.
- The exact definition of the cosine LR schedule may be different.
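On the last point, one common definition of a cosine schedule (the SGDR-style formula, decaying the base LR to zero over the full run) is sketched below. This is only an assumption for illustration; either codebase may differ in details such as warmup, a nonzero minimum LR, or restarts.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1):
    """SGDR-style cosine schedule: decay base_lr smoothly to 0 over total_steps.

    A sketch of one common definition; the exact schedule used in the
    FRN paper or in this repo's patch may differ.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```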
The paper reports the following results (with cosine LR schedule):
Model | Top 1 Error |
---|---|
ResNetV2-50, BN | 23.8% |
ResNetV2-50, FRN+TLU | 22.8% |