About a TensorFlow implementation
jh-jeong opened this issue
I've followed one of the TensorFlow implementations of DenseNet (https://github.com/ikhlestov/vision_networks) to reproduce DenseNet-BC-100-12.
It seems to me that the TensorFlow implementation is nearly equivalent to the one in this repo,
but I couldn't reach the ~4.5% error (my best run was about 4.8%).
Do you know why this might be? I have compared the two codebases very carefully but couldn't find the cause.
@jh-jeong With my "Much more efficient caffe implementation", I also only reach about 4.8% for DenseNet-BC-100-12. I am curious about the cause, which seems to be common to Caffe and TensorFlow.
@Tongcheng I finally got 4.5% in TensorFlow. Here is what I changed:
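- Changing the momentum (moving-average decay) in each BN layer. In TensorFlow, batch normalization uses 0.999 as the default decay, but Torch uses the equivalent of 0.9 (Torch's momentum=0.1 is the weight given to the new batch statistics, i.e. a decay of 0.9).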
- Applying weight decay to 'all' trainable variables, as fb.resnet.torch does, including the beta/gamma variables in BN and all biases.
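The two changes above can be illustrated without TensorFlow. The pure-Python snippet below is a minimal sketch under stated assumptions, not code from either repository: `ema_update` is the moving-average update BN uses for its inference statistics (a decay of 0.999 has a time constant about 100 times longer than 0.9), and `l2_penalty` contrasts weight decay on kernels only with weight decay on all variables. The parameter names, toy values, and the 1e-4 coefficient are hypothetical. In TF 1.x the corresponding changes would presumably be passing `decay=0.9` to `tf.contrib.layers.batch_norm` and summing `tf.nn.l2_loss` over `tf.trainable_variables()` rather than over the kernels only.

```python
"""Pure-Python sketch of the two BN/weight-decay changes (no TensorFlow)."""


def ema_update(running, batch_value, decay):
    # Moving-average update for BN inference statistics:
    # running <- decay * running + (1 - decay) * batch_value.
    # TF's `decay` weights the old value; Torch's momentum=0.1 weights the
    # new value, so it corresponds to decay = 1 - 0.1 = 0.9.
    return decay * running + (1.0 - decay) * batch_value


def steps_to_converge(decay, target=1.0, tol=0.01, start=0.0):
    # Count the updates needed, with a constant batch statistic `target`,
    # before the running estimate is within `tol` of it.
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    running, steps = start, 0
    while abs(running - target) > tol:
        running = ema_update(running, target, decay)
        steps += 1
    return steps


def l2_penalty(params, weight_decay, include):
    # Sum of weight_decay * 0.5 * w**2 over the selected variables. The 0.5
    # matches tf.nn.l2_loss, so the gradient is weight_decay * w, which is
    # what Torch's SGD weight decay adds to each parameter's gradient.
    total = 0.0
    for name, values in params.items():
        if include(name):
            total += weight_decay * 0.5 * sum(v * v for v in values)
    return total


if __name__ == "__main__":
    # Hypothetical toy parameters; the names are illustrative only.
    params = {
        "conv1/kernel": [0.5, -0.5],
        "bn1/gamma": [1.0, 1.0],
        "bn1/beta": [0.1, -0.1],
        "fc/bias": [0.2],
    }
    wd = 1e-4  # assumed weight-decay coefficient

    kernels_only = l2_penalty(params, wd, lambda n: n.endswith("kernel"))
    all_vars = l2_penalty(params, wd, lambda n: True)
    print("updates to converge, decay=0.9:  ", steps_to_converge(0.9))
    print("updates to converge, decay=0.999:", steps_to_converge(0.999))
    print("L2 penalty, kernels only:  ", kernels_only)
    print("L2 penalty, all variables: ", all_vars)
```

With the slower 0.999 decay, the moving statistics used at test time can lag far behind the network's current batch statistics, which is one plausible reason the change matters.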
@jh-jeong Can you share your TensorFlow version of the code?