About a tensorflow implementation

Question

jh-jeong opened this issue 7 years ago · comments

I followed one of the TensorFlow implementations of DenseNet (https://github.com/ikhlestov/vision_networks) to reproduce DenseNet-BC-100-12.
The TensorFlow implementation seems nearly equivalent to the one in this repo,
but I couldn't reach ~4.5% error (my best result was about 4.8%).
Could you suggest any reasons for the gap? I have compared the two codebases very carefully but couldn't find a difference.

Tongcheng Li · Answer 1 · Wed May 03 2017 05:37:22 GMT+0800 (China Standard Time)

@jh-jeong In my "Much more efficient caffe implementation", I also reach about 4.8% for DenseNet-BC-100-12. I am curious about the cause, which seems to be common to both Caffe and TensorFlow.

Jongheon Jeong · Answer 2 · Wed May 17 2017 01:06:59 GMT+0800 (China Standard Time)

@Tongcheng I was finally able to get 4.5% in TensorFlow. What I changed is as follows:

1. Changed the momentum in each BN layer. In TensorFlow, batch normalization uses 0.999 as the default value, whereas Torch uses 0.9.
2. Applied weight decay to all trainable variables, as fb.resnet.torch does, including the beta/gamma variables in BN and all biases.
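
To illustrate why these two changes can matter, here is a minimal, framework-independent sketch in plain Python. It is not the code from either implementation. It assumes that the BN momentum is the weight placed on the previous running value (the convention under which 0.999 and 0.9 are quoted above) and that weight decay is expressed as an L2 penalty of the form weight_decay * sum(w^2) / 2. The function names, the variable names in the dictionary, and the value 1e-4 are hypothetical and chosen only for illustration.

```python
def ema_update(running, batch_value, momentum):
    """One moving-average update of a BN running statistic.

    Assumption: `momentum` weights the previous running value.
    """
    return momentum * running + (1.0 - momentum) * batch_value


def running_stat_after(steps, momentum, batch_value=1.0, init=0.0):
    """Running statistic after `steps` updates toward a constant batch value."""
    running = init
    for _ in range(steps):
        running = ema_update(running, batch_value, momentum)
    return running


def l2_penalty(variables, weight_decay, decay_all=True):
    """L2 penalty: weight_decay * sum(w^2) / 2 over the selected variables.

    variables: dict mapping a (hypothetical) variable name to a list of floats.
    decay_all=True  -> every variable, including BN beta/gamma and biases.
    decay_all=False -> only variables whose name ends with "kernel".
    """
    total = 0.0
    for name, values in variables.items():
        if decay_all or name.endswith("kernel"):
            total += sum(v * v for v in values) / 2.0
    return weight_decay * total


# Hypothetical toy variables, for illustration only.
toy_variables = {
    "conv/kernel": [1.0, 2.0],
    "bn/gamma": [1.0],
    "bn/beta": [0.0],
    "fc/bias": [2.0],
}

slow = running_stat_after(100, momentum=0.999)  # still far from the batch value
fast = running_stat_after(100, momentum=0.9)    # essentially converged
penalty_all = l2_penalty(toy_variables, weight_decay=1e-4, decay_all=True)
penalty_kernels = l2_penalty(toy_variables, weight_decay=1e-4, decay_all=False)
```

With a constant batch statistic of 1.0 and a running value initialized to 0, 100 updates at momentum 0.999 bring the running value to only about 0.095, while momentum 0.9 brings it to nearly 1.0. The second part shows that restricting the penalty to kernels changes the regularization term compared with decaying every trainable variable, which is the difference described in the second item above.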

Warren_liu · Answer 3 · Fri Nov 24 2017 11:35:24 GMT+0800 (China Standard Time)

@jh-jeong Could you share your TensorFlow code?