No grad back-propagates to EMAU.conv1?
zhaokegg opened this issue · comments
Excuse me, I cannot find the grad flowing back to conv1. Is there a bug?
No bug here. It is due to
Line 227 in f7d7b47
Can you provide a model to check? I think conv1 may not be included in your final model. Without a gradient from the loss, it is pruned by PyTorch.
You can train a model without this line:
Line 227 in f7d7b47
I don't agree with you. Without the grad from the loss, PyTorch doesn't update its parameters, but the layer still works in the forward pass.
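A minimal sketch of this point (not the actual EMANet code; the layer names and shapes are made up): a conv whose output is only consumed inside `torch.no_grad()` still executes in the forward pass, but receives no gradient, while a layer outside the block does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv1 = nn.Conv2d(4, 4, 1)  # stands in for EMAU.conv1
conv2 = nn.Conv2d(4, 4, 1)  # a layer that stays in the autograd graph
x = torch.randn(1, 4, 8, 8)

feat = conv1(x)  # forward pass still runs through conv1
with torch.no_grad():
    # everything computed here is cut off from the autograd graph
    attn = torch.softmax(feat.flatten(2), dim=-1).view_as(feat)

out = conv2(attn)
out.sum().backward()

print(conv1.weight.grad)              # None: no grad reached conv1
print(conv2.weight.grad is not None)  # True: conv2 is trained normally
```

So conv1 keeps its initial parameters but still transforms the features at every forward pass.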
But I think that if the parameters are not updated, conv1 makes no sense. Will it work properly using only the initial parameters of conv1? By the way, have you tried testing the performance of the model after removing conv1?
Yes, I have. With the 'no_grad' setting, the only function of conv1 is to map the distribution of the input feature maps from R^+ to R.
@XiaLiPKU Thanks for your quick reply. So the performance is a bit worse? Can you provide the concrete value?
I forget the concrete value. But as I remember, deleting the 'with torch.no_grad():' decreases mIoU by around 0.5.
Moreover, without the conv1 layer, the minimum possible inner product is 0. As there is an 'exp' operation inside the softmax, 0 becomes exp(0) = 1, so the corresponding z_nk is not close to 0. But with the conv1 layer, the minimum can be -inf, and the corresponding z_nk is very close to 0. Obviously, the latter is what we want.
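A small numeric illustration of the argument above (the logit values are made up, not taken from EMANet): when features live in R^+, the smallest softmax input is 0 and its weight exp(0) = 1 is not negligible; once features can be very negative, unwanted responses are suppressed to nearly 0.

```python
import torch

# Features in R^+: the smallest inner product is 0, and softmax gives
# it the weight exp(0) = 1 relative to the other terms.
logits_pos = torch.tensor([5.0, 3.0, 0.0])
print(torch.softmax(logits_pos, dim=0))   # last entry ~0.006, not ~0

# After a conv1-style mapping to R, inner products can be strongly
# negative, so the corresponding softmax weight collapses towards 0.
logits_real = torch.tensor([5.0, 3.0, -20.0])
print(torch.softmax(logits_real, dim=0))  # last entry ~1e-11
```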
I haven't done an ablation study of conv1. But as analysed above, without conv1 there should be some decrease.