XiaLiPKU / EMANet

The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

Home Page: https://xialipku.github.io/publication/expectation-maximization-attention-networks-for-semantic-segmentation/

No grad backpropagated to EMAU.conv1?

zhaokegg opened this issue

Excuse me, I cannot find the gradient flowing back to conv1. Are there some bugs?

No bug here. It is due to the line

with torch.no_grad():

If you comment this line out, conv1 will have a gradient, but the performance may decrease a little. So far I also don't know why having no grad on conv1 is better.
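
For context, here is a minimal sketch of how the EMAU forward pass is structured (simplified and renamed, not the exact repo code). Because the E-step/M-step iterations run inside torch.no_grad(), the resulting mu and z are constants to autograd, so no gradient ever reaches conv1; the backbone still receives gradients through the identity shortcut and conv2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMAUSketch(nn.Module):
    """Simplified EMAU-like module; the EM iterations sit inside torch.no_grad()."""
    def __init__(self, c=512, k=64, stage_num=3):
        super().__init__()
        self.stage_num = stage_num
        self.conv1 = nn.Conv2d(c, c, 1)                  # maps ReLU features from R^+ to R
        self.conv2 = nn.Conv2d(c, c, 1, bias=False)
        mu = torch.randn(1, c, k)
        self.register_buffer('mu', F.normalize(mu, dim=1))

    def forward(self, x):
        idn = x                                          # identity shortcut
        x = self.conv1(x)
        b, c, h, w = x.size()
        x = x.view(b, c, h * w)                          # b x c x n
        mu = self.mu.repeat(b, 1, 1)                     # b x c x k
        with torch.no_grad():                            # <-- the line in question
            for _ in range(self.stage_num):
                z = torch.bmm(x.permute(0, 2, 1), mu)    # b x n x k inner products
                z = F.softmax(z, dim=2)                  # E-step: responsibilities z_nk
                z_ = z / (1e-6 + z.sum(dim=1, keepdim=True))
                mu = torch.bmm(x, z_)                    # M-step: re-estimate the bases
                mu = F.normalize(mu, dim=1)
        x = torch.bmm(mu, z.permute(0, 2, 1))            # reconstruction uses no-grad mu and z,
        x = F.relu(x.view(b, c, h, w))                   # so autograd never reaches conv1
        x = self.conv2(x) + idn                          # gradients still reach the backbone via idn
        return F.relu(x)
```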

Can you provide a model to check? I think conv1 may not be included in your final model. Without a gradient from the loss, it is pruned by PyTorch.

You can train a model without the line

with torch.no_grad():

and check for yourself. I don't agree with you, though: without a gradient from the loss, PyTorch does not update the parameters of conv1, but conv1 still works in the forward pass.
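
To see the distinction concretely, here is a toy, self-contained check (hypothetical layers, not EMANet code): a layer whose output is only used inside torch.no_grad() still participates in the forward pass, but its weights receive no gradient and the optimizer never changes them.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(4, 4, 1)
head = nn.Conv2d(4, 4, 1)
opt = torch.optim.SGD(list(conv1.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(1, 4, 8, 8)
feat = conv1(x)                           # conv1 still shapes the forward activations
with torch.no_grad():
    attn = feat.softmax(dim=1)            # everything derived here is detached from conv1
out = head(attn)                          # graph only reaches back to `head`; attn is a constant
out.mean().backward()

print(conv1.weight.grad)                  # None: no gradient reached conv1
print(head.weight.grad is not None)       # True
before = conv1.weight.clone()
opt.step()
print(torch.equal(before, conv1.weight))  # True: conv1 never updated, yet it produced `attn`
```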

But I think that if the parameters are not updated, conv1 makes no sense. Will it work properly using only the initial parameters of conv1? By the way, have you tried to test the performance of the model after removing conv1?

Yes, I have. With the 'no_grad' setting, the only function of conv1 is to map the distribution of the input feature maps from R^+ to R.
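
As a toy illustration of that mapping (not repo code): the feature maps entering EMAU live in R^+ (e.g. after a ReLU), while a 1x1 convolution with no following activation spreads them over both signs, so the inner products with the bases are no longer bounded below by 0.

```python
import torch
import torch.nn as nn

feat = torch.relu(torch.randn(1, 64, 16, 16))   # stand-in for nonnegative backbone features
print(feat.min().item() >= 0)                    # True: values live in R^+

conv1 = nn.Conv2d(64, 64, 1)                     # no activation afterwards
mapped = conv1(feat)
print(mapped.min().item() < 0)                   # almost surely True: values now live in R
```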

@XiaLiPKU Thanks for your quick reply. So the performance is a bit worse? Can you provide the concrete value?

I forgot the concrete value. But as I remember, deleting the 'with torch.no_grad():' line decreases mIoU by around 0.5.
Moreover, without the conv1 layer the minimum possible inner product is 0. As there is an 'exp' operation inside the softmax, 0 becomes exp(0) = 1, so the corresponding z_nk is not close to 0. But with the conv1 layer the minimum can be -inf, and the corresponding z_nk is very close to 0. Obviously, the latter is what we want.
I haven't done an ablation study on conv1. But as analysed above, removing conv1 should cause some decrease.
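
A quick numerical illustration of that softmax argument (illustrative logits only, not values from the model): when the smallest logit is 0, its softmax weight stays bounded away from 0 because exp(0) = 1; once logits can be strongly negative, a poorly matching basis gets a responsibility z_nk that is effectively 0.

```python
import torch
import torch.nn.functional as F

nonneg_logits = torch.tensor([0.0, 1.0, 2.0])    # minimum inner product is 0 (no conv1)
signed_logits = torch.tensor([-10.0, 1.0, 2.0])  # conv1 allows arbitrarily negative values

print(F.softmax(nonneg_logits, dim=0))  # first entry ~0.09: not close to 0
print(F.softmax(signed_logits, dim=0))  # first entry ~4.5e-6: effectively 0
```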

@XiaLiPKU Thanks for your detailed explanation, it is a good job!