christiancosgrove / pytorch-spectral-normalization-gan

Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-

self.u can't be updated?

WellYoungIOE opened this issue

The self.u in SpectralNorm never seems to be updated during training, and u in spectral_norm is reinitialized with u = Variable(W.data.new((W.data.shape[1]))).data.normal_(0,1) in every epoch. self.u stays None the whole time.
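For reference, the power-iteration update from the paper that self.u is supposed to carry over from one training iteration to the next ($W$ is the layer weight, $\tilde{u}$ and $\tilde{v}$ the persistent vectors):

$$
\tilde{v} \leftarrow \frac{W^{\top}\tilde{u}}{\lVert W^{\top}\tilde{u} \rVert_2},
\qquad
\tilde{u} \leftarrow \frac{W\tilde{v}}{\lVert W\tilde{v} \rVert_2},
\qquad
\sigma(W) \approx \tilde{u}^{\top} W \tilde{v},
\qquad
\bar{W}_{\mathrm{SN}} = W / \sigma(W).
$$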
Any way to figure this out?
Thanks!

Thanks for pointing this out. spectral_norm wasn't properly initializing self.u, so I've moved the initialization into SpectralNorm's constructor.
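Roughly, the fix looks something like this (a minimal sketch with my own naming, e.g. weight_bar and power_iterations, not the exact diff): u is created once in the constructor and the power iteration continues from it on every forward pass.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SpectralNormSketch(nn.Module):
    """Minimal sketch (not the exact commit): the power-iteration vector u
    is created once in the constructor and reused on every forward pass."""

    def __init__(self, module, name='weight', power_iterations=1):
        super(SpectralNormSketch, self).__init__()
        self.module = module
        self.name = name
        self.power_iterations = power_iterations
        w = getattr(module, name)
        # Keep the raw weight under a separate name so the normalized weight
        # can be recomputed from it on every forward pass.
        self.weight_bar = nn.Parameter(w.data)
        del module._parameters[name]
        # u is initialized once, here, instead of inside spectral_norm() on every call.
        self.u = F.normalize(w.data.new(w.data.shape[0]).normal_(0, 1), p=2, dim=0)

    def forward(self, *args):
        w_mat = self.weight_bar.data.view(self.weight_bar.data.shape[0], -1)
        v = None
        for _ in range(self.power_iterations):
            v = F.normalize(torch.mv(torch.t(w_mat), self.u), p=2, dim=0)
            self.u = F.normalize(torch.mv(w_mat, v), p=2, dim=0)
        sigma = torch.dot(self.u, torch.mv(w_mat, v))
        # W_SN = W / sigma(W); hand the normalized weight to the wrapped module.
        setattr(self.module, self.name, self.weight_bar / sigma)
        return self.module(*args)
```

So SpectralNormSketch(nn.Linear(10, 10)) behaves like the wrapped linear layer, but with its weight divided by the running spectral-norm estimate.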

For the u_n in v_n = F.normalize(torch.mv(torch.t(W.view(height,-1).data), u_n), p=2, dim=0), u_n = F.normalize(torch.mv(W.view(height,-1).data, v_n), p=2, dim=0), and singular_value = torch.dot(u_n, prod), I think you mean u_n.data.
Now self.u is updated on every forward pass, but this also makes the code unable to run on multiple GPUs.
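In other words, something like the following (a standalone sketch of the u_n.data point, with hypothetical shapes; torch.mv in this PyTorch version wants plain tensors, hence the .data):

```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable

W = Variable(torch.randn(64, 128))   # layer weight as a Variable (hypothetical shape)
height = W.size(0)
u = Variable(torch.randn(height))    # persistent estimate of the left singular vector

u_n = u.data                         # plain tensor, so torch.mv accepts it
w_mat = W.view(height, -1).data
v_n = F.normalize(torch.mv(torch.t(w_mat), u_n), p=2, dim=0)
u_n = F.normalize(torch.mv(w_mat, v_n), p=2, dim=0)
singular_value = torch.dot(u_n, torch.mv(w_mat, v_n))
u.data.copy_(u_n)                    # write back so the estimate persists between calls
```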

Wouldn't it be sufficient to make self.u a Tensor rather than a Variable, since (as described in the OpenReview responses) its gradients don't have to be computed? In that case, I don't think u_n.data would be right.
What makes this implementation unable to run on multiple GPUs?

torch.mv only supports the operation between two Tensors. It's OK to make self.u a Variable or even a Parameter, but the u_n here should be a Tensor, or it raises a TypeError: torch.mv received an invalid combination of arguments - got (torch.cuda.FloatTensor, !Variable!), but expected (torch.cuda.FloatTensor source, torch.cuda.FloatTensor vec).
With DataParallel, the weights of the discriminator are sent to every GPU, but self.u always stays on gpu0 during training. The computation of v_n then fails with RuntimeError: arguments are located on different GPUs.

I see. Thanks for explaining this issue.

Perhaps it would be better to make everything a Variable (in fact, make self.u a Parameter and register it in the list of parameters so that DataParallel puts it on the right device). Then, instead of using the nondifferentiable torch.mv operation, use autograd-friendly matrix-vector multiplication.

The benefit of using autograd Variables is that the gradient regularization term (equation 12 of https://openreview.net/pdf?id=B1QRgziT-) then gets computed automatically, which I was overlooking before.

I'll try to get this approach working.
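Roughly what I have in mind (only a sketch, not the final code; apart from the _make_params / _update_u_v names already mentioned in this thread, the names here are placeholders):

```python
import torch
from torch import nn
from torch.nn import Parameter

def l2normalize(v, eps=1e-12):
    return v / (v.norm() + eps)

class SpectralNorm(nn.Module):
    def __init__(self, module, name='weight', power_iterations=1):
        super(SpectralNorm, self).__init__()
        self.module = module
        self.name = name
        self.power_iterations = power_iterations
        self._make_params()

    def _make_params(self):
        w = getattr(self.module, self.name)
        height = w.data.shape[0]
        width = w.view(height, -1).data.shape[1]
        # Register u and v as Parameters (requires_grad=False) so that
        # DataParallel moves them to the same device as the weight.
        u = Parameter(l2normalize(w.data.new(height).normal_(0, 1)), requires_grad=False)
        v = Parameter(l2normalize(w.data.new(width).normal_(0, 1)), requires_grad=False)
        w_bar = Parameter(w.data)
        del self.module._parameters[self.name]
        self.module.register_parameter(self.name + "_u", u)
        self.module.register_parameter(self.name + "_v", v)
        self.module.register_parameter(self.name + "_bar", w_bar)

    def _update_u_v(self):
        u = getattr(self.module, self.name + "_u")
        v = getattr(self.module, self.name + "_v")
        w = getattr(self.module, self.name + "_bar")
        height = w.data.shape[0]
        # Power iteration on the raw data (no gradient through the iteration itself).
        for _ in range(self.power_iterations):
            v.data = l2normalize(torch.mv(torch.t(w.view(height, -1).data), u.data))
            u.data = l2normalize(torch.mv(w.view(height, -1).data, v.data))
        # sigma is computed with autograd ops, so the gradient of the
        # normalization (the regularization term) is taken into account.
        sigma = u.dot(w.view(height, -1).mv(v))
        setattr(self.module, self.name, w / sigma)

    def forward(self, *args):
        self._update_u_v()
        return self.module(*args)
```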

I've made these changes. Let me know if there are further problems. Thanks

It works fine with one GPU, but u and v cannot be updated when using multiple GPUs.

There are still problems with the discriminator when I use multiple GPUs. The u and v in _update_u_v seem to be copies of the originals, and they are turned into Variables. On every forward pass, u and v are reinitialized to the original values computed in _make_params.
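I think this is the general nn.DataParallel behavior its documentation warns about: the module is replicated onto each device at every forward pass and the replicas are discarded afterwards, so state mutated inside forward() does not stick. A small sketch to illustrate the mechanism (a toy counter stands in for the u / v updates; it needs at least two GPUs to show the effect):

```python
import torch
from torch import nn

class CounterModule(nn.Module):
    """Toy illustration of the nn.DataParallel caveat: state mutated inside
    forward() is mutated on per-device replicas that are thrown away after
    the forward pass, analogous to the u / v updates in _update_u_v."""
    def __init__(self):
        super(CounterModule, self).__init__()
        self.linear = nn.Linear(4, 4)
        self.calls = 0

    def forward(self, x):
        self.calls += 1  # runs on a replica, not on the original module
        return self.linear(x)

if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(CounterModule().cuda())
    for _ in range(3):
        model(torch.randn(8, 4).cuda())
    # The increments happened on the discarded replicas, so the original
    # module's counter is still at its initial value.
    print(model.module.calls)  # 0
```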