christiancosgrove / pytorch-spectral-normalization-gan

Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-

Large D/G losses?

christopher-beckham opened this issue · comments

Hi,

I'm using the recently released PyTorch 0.4 (not sure if that's what's causing the funky numbers), but running python main.py --model resnet --loss wasserstein gives me the following:

disc loss tensor(-0.2347, device='cuda:0') gen loss tensor(-0.6066, device='cuda:0')
disc loss tensor(-120.7743, device='cuda:0') gen loss tensor(-1614.5465, device='cuda:0')
disc loss tensor(-121.0873, device='cuda:0') gen loss tensor(-1225.4401, device='cuda:0')
disc loss tensor(-56.4558, device='cuda:0') gen loss tensor(-2320.5115, device='cuda:0')
disc loss tensor(-45.6140, device='cuda:0') gen loss tensor(-2665.3479, device='cuda:0')
disc loss tensor(-46.4297, device='cuda:0') gen loss tensor(-3849.7197, device='cuda:0')
disc loss tensor(-39.8169, device='cuda:0') gen loss tensor(-4879.6089, device='cuda:0')
disc loss tensor(-56.9688, device='cuda:0') gen loss tensor(-5421.9688, device='cuda:0')
disc loss tensor(-3.2100, device='cuda:0') gen loss tensor(-4737.8677, device='cuda:0')
disc loss tensor(-36.7729, device='cuda:0') gen loss tensor(-4344.2520, device='cuda:0')
disc loss tensor(-55.6719, device='cuda:0') gen loss tensor(-6263.5303, device='cuda:0')
disc loss tensor(-62.0518, device='cuda:0') gen loss tensor(-7915.4751, device='cuda:0')
disc loss tensor(-0.5933, device='cuda:0') gen loss tensor(-7315.9282, device='cuda:0')
disc loss tensor(-26.8652, device='cuda:0') gen loss tensor(-10451.8770, device='cuda:0')
disc loss tensor(-48.6777, device='cuda:0') gen loss tensor(-8293.3584, device='cuda:0')

Is this meant to happen?

Thanks!
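For context, neither Wasserstein loss is bounded below: there is no log or sigmoid, so the critic and generator losses can grow without limit as the score gap widens. A minimal sketch in plain NumPy (hypothetical scores, not this repo's code):

```python
import numpy as np

# Hypothetical critic scores on a batch of real and fake samples.
d_real = np.array([1.2, 0.8, 1.0])
d_fake = np.array([-0.9, -1.1, -1.0])

# Wasserstein-style losses: no log or sigmoid, so neither is bounded below.
disc_loss = -(d_real.mean() - d_fake.mean())  # critic maximizes the score gap
gen_loss = -d_fake.mean()                     # generator pushes fake scores up
print(disc_loss, gen_loss)
```

If nothing pins the critic's output scale, both quantities can drift to large magnitudes, as in the log above.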

Can confirm that with a completely different, self-made TensorFlow implementation, the estimated Wasserstein distances also get very large. I don't really know what's causing it either. Normally, when using WGAN-GP, values are in the range of 0 to 10 or 20.
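For comparison, the gradient penalty actively pulls the critic's gradient norm toward 1 at sampled points, which keeps the score scale (and hence the distance estimate) in that small range. A toy illustration with a linear critic, so the penalty has a closed form (hypothetical numbers, not my actual model):

```python
import numpy as np

# For a linear critic D(x) = w @ x, grad_x D(x) = w everywhere, so the
# WGAN-GP penalty E[(||grad D|| - 1)^2] reduces to (||w|| - 1)^2.
w = np.array([3.0, 4.0])  # ||w|| = 5, so this critic is 5-Lipschitz
gp = (np.linalg.norm(w) - 1.0) ** 2
print(gp)  # 16.0
```

Minimizing this term drags ||w|| toward 1, directly constraining the scale of the critic's outputs.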

Using a gradient penalty here would be counterintuitive, though, since spectral normalization is meant to be a (computationally cheaper) replacement for it. But thanks for reporting that you see this on your side too.

Yes, I did not use GP and spectral norm at the same time. Rather, I have used WGAN-GP a lot, and in my experience the estimated Wasserstein distance was usually between 0 and 10 or 20. Then I removed the GP and replaced it with spectral normalization, keeping everything else the same (including the Wasserstein loss), and now the estimated Wasserstein distances are all over the place, in the millions etc.
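Spectral normalization, by contrast, only rescales each weight matrix so its largest singular value is 1; nothing ties the critic's output scale to the data the way the penalty term does. A NumPy sketch of the normalization itself (the paper and this repo use a single persistent power-iteration step per update; this sketch just iterates until convergence):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # a stand-in weight matrix

# Power iteration to estimate the largest singular value of W.
u = rng.normal(size=8)
for _ in range(50):  # many steps here, purely to converge in this sketch
    v = W.T @ u
    v /= np.linalg.norm(v)
    u = W @ v
    u /= np.linalg.norm(u)
sigma = u @ W @ v  # estimated spectral norm of W

W_sn = W / sigma   # spectrally normalized weight: largest singular value ~1
print(np.linalg.norm(W_sn, 2))
```

Each layer is then (at most) 1-Lipschitz, but the resulting Wasserstein estimate can still sit on a very different scale than under WGAN-GP.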