christiancosgrove / pytorch-spectral-normalization-gan

Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-

Large D/G losses?

christopher-beckham opened this issue · comments

Hi,

I'm using the recently released PyTorch 0.4 (not sure if that's what's causing the funky numbers), but running python main.py --model resnet --loss wasserstein gives me the following:

disc loss tensor(-0.2347, device='cuda:0') gen loss tensor(-0.6066, device='cuda:0')
disc loss tensor(-120.7743, device='cuda:0') gen loss tensor(-1614.5465, device='cuda:0')
disc loss tensor(-121.0873, device='cuda:0') gen loss tensor(-1225.4401, device='cuda:0')
disc loss tensor(-56.4558, device='cuda:0') gen loss tensor(-2320.5115, device='cuda:0')
disc loss tensor(-45.6140, device='cuda:0') gen loss tensor(-2665.3479, device='cuda:0')
disc loss tensor(-46.4297, device='cuda:0') gen loss tensor(-3849.7197, device='cuda:0')
disc loss tensor(-39.8169, device='cuda:0') gen loss tensor(-4879.6089, device='cuda:0')
disc loss tensor(-56.9688, device='cuda:0') gen loss tensor(-5421.9688, device='cuda:0')
disc loss tensor(-3.2100, device='cuda:0') gen loss tensor(-4737.8677, device='cuda:0')
disc loss tensor(-36.7729, device='cuda:0') gen loss tensor(-4344.2520, device='cuda:0')
disc loss tensor(-55.6719, device='cuda:0') gen loss tensor(-6263.5303, device='cuda:0')
disc loss tensor(-62.0518, device='cuda:0') gen loss tensor(-7915.4751, device='cuda:0')
disc loss tensor(-0.5933, device='cuda:0') gen loss tensor(-7315.9282, device='cuda:0')
disc loss tensor(-26.8652, device='cuda:0') gen loss tensor(-10451.8770, device='cuda:0')
disc loss tensor(-48.6777, device='cuda:0') gen loss tensor(-8293.3584, device='cuda:0')

Is this meant to happen?

Thanks!
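For context, neither Wasserstein loss is bounded below: there is no log or sigmoid, so the critic and generator losses can grow without limit as the score gap widens. A minimal sketch in plain NumPy (hypothetical scores, not this repo's code):

```python
import numpy as np

# Hypothetical critic scores on a batch of real and fake samples.
d_real = np.array([1.2, 0.8, 1.0])
d_fake = np.array([-0.9, -1.1, -1.0])

# Wasserstein-style losses: no log or sigmoid, so neither is bounded below.
disc_loss = -(d_real.mean() - d_fake.mean())  # critic maximizes the score gap
gen_loss = -d_fake.mean()                     # generator pushes fake scores up
print(disc_loss, gen_loss)
```

If nothing pins the critic's output scale, both quantities can drift to large magnitudes, as in the log above.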

Can confirm that with a completely different, self-made TensorFlow implementation, the estimated Wasserstein distances also get very large. I don't really know what's causing it either. Normally, when using WGAN-GP, values are in the range of 0 to 10 or 20.
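For comparison, the gradient penalty actively pulls the critic's gradient norm toward 1 at sampled points, which keeps the score scale (and hence the distance estimate) in that small range. A toy illustration with a linear critic, so the penalty has a closed form (hypothetical numbers, not my actual model):

```python
import numpy as np

# For a linear critic D(x) = w @ x, grad_x D(x) = w everywhere, so the
# WGAN-GP penalty E[(||grad D|| - 1)^2] reduces to (||w|| - 1)^2.
w = np.array([3.0, 4.0])  # ||w|| = 5, so this critic is 5-Lipschitz
gp = (np.linalg.norm(w) - 1.0) ** 2
print(gp)  # 16.0
```

Minimizing this term drags ||w|| toward 1, directly constraining the scale of the critic's outputs.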

Using a gradient penalty here would be counterintuitive, though, since spectral normalization is meant to be a (computationally cheaper) replacement for it. But thanks for reporting that you see this on your side too.

Yes, I did not use GP and spectral norm at the same time. Rather, I have used WGAN-GP a lot, and in my experience the estimated Wasserstein distance was usually between 0 and 10 or 20. Then I removed the GP and replaced it with spectral normalization, keeping everything else the same (including the Wasserstein loss), and now the estimated Wasserstein distances are all over the place, in the millions etc.
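Spectral normalization, by contrast, only rescales each weight matrix so its largest singular value is 1; nothing ties the critic's output scale to the data the way the penalty term does. A NumPy sketch of the normalization itself (the paper and this repo use a single persistent power-iteration step per update; this sketch just iterates until convergence):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # a stand-in weight matrix

# Power iteration to estimate the largest singular value of W.
u = rng.normal(size=8)
for _ in range(50):  # many steps here, purely to converge in this sketch
    v = W.T @ u
    v /= np.linalg.norm(v)
    u = W @ v
    u /= np.linalg.norm(u)
sigma = u @ W @ v  # estimated spectral norm of W

W_sn = W / sigma   # spectrally normalized weight: largest singular value ~1
print(np.linalg.norm(W_sn, 2))
```

Each layer is then (at most) 1-Lipschitz, but the resulting Wasserstein estimate can still sit on a very different scale than under WGAN-GP.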