Details in the implementation of BigGAN
tsc2017 opened this issue
Hi, I find that there are some details in the implementation of BigGAN worth paying attention to.
First, I notice that the default moments used for batchnorm during inference are the accumulated values (compare_gan/compare_gan/architectures/arch_ops.py, lines 299 to 304 at e0b739f). Does this mean that the decay hyperparameter for batchnorm is not used at all?
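For concreteness, here is a minimal NumPy sketch (not the actual compare_gan code; all names are illustrative) of the difference between decay-controlled moving averages and plainly accumulated statistics. If inference uses the accumulated averages, the decay value never enters the computation.

```python
import numpy as np

def ema_moments(batch_means, decay=0.9):
    # Standard batchnorm bookkeeping: exponential moving average controlled by `decay`.
    running = 0.0
    for m in batch_means:
        running = decay * running + (1.0 - decay) * m
    return running

def accumulated_moments(batch_means):
    # "Standing statistics" style: plain average over accumulation passes;
    # no decay hyperparameter is involved.
    return float(np.mean(batch_means))

batch_means = np.random.randn(100) + 0.5
print(ema_moments(batch_means))        # result depends on decay
print(accumulated_moments(batch_means))  # decay plays no role
```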
Second, I also notice that the shortcuts are added only when in_channels != out_channels, which is different from BigGAN-PyTorch (https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L388 and https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L427), which uses shortcuts all the time and makes them learnable when in_channels != out_channels or when the block is an upsampling or downsampling block. A small sketch of the two conditions is below.
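To make the comparison explicit, here is a minimal Python sketch of the two conditions as I understand them (function and argument names are mine, not copied from either repo):

```python
# Illustrative only: encodes the two rules described above.

def learnable_shortcut_compare_gan(in_ch, out_ch):
    # shortcut conv only when the channel count changes
    return in_ch != out_ch

def learnable_shortcut_biggan_pytorch(in_ch, out_ch, upsample=False, downsample=False):
    # an (identity or pooled) shortcut always exists; the 1x1 conv on it is
    # learnable when channels change or when the block rescales spatially
    return (in_ch != out_ch) or upsample or downsample
```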
Third, I find that BigGAN-PyTorch omits the first ReLU activation in the first DBlock by setting preactivation=False, which is consistent with the WGAN-GP implementation (I guess that since the range you use for the input of D is [0, 1] instead of [-1, 1], the first ReLU does no harm). Also, in the shortcut connection of the first DBlock in WGAN-GP and BigGAN-PyTorch, pooling comes before convolution, while in this repo convolution comes before pooling, as in the other DBlocks (the two orderings are sketched below).
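Here is a rough PyTorch sketch of the two shortcut orderings for the first DBlock (module names are illustrative, not taken from either repo):

```python
import torch.nn as nn
import torch.nn.functional as F

class ShortcutPoolThenConv(nn.Module):
    """Pool first, then a 1x1 conv on the smaller map
    (the WGAN-GP / BigGAN-PyTorch first-block ordering described above)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.conv(F.avg_pool2d(x, 2))

class ShortcutConvThenPool(nn.Module):
    """1x1 conv at full resolution, then average pooling
    (the ordering described above for this repo's DBlocks)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return F.avg_pool2d(self.conv(x), 2)
```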
Do you think these discrepancies would have a significant influence on the performance of BigGAN?
Thanks
same question.
> that uses shortcuts all the time and the shortcuts are learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.
Are you sure about that? The logic for the conv_sc stuff appears to be the same in both compare_gan and BigGAN-PyTorch: check whether the channel counts differ, and if they don't, no shortcut conv.
You may have a point about the pooling/convolution order. Have you tried swapping them? I hope it doesn't make a difference. (mooch noted that compare_gan never converged to the quality of the original BigGAN or BigGAN-PyTorch, but that no one knew why; we found the same thing: no matter how many runs we did, the final quality was never nearly as good as it should be. Convolution-then-pooling instead of pooling-then-convolution doesn't seem like it ought to matter that much... but who knows?) Do you have a diff for that, or have you tried running it?