google / compare_gan

Compare GAN code.

Details in the implementation of BigGAN

tsc2017 opened this issue · comments

Hi, I've found some details in this implementation of BigGAN that are worth paying attention to.

First, I notice that the default moments used for batchnorm during inference are the accumulated values:

standardize_batch.use_moving_averages = False

if use_moving_averages:
  mean, variance = _moving_moments_for_inference(
      mean=mean, variance=variance, is_training=is_training, decay=decay)
else:
  mean, variance = _accumulated_moments_for_inference(
      mean=mean, variance=variance, is_training=is_training)

Does this mean that the decay hyperparameter for batch norm is not used at all?

standardize_batch.decay = 0.9
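
To make the distinction concrete, here is a rough numpy sketch of how I understand the two inference modes (just the semantics, not the actual TF code in this repo): with accumulated moments every batch gets equal weight and decay never enters the computation, while the moving-average version weights recent batches by decay.

import numpy as np

def moving_moments(batch_means, batch_vars, decay=0.9):
    # Exponential moving average over batches: `decay` matters here.
    mean, var = 0.0, 1.0
    for m, v in zip(batch_means, batch_vars):
        mean = decay * mean + (1.0 - decay) * m
        var = decay * var + (1.0 - decay) * v
    return mean, var

def accumulated_moments(batch_means, batch_vars):
    # Plain average over all accumulated batches: `decay` never appears.
    return np.mean(batch_means), np.mean(batch_vars)

rng = np.random.default_rng(0)
batch_means = rng.normal(loc=0.5, scale=0.1, size=1000)
batch_vars = rng.normal(loc=1.0, scale=0.1, size=1000) ** 2

print(moving_moments(batch_means, batch_vars))       # depends on decay
print(accumulated_moments(batch_means, batch_vars))  # does not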

Second, I also notice that the shortcuts are added only when in_channels != out_channels:

add_shortcut=in_channels != out_channels,

which is different from BigGAN-PyTorch:
https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L388
https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L427
which always adds the skip connection and makes the shortcut learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.
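
To spell out the difference as I read it (a paraphrase, not code copied from either repo; the upsample/downsample arguments stand in for each block's resampling flags):

def compare_gan_add_shortcut(in_channels, out_channels):
    # compare_gan: the 1x1 shortcut conv is added only on a channel change.
    return in_channels != out_channels

def biggan_pytorch_learnable_sc(in_channels, out_channels, upsample, downsample):
    # BigGAN-PyTorch: the skip path always exists; the 1x1 conv on it is
    # learnable on a channel change or whenever the block resamples.
    return (in_channels != out_channels) or upsample or downsample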

Third, I find that BigGAN-PyTorch omits the first ReLU activation in the first DBlock by setting preactivation=False, which is consistent with the implementation of WGAN-GP (I guess that since the range you use for the input of D is [0, 1] instead of [-1, 1], the first ReLU does no harm). Also, in the shortcut connection of the first DBlock in WGAN-GP and BigGAN-PyTorch, pooling comes before convolution, while in this repo convolution comes before pooling, just as in the other DBlocks.
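
To make the comparison concrete, here is a rough schematic of the first DBlock as I understand the two codebases. The layer stand-ins are toys so the snippet runs; they are not the real (spectrally normalized) convolutions, and channel counts are ignored.

import numpy as np

relu = lambda x: np.maximum(x, 0.0)
conv3x3 = lambda x: x   # stand-in for the 3x3 convs (channels kept for simplicity)
conv1x1 = lambda x: x   # stand-in for the 1x1 shortcut conv
avg_pool = lambda x: x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2, -1).mean((1, 3))

def first_dblock(x, preactivation, pool_before_shortcut_conv):
    h = relu(x) if preactivation else x  # BigGAN-PyTorch / WGAN-GP: preactivation=False
    h = conv3x3(h)
    h = conv3x3(relu(h))
    h = avg_pool(h)
    if pool_before_shortcut_conv:        # WGAN-GP / BigGAN-PyTorch first DBlock
        shortcut = conv1x1(avg_pool(x))
    else:                                # compare_gan (same order as the other DBlocks)
        shortcut = avg_pool(conv1x1(x))
    return h + shortcut

x = np.abs(np.random.default_rng(0).normal(size=(8, 8, 3)))  # image-like, non-negative input
y_pt = first_dblock(x, preactivation=False, pool_before_shortcut_conv=True)
y_cg = first_dblock(x, preactivation=True, pool_before_shortcut_conv=False)
print(y_pt.shape, y_cg.shape)  # both (4, 4, 3)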

Do you think these discrepancies would have a significant influence on the performance of BigGAN?

Thanks

same question.

> which always adds the skip connection and makes the shortcut learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.

Are you sure about that? The conv_sc logic looks the same to me in both compare_gan and BigGAN-PyTorch: check whether in_channels != out_channels, and add the learnable shortcut conv only if they differ.

You may have a point about the pooling/convolution order. Have you tried swapping them? I hope it doesn't make a difference. (mooch noted that compare_gan never converged to the quality of the original BigGAN or BigGAN-PyTorch, and that no one knew why; we found the same thing: no matter how many runs we did, the final quality was never nearly as good as it should be. Convolution-then-pooling instead of pooling-then-convolution doesn't seem like it ought to matter that much... but who knows?) Do you have a diff for that change, or have you tried running it?
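
One more thought on the ordering: if both shortcuts really are just a 1x1 conv plus average pooling (which is how I read them, though I may be missing something), the two orders should compute the same function, since a 1x1 conv applies the same linear map at every pixel and therefore commutes with spatial averaging. A quick numpy check:

import numpy as np

def conv1x1(x, w):
    # x: (H, W, C_in), w: (C_in, C_out) -- the same linear map applied per pixel.
    return x @ w

def avg_pool2x2(x):
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))
w = rng.normal(size=(3, 16))

conv_then_pool = avg_pool2x2(conv1x1(x, w))   # compare_gan order
pool_then_conv = conv1x1(avg_pool2x2(x), w)   # BigGAN-PyTorch first-DBlock order
print(np.allclose(conv_then_pool, pool_then_conv))  # True

If that's right, swapping them would change compute cost but not the function, so I'd be surprised if it explains the gap; a diff and a run would still settle it, though.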