Details in the implementation of BigGAN
tsc2017 opened this issue
Hi, I find that there are some details in the implementation of BigGAN worth paying attention to.
First, I notice that the default moments used for batchnorm during inference are the accumulated values (compare_gan/compare_gan/architectures/arch_ops.py, lines 299 to 304 at e0b739f). Does this mean that the decay hyperparameter for batchnorm is not used at all?
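For concreteness, here is a minimal NumPy sketch (not the actual compare_gan code; all names are illustrative) of the difference between decay-controlled moving averages and plainly accumulated statistics. If inference uses the accumulated averages, the decay value never enters the computation.

```python
import numpy as np

def ema_moments(batch_means, decay=0.9):
    # Standard batchnorm bookkeeping: exponential moving average controlled by `decay`.
    running = 0.0
    for m in batch_means:
        running = decay * running + (1.0 - decay) * m
    return running

def accumulated_moments(batch_means):
    # "Standing statistics" style: plain average over accumulation passes;
    # no decay hyperparameter is involved.
    return float(np.mean(batch_means))

batch_means = np.random.randn(100) + 0.5
print(ema_moments(batch_means))        # result depends on decay
print(accumulated_moments(batch_means))  # decay plays no role
```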
Second, I also notice that the shortcuts are added only when in_channels != out_channels, which is different from BigGAN-PyTorch (https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L388 and https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L427), which uses shortcuts all the time and makes them learnable when in_channels != out_channels or when the block is an upsampling or downsampling block. A small sketch of the two conditions is below.
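To make the comparison explicit, here is a minimal Python sketch of the two conditions as I understand them (function and argument names are mine, not copied from either repo):

```python
# Illustrative only: encodes the two rules described above.

def learnable_shortcut_compare_gan(in_ch, out_ch):
    # shortcut conv only when the channel count changes
    return in_ch != out_ch

def learnable_shortcut_biggan_pytorch(in_ch, out_ch, upsample=False, downsample=False):
    # an (identity or pooled) shortcut always exists; the 1x1 conv on it is
    # learnable when channels change or when the block rescales spatially
    return (in_ch != out_ch) or upsample or downsample
```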
Third, I find that BigGAN-PyTorch omits the first ReLU activation in the first DBlock by setting preactivation=False, which is consistent with the WGAN-GP implementation (I guess that since the range you use for the input of D is [0, 1] instead of [-1, 1], the first ReLU does no harm). Also, in the shortcut connection of the first DBlock in WGAN-GP and BigGAN-PyTorch, pooling comes before convolution, while in this repo convolution comes before pooling, as in the other DBlocks (the two orderings are sketched below).
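Here is a rough PyTorch sketch of the two shortcut orderings for the first DBlock (module names are illustrative, not taken from either repo):

```python
import torch.nn as nn
import torch.nn.functional as F

class ShortcutPoolThenConv(nn.Module):
    """Pool first, then a 1x1 conv on the smaller map
    (the WGAN-GP / BigGAN-PyTorch first-block ordering described above)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.conv(F.avg_pool2d(x, 2))

class ShortcutConvThenPool(nn.Module):
    """1x1 conv at full resolution, then average pooling
    (the ordering described above for this repo's DBlocks)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return F.avg_pool2d(self.conv(x), 2)
```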
Do you think these discrepancies would have a significant influence on the performance of BigGAN?
Thanks
same question.
> that uses shortcuts all the time and the shortcuts are learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.
Are you sure about that? The logic for the conv_sc stuff appears to be the same in both compare_gan and BigGAN-PyTorch: check whether the channel counts differ, and if they don't, no shortcut conv.
You may have a point about the pooling/convolution order. Have you tried swapping them? I hope it doesn't make a difference. (mooch noted that compare_gan never converged to the quality of the original BigGAN or BigGAN-PyTorch, but that no one knew why; we found the same thing: no matter how many runs we did, the final quality was never nearly as good as it should be. Convolution-then-pooling instead of pooling-then-convolution doesn't seem like it ought to matter that much... but who knows?) Do you have a diff for that, or have you tried running it?