NVlabs / FUNIT

Translate images to unseen domains at test time with a few example images.

Home Page: https://nvlabs.github.io/FUNIT/

Discriminator architecture question

Johnson-yue opened this issue · comments

Hi, I read your paper but I'm not sure about the output of the discriminator.

According to the paper, it consists of one convolution layer followed by 4 AvgPool2x2 layers. If the input image size is 128x128, should the output be 4x4 with ||S|| channels?

So the output size of the discriminator is (bs, ||S||, 4, 4)?

I also don't understand the FC-256 layer of the decoder module. Its outputs are mu and var, and those are the inputs to the AdaIN ResBlk-512? How is AdaIN ResBlk-512 implemented?

I think AdaIN ResBlk-512 should have 512 channels, so its mu and var should each have 512 channels too, but in Figure 6 they come only from FC-256.
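For what it's worth, the AdaIN operation itself is easy to state once mu and sigma are given: normalize each channel of the feature map with its own instance statistics, then re-scale and shift with the style statistics. Below is a minimal numpy sketch, assuming per-channel 512-dim mu and sigma as discussed above (this is not the official FUNIT code):

```python
import numpy as np

def adain(x, mu, sigma, eps=1e-5):
    """Adaptive Instance Normalization (sketch).

    x     : (N, C, H, W) content feature map
    mu    : (C,) style shift, e.g. predicted from the class code
    sigma : (C,) style scale
    """
    # per-sample, per-channel statistics of the content features
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = x.std(axis=(2, 3), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    # re-style with the externally predicted statistics
    return sigma[None, :, None, None] * x_norm + mu[None, :, None, None]

# toy usage: an AdaIN ResBlk-512 feature map would have C = 512
x = np.random.randn(2, 512, 8, 8)
mu, sigma = np.zeros(512), np.ones(512)
y = adain(x, mu, sigma)
print(y.shape)  # (2, 512, 8, 8)
```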

Question 3:
According to Table 5 in your paper, the gradient penalty is very important for this model. Is it implemented the same way as in WGAN-GP?
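For comparison, the WGAN-GP penalty the question refers to samples a point on the line between a real and a fake example and penalizes the critic's gradient norm there for deviating from 1. A tiny numpy sketch with a linear critic D(x) = w·x, so the gradient is w in closed form and no autograd is needed (whether FUNIT's released code uses exactly this form is a separate question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear critic D(x) = w . x, so grad_x D(x) = w everywhere.
w = rng.normal(size=16)

def critic_grad(x):
    return w  # closed-form gradient of the linear critic

real = rng.normal(size=16)
fake = rng.normal(size=16)

# WGAN-GP: sample a point on the line between real and fake ...
eps = rng.uniform()
x_hat = eps * real + (1 - eps) * fake

# ... and penalize the deviation of the gradient norm from 1.
grad = critic_grad(x_hat)
gp = (np.linalg.norm(grad) - 1.0) ** 2
print(gp)
```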

I am also very interested in the AdaIN ResBlk. Is it similar to the implementation of SPADEResnetBlock in SPADE?

@corenel how did you get (mu1, sigma1) and (mu2, sigma2) from FC-256?

Since it is a PatchGAN discriminator, (bs, ||S||, 4, 4) makes sense; then for each input of class k you average the predictive error over D[:, k, :, :].
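Under that reading, scoring an input of class k is just a spatial average over the k-th channel of the (bs, ||S||, 4, 4) output map. A small numpy sketch of the indexing (shapes taken from the question above, not from the official code):

```python
import numpy as np

bs, num_classes = 2, 10          # say ||S|| = 10 source classes
d_out = np.random.randn(bs, num_classes, 4, 4)  # discriminator output map

k = 3  # class label of the input images
# per-image score for class k: average the k-th channel's patch outputs
score_k = d_out[:, k, :, :].mean(axis=(1, 2))
print(score_k.shape)  # (2,)
```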

I'm also confused about the AdaIN ResBlk-512 mismatch with FC-256, and about where (mu1, sigma1) and (mu2, sigma2) come from... Does anybody know?

I think (mu1, sigma1) and (mu2, sigma2) come from the last layer of FC-256.

(mu1, sigma1) and (mu2, sigma2) should have the same dimensionality as the ResBlk-512, so I guess there is an affine transform between FC-256 and ResBlk-512 or something. That's the unknown: what is that something?

Yes, I think so, but that's just a guess.

@mingyuliutw how do you avoid the channel mismatch between the residual blocks in the discriminator? Do you use a 1x1 conv to increase the channel dims and match them, or bottleneck residual blocks that keep 64 channels throughout the discriminator?
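For reference, the common way to resolve such a channel mismatch is a learned 1x1 convolution on the shortcut path of the residual block. A numpy sketch of that option (a 1x1 conv is just a per-pixel linear map over channels; this makes no claim about what FUNIT actually does):

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map over channels.

    x : (N, C_in, H, W), w : (C_out, C_in)
    """
    return np.einsum('oc,nchw->nohw', w, x)

def resblock(x, w_body, w_skip):
    """Residual block whose shortcut projects C_in -> C_out via 1x1 conv."""
    body = conv1x1(x, w_body)   # stand-in for the block's conv stack
    skip = conv1x1(x, w_skip)   # 1x1 projection to match channel counts
    return body + skip

x = np.random.randn(1, 64, 8, 8)          # 64-channel input
w_body = np.random.randn(128, 64)
w_skip = np.random.randn(128, 64)
y = resblock(x, w_body, w_skip)
print(y.shape)  # (1, 128, 8, 8)
```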

Hi @GarciaDelMolino, @Johnson-yue
I think FC-256 means the middle dim in the MLP is 256. The dim of the output feature is 512.
If you check the code at

def __init__(self, in_dim, out_dim, dim, n_blk, norm, activ):

in_dim = 64   # latent_dim (input_dim)
out_dim = 512 # matches the channel number of the decoder (output_dim)
dim = 256     # nf_mlp (the MLP middle dim)
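Putting those numbers together: the MLP maps the 64-dim class code through 256-dim hidden layers to a 512-dim output that matches the decoder's channel count. A numpy sketch of that shape plumbing (the number of hidden layers here is my guess, since n_blk is configurable; how the 512-dim output then becomes the AdaIN (mu, sigma) pairs is exactly the open question above):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(z, weights):
    """Plain MLP: ReLU hidden layers, linear output layer."""
    h = z
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)
    return h @ weights[-1]

in_dim, dim, out_dim = 64, 256, 512   # values quoted from the comment above
weights = [rng.normal(size=(in_dim, dim), scale=0.1),
           rng.normal(size=(dim, dim), scale=0.1),   # depth is an assumption
           rng.normal(size=(dim, out_dim), scale=0.1)]

z = rng.normal(size=in_dim)   # 64-dim class (style) code
style = mlp(z, weights)       # 512-dim output, matching the decoder channels
print(style.shape)  # (512,)
```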

OK, thank you @layumi. When I opened this issue the code was not released yet, so I will close it now.