williamyang1991 / VToonify

[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer

The pretrain code gets a gray image?

seamoonlight-YBY opened this issue · comments

I ran

python train_vtoonify_d.py --pretrain

and saved the images of the variables real_skip, fake_skip, and img_gen, because I want to see the relation between them.
I find that the pretrained fake_skip image is a color face segmentation image at 32×32, matching img_gen, but real_skip comes out as a completely gray 32×32 image.
From this line of code:

recon_loss = F.mse_loss(fake_feat, real_feat) + F.mse_loss(fake_skip, real_skip)

this optimization direction seems to be wrong.

What's wrong with my operation?

My shell command is:
python train_vtoonify_d.py --iter 1 --exstyle_path DualStyleGAN/checkpoint/arcane/exstyle_code.npy --batch 1 --name GG --stylegan_path DualStyleGAN/checkpoint/arcane/generator.pt --pretrain

My saving code is:

def save_image(img, filename):
    tmp = ((img.detach().numpy().transpose(1, 2, 0) + 1.0) * 127.5).astype(np.uint8)
    cv2.imwrite(filename, cv2.cvtColor(tmp, cv2.COLOR_RGB2BGR))

save_image(img_gen[0].cpu(), 'real_input.jpg')
save_image(real_skip[0].cpu(), 'real_skip.jpg')
save_image(fake_skip[0].cpu(), 'fake_skip.jpg')

The real_skip is produced by

class ToRGB(nn.Module):
    def __init__(self, in_channel, style_dim, upsample=True, blur_kernel=[1, 3, 3, 1]):
        super().__init__()
        if upsample:
            self.upsample = Upsample(blur_kernel)
        # 1x1 modulated convolution projecting the feature map to 3 RGB channels
        self.conv = ModulatedConv2d(in_channel, 3, 1, style_dim, demodulate=False)
        self.bias = nn.Parameter(torch.zeros(1, 3, 1, 1))

    def forward(self, input, style, skip=None, externalweight=None):
        out = self.conv(input, style, externalweight)
        out = out + self.bias
        if skip is not None:
            # add this layer's RGB output to the upsampled sum of earlier layers
            skip = self.upsample(skip)
            out = out + skip
        return out

which definitely produces a 3-channel feature map.

I have no idea why your saved real_skip has only one channel.

Maybe you should add some code like print(real_skip.shape) to track how the shape of the feature map changes.
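
For example, something like this (the min/max check is my extra suggestion on top of the shape print):

print(real_skip.shape)                                   # expect (N, 3, H, W)
print(real_skip.min().item(), real_skip.max().item())    # check the value range too
print(fake_skip.shape, fake_skip.min().item(), fake_skip.max().item())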

Sorry, I didn't make the question clear.
The real_skip shape is indeed (1, 3, 32, 32), but the values inside are very small, concentrated around 10^-3:
torch.max(real_skip) = 0.015
That is why it becomes gray after the (x + 1.0) * 127.5 mapping.
So my question is: should the values of real_skip be this small?
In my opinion, the values of real_skip should be on the same scale as fake_skip, because my network is the pre-trained VToonify.
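
Concretely, with the (x + 1.0) * 127.5 mapping from my save_image, a quick check shows why such values render as a flat gray image:

# values near zero all map to mid-gray under (x + 1.0) * 127.5
for x in (0.0, 0.015, -0.015):
    print(x, '->', int((x + 1.0) * 127.5))   # 127, 129, 125: visually flat gray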

real_skip is just a mid-layer feature map, not an image, and it is generated by the fixed backbone.
You can't expect it to lie within your expected range.

The final image is generated by summing the real_skips from all layers, so it is normal for a single real_skip to have small values.
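
As a rough illustration of this accumulation pattern (a self-contained sketch with made-up magnitudes and resolutions, not the repo's actual code):

import torch
import torch.nn.functional as F

# Each resolution's ToRGB output is added to the upsampled running sum;
# only the final sum is the image. Any single contribution (e.g. the
# 32x32 real_skip) can therefore be tiny on its own.
torch.manual_seed(0)
skip = 0.01 * torch.randn(1, 3, 4, 4)              # 4x4 RGB contribution
for res in (8, 16, 32, 64, 128, 256):
    skip = F.interpolate(skip, size=(res, res), mode='bilinear', align_corners=False)
    rgb = 0.01 * torch.randn(1, 3, res, res)       # this layer's small ToRGB output
    skip = skip + rgb                              # same pattern as ToRGB.forward above
print(skip.shape)                                  # torch.Size([1, 3, 256, 256])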

Maybe you should apply some operation like normalization to visualize the feature map, instead of viewing it directly as an image.
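
For instance, a per-tensor min-max normalization in the same style as your save_image (a sketch; save_feature_vis is just an illustrative name):

import numpy as np
import cv2

def save_feature_vis(feat, filename):
    # feat: tensor of shape (3, H, W), arbitrary value range
    x = feat.detach().cpu().numpy().transpose(1, 2, 0)
    x = (x - x.min()) / (x.max() - x.min() + 1e-8)   # rescale to [0, 1]
    cv2.imwrite(filename, cv2.cvtColor((x * 255).astype(np.uint8), cv2.COLOR_RGB2BGR))

save_feature_vis(real_skip[0], 'real_skip_vis.jpg')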

Now I get it!
THX 😊