NVIDIA / vid2vid

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.



Questions about fg

ygfrancois opened this issue

Hello, I have some questions about how to add foreground-background prior.

  1. Is fg used together with the instance input?
  2. The instance map provides the edges of the instances, and fg_labels decides which of them are foreground, but I don't understand the code of compute_mask:
    def compute_mask(self, real_As, ts, te=None): # compute the mask for foreground objects
        _, _, _, h, w = real_As.size() 
        if te is None:
            te = ts + 1        
        mask_F = real_As[:, ts:te, self.opt.fg_labels[0]].clone()
        for i in range(1, len(self.opt.fg_labels)):
            mask_F = mask_F + real_As[:, ts:te, self.opt.fg_labels[i]]
        mask_F = torch.clamp(mask_F, 0, 1)
        return mask_F  
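To see what compute_mask does mechanically, here is a minimal standalone sketch. It assumes (this is not shown in the quoted code) that `real_As` has shape `(batch, frames, channels, h, w)` and that its first `label_nc` channels are a one-hot semantic label map, where channel `i` is 1 wherever the pixel's label is `i`. Under that assumption each entry of fg_labels is a semantic label id used as a channel index. The toy sizes and label ids are made up, and fg_labels is passed as an argument instead of read from `self.opt`.

```python
import torch

def compute_mask(real_As, fg_labels, ts, te=None):
    """Standalone version of compute_mask; fg_labels is an argument here."""
    if te is None:
        te = ts + 1
    # Take the channel of the first foreground label for frames ts..te-1.
    mask_F = real_As[:, ts:te, fg_labels[0]].clone()
    # Add the channels of the remaining foreground labels.
    for label in fg_labels[1:]:
        mask_F = mask_F + real_As[:, ts:te, label]
    # Keep the mask in [0, 1] (one-hot channels never overlap anyway).
    return torch.clamp(mask_F, 0, 1)

# Toy label map: 1 sequence, 2 frames, 4x4 pixels, hypothetical label ids 0..3.
label_nc = 4
labels = torch.tensor([[0, 0, 1, 1],
                       [0, 2, 2, 1],
                       [3, 3, 2, 0],
                       [3, 0, 0, 0]])
label_map = labels.view(1, 1, 1, 4, 4).repeat(1, 2, 1, 1, 1)  # (batch, frames, 1, h, w)
# One-hot encode along the channel dimension (dim=2).
one_hot = torch.zeros(1, 2, label_nc, 4, 4).scatter_(2, label_map, 1.0)

# Treat labels 2 and 3 as foreground for frame 0.
mask = compute_mask(one_hot, fg_labels=[2, 3], ts=0)
print(mask.shape)  # torch.Size([1, 1, 4, 4])
print(mask[0, 0])  # 1 where the label is 2 or 3, 0 elsewhere
```

In this reading, the mask never looks at the instance edges at all; it only sums the label-map channels whose ids are listed in fg_labels.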

In compute_mask, it seems that "real_As" has all the foreground masks concatenated along the channel dimension, but
"real_As" is defined in "encode_input":

        if self.opt.use_instance:  # append the instance-edge map along the channel dim
            inst_map = inst_map.data.cuda()            
            edge_map = Variable(self.get_edges(inst_map))            
            input_map = torch.cat([input_map, edge_map], dim=2)
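The quoted lines only show the edge concatenation. Assuming `input_map` has already been one-hot encoded into `label_nc` channels before this point (an assumption, since that step is not in the snippet), the tensor that later becomes "real_As" would have `label_nc + 1` channels per frame: one per semantic label, then a single edge channel last. This CPU sketch just checks those shapes; `label_nc`, the sizes, and the all-zero placeholder edge map are hypothetical.

```python
import torch

bs, tG, h, w = 1, 3, 8, 8  # batch, frames, height, width (toy values)
label_nc = 5               # hypothetical number of semantic labels

# Integer label map with one channel per frame, as it would be read from disk.
label_ids = torch.randint(0, label_nc, (bs, tG, 1, h, w))

# One-hot encode along the channel dimension: label_nc channels per frame.
one_hot = torch.zeros(bs, tG, label_nc, h, w).scatter_(2, label_ids, 1.0)

# Placeholder for the instance-edge map: a single channel per frame.
edge_map = torch.zeros(bs, tG, 1, h, w)

input_map = torch.cat([one_hot, edge_map], dim=2)
print(input_map.shape)  # torch.Size([1, 3, 6, 8, 8])
# Channels 0..label_nc-1 hold the semantic labels; channel label_nc is the edge map.
```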

I think the number of foreground channels is the same as the number of channels in the instance image that is read, because:

    def get_edges(self, t):
        edge = torch.cuda.ByteTensor(t.size()).zero_()
        edge[:,:,:,:,1:] = edge[:,:,:,:,1:] | (t[:,:,:,:,1:] != t[:,:,:,:,:-1])
        edge[:,:,:,:,:-1] = edge[:,:,:,:,:-1] | (t[:,:,:,:,1:] != t[:,:,:,:,:-1])
        edge[:,:,:,1:,:] = edge[:,:,:,1:,:] | (t[:,:,:,1:,:] != t[:,:,:,:-1,:])
        edge[:,:,:,:-1,:] = edge[:,:,:,:-1,:] | (t[:,:,:,1:,:] != t[:,:,:,:-1,:])
        return edge.float() 
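A CPU port of get_edges makes it easy to check what the edge channel actually contains. It uses a boolean tensor instead of `torch.cuda.ByteTensor` so it runs without a GPU; the neighbour comparisons are the same as in the quoted code. The two toy instance maps are made up: they share one object layout but use different instance ids.

```python
import torch

def get_edges(t):
    """Mark pixels whose left/right/upper/lower neighbour has a different instance id."""
    edge = torch.zeros(t.size(), dtype=torch.bool)
    # Horizontal neighbours: mark both pixels of every differing pair.
    edge[:, :, :, :, 1:] = edge[:, :, :, :, 1:] | (t[:, :, :, :, 1:] != t[:, :, :, :, :-1])
    edge[:, :, :, :, :-1] = edge[:, :, :, :, :-1] | (t[:, :, :, :, 1:] != t[:, :, :, :, :-1])
    # Vertical neighbours.
    edge[:, :, :, 1:, :] = edge[:, :, :, 1:, :] | (t[:, :, :, 1:, :] != t[:, :, :, :-1, :])
    edge[:, :, :, :-1, :] = edge[:, :, :, :-1, :] | (t[:, :, :, 1:, :] != t[:, :, :, :-1, :])
    return edge.float()

# Toy instance map, shape (batch, frames, 1, h, w): one object with id 7 on background 0.
inst_a = torch.tensor([[0, 0, 0, 0],
                       [0, 7, 7, 0],
                       [0, 7, 7, 0],
                       [0, 0, 0, 0]]).view(1, 1, 1, 4, 4)
# Same layout, but the object gets a different (arbitrary) instance id.
inst_b = torch.where(inst_a == 7, torch.tensor(42), inst_a)

edges_a = get_edges(inst_a)
edges_b = get_edges(inst_b)
print(edges_a[0, 0, 0])
print(torch.equal(edges_a, edges_b))  # True: only boundaries are recorded, not ids
```

The output is a binary map with the same single channel as the instance map, and relabeling the object does not change it.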

So I don't know how to define fg_labels. At the same time, the generator's input_nc is defined as:

        netG_input_nc = input_nc * opt.n_frames_G
        if opt.use_instance:
            netG_input_nc += opt.n_frames_G  
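The count can be read as stacking `n_frames_G` frames along the channel axis: each frame contributes `input_nc` label channels plus, with use_instance, one edge channel, assuming the instance map (and therefore its edge map) has a single channel per frame. A small arithmetic check with hypothetical values for `input_nc` and `n_frames_G`; the helper name is made up for illustration.

```python
def netG_input_channels(input_nc, n_frames_G, use_instance):
    """Hypothetical helper mirroring the quoted netG_input_nc computation."""
    n = input_nc * n_frames_G
    if use_instance:
        n += n_frames_G  # one edge channel per input frame
    return n

# Hypothetical settings: 35 label channels, 3 input frames.
print(netG_input_channels(35, 3, use_instance=False))  # 105
print(netG_input_channels(35, 3, use_instance=True))   # 108, i.e. 3 * (35 + 1)
```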

That netG_input_nc formula seems to set the number of instance channels to 1 per frame, right?
So I'm confused about this. Could you please help? Thank you very much!

The edge map is a tensor of instance edges in which edge pixels are 1 and all other pixels are 0. How can it carry information about the different types of instances, which is what fg_labels needs?