rosinality / swapping-autoencoder-pytorch

Unofficial implementation of Swapping Autoencoder for Deep Image Manipulation (https://arxiv.org/abs/2007.00653) in PyTorch



Using this implementation for Global editing

Nerdyvedi opened this issue

Is it possible to use this for region editing and global editing, as mentioned in the paper?

The figure below is from the paper.

[Figure from the paper: examples of region editing and global editing]

Thanks a lot.

I haven't tried it, but I think you can implement it.

@rosinality Would be grateful if you could guide me. I am trying to implement the global editing feature.
From what I understand, we need to edit the texture code to change global attributes like age, lighting, and background. They edited the texture code using an interactive UI that performs vector arithmetic with the PCA components, but they never mentioned what vector arithmetic needs to be performed.

Have you performed PCA on the texture codes? Then you can move the texture vectors along the principal components. For example, you can pick a principal component that explains a large share of the variance and add it to a texture vector with some scalar weight (texture vector + scalar * principal component). Please refer to this paper: https://arxiv.org/abs/2004.02546

Or you can try approaches like https://arxiv.org/abs/2007.06600, which extract eigenvectors directly from the weight matrices. I tried it on StyleGAN2, so maybe you can refer to it: https://github.com/rosinality/stylegan2-pytorch
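
Roughly, the closed-form idea looks like this; a minimal sketch, assuming you factorize some linear layer whose weight consumes the latent code (which layer of the swapping autoencoder's generator to pick is something you would need to work out):

import torch

def directions_from_weight(weight, k=10):
    # weight: (out_features, in_features) matrix of a layer fed by the code.
    # Its top right singular vectors are the closed-form editing directions.
    _, _, v = torch.svd(weight)
    return v[:, :k]  # each column is a unit-norm direction in code space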

Hi @rosinality,
Could you please share the code so we can quickly try it? Or could you give us instructions on how to use it with the swapping autoencoder? Thanks!

@juzuo You can do it like this:

import torch
from torch.utils import data
from torchvision import transforms
from model import Encoder, Generator
from stylegan2.dataset import MultiResolutionDataset
from tqdm.auto import tqdm
from matplotlib import pyplot as plt

# Paths to the trained checkpoint and the LMDB dataset
ckpt_path = 'checkpoint/ffhq-070000.pt'
dset_path = 'ffhq.lmdb'

ckpt = torch.load(ckpt_path, map_location=lambda storage, loc: storage)

# Same preprocessing as training: map images to [-1, 1]
transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True),
    ]
)
dset = MultiResolutionDataset(dset_path, transform, 256)

device = 'cuda'
encoder = Encoder(32).to(device)
generator = Generator(32).to(device)

# Use the EMA weights for inference
encoder.load_state_dict(ckpt['e_ema'])
generator.load_state_dict(ckpt['g_ema'])
encoder.eval()
generator.eval()

# Encode the whole dataset and collect the texture codes
textures = []
loader = data.DataLoader(dset, batch_size=256)

with torch.no_grad():
    for batch in tqdm(loader):
        _, texture = encoder(batch.to(device))
        textures.append(texture.to('cpu'))

# PCA via SVD on the mean-centered codes; the columns of V are the principal components
texture_t = torch.cat(textures, 0)
texture_c = texture_t - texture_t.mean(0, keepdim=True)

U, S, V = torch.svd(texture_c)

dataset_i = 45000  # index of the image to edit
scale = 100        # edit strength along the component
direction_i = 1    # which principal component to move along

with torch.no_grad():
    structure, texture = encoder(dset[dataset_i].unsqueeze(0).to(device))
    img = generator(structure, texture)
    # Global edit: shift the texture code along the chosen principal component
    img_e = generator(structure, texture + scale * V[:, direction_i].unsqueeze(0).to(device))

# Show the original and edited images side by side
plt.imshow(torch.cat((img, img_e), 3).to('cpu').squeeze(0).add_(1).div_(2).clamp_(0, 1).permute(1, 2, 0).numpy())
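
If you want to see which attribute a component controls, you can also sweep the scale; this just reuses the variables defined above:

with torch.no_grad():
    imgs = []
    for s in (-100, -50, 0, 50, 100):
        # Same edit as above, applied at several strengths
        edited = texture + s * V[:, direction_i].unsqueeze(0).to(device)
        imgs.append(generator(structure, edited))

plt.imshow(torch.cat(imgs, 3).to('cpu').squeeze(0).add_(1).div_(2).clamp_(0, 1).permute(1, 2, 0).numpy())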

Thanks, my buddy! @rosinality I will try it ASAP.

By the way, I have a question: is it possible to train the model on a single RTX 3090 GPU?

@juzuo Batch size will be the problem. I think you can train if you use gradient accumulation.
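
For illustration, a minimal sketch of the accumulation pattern (the loop below is a placeholder, not this repo's actual training code, and the L1 loss stands in for the full swapping objective; encoder, generator, loader, optimizer, and device are assumed to be set up as usual):

import torch.nn.functional as F

accum = 4  # e.g. micro-batch 4, accumulated 4 times ~ effective batch 16

optimizer.zero_grad()
for i, real_img in enumerate(loader):
    real_img = real_img.to(device)
    structure, texture = encoder(real_img)
    recon = generator(structure, texture)
    loss = F.l1_loss(recon, real_img) / accum  # scale so gradients average
    loss.backward()  # gradients sum across micro-batches
    if (i + 1) % accum == 0:
        optimizer.step()
        optimizer.zero_grad()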

I see. Thanks buddy!