bryandlee / animegan2-pytorch

PyTorch implementation of AnimeGANv2


how to train another model like "Face Portrait v1"

ruanjiyang opened this issue

Could you let us know how to train another model like "Face Portrait v1"?

As far as I know, Animegan2 is not designed for facial style transfer, so I would really like to know the detailed steps for training a facial model with Animegan2.

Thanks very much!

Those weights are trained using a pix2pix method similar to this: https://github.com/justinpinkney/toonify

input(face) → Animegan2 generator → output <==> target(portrait), loss = LPIPS + L2 + GAN
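Roughly, such an objective could be sketched in PyTorch as follows. The discriminator D, the loss weights, and the non-saturating GAN loss form are placeholders, and the LPIPS term uses the lpips pip package; the actual training code may differ.

import torch.nn.functional as F
import lpips

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual distance

def generator_loss(G, D, face, portrait, w_lpips=1.0, w_l2=1.0, w_gan=1.0):
    # face: aligned input photo, portrait: blended-StyleGAN target, both in [-1, 1]
    out = G(face)                                  # AnimeGAN2 generator output
    loss_lpips = lpips_fn(out, portrait).mean()    # LPIPS term
    loss_l2 = F.mse_loss(out, portrait)            # L2 term
    loss_gan = F.softplus(-D(out)).mean()          # GAN term (non-saturating form assumed)
    return w_lpips * loss_lpips + w_l2 * loss_l2 + w_gan * loss_gan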


Your model for Face Portrait v1 is stunning. Can you give any more details about it?

Hi, what kind of details are you looking for?


I want to know how you generate your target portrait images. Do you use the network-blending method from https://github.com/justinpinkney/toonify? I adopted that network-blending method to reproduce a Disney cartoon model: the facial region transfers to the Disney style well, but the background also changes drastically. How do you keep the background as close to the input as in your Face Portrait v1 model?

Thanks!

I used the same network blending method, but the implementation may differ.
Below is my own implementation for the official stylegan2 model:

from training.networks import Generator
from copy import deepcopy
import math


def gather_params(G: Generator) -> dict:
    params = dict(
        [(res, {}) for res in G.synthesis.block_resolutions] + [("mapping", {})]
    )
    # G param names look like mapping.xxx or synthesis.b128.xxx
    for n, p in sorted(list(G.named_buffers()) + list(G.named_parameters())):
        if n.startswith("mapping"):
            params["mapping"][n] = p
        else:
            res = int(n.split(".")[1][1:])
            params[res][n] = p
    return params


def blend_models(G_low: Generator, G_high: Generator, swap_layer: int, blend_width: float = 3) -> Generator:
    params_low = gather_params(G_low)
    params_high = gather_params(G_high)

    for layer_idx, res in enumerate(G_low.synthesis.block_resolutions):
        x = layer_idx - swap_layer

        if blend_width is not None:
            assert blend_width > 0
            # smooth blend: sigmoid ramp centered at swap_layer
            exponent = -x / blend_width
            y = 1 / (1 + math.exp(exponent))
        else:
            # hard swap: take G_high params only for blocks past swap_layer
            y = 1 if x > 0 else 0

        # y is the blend weight of G_high at this block (0 -> G_low, 1 -> G_high)
        for n, p in params_high[res].items():
            params_high[res][n] = params_high[res][n] * y + params_low[res][n] * (1 - y)

    state_dict = {}
    for _, p in params_high.items():
        state_dict.update(p)

    G_mix = deepcopy(G_high)
    G_mix.load_state_dict(state_dict)
    return G_mix

Inputs and targets for the pix2pix training are generated as follows:

G_blend = blend_models(G_low, G_high, swap_layer=swap_layer, blend_width=blend_width)

input  = G_low.synthesis(w, noise_mode="const")
target = G_blend.synthesis(w, noise_mode="const")
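These snippets assume a batch of latents w is already available; with the official stylegan2-ada-pytorch API, it could be sampled along these lines (the batch size and truncation value are placeholders):

import torch

z = torch.randn([8, G_low.z_dim], device="cuda")   # random z latents
w = G_low.mapping(z, None, truncation_psi=0.7)     # (8, num_ws, w_dim), ready for synthesis()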

The strength of the stylization depends on swap_layer and blend_width, so you can use multiple blended models to generate multiple target images (for example, strongly stylized target for the facial area and weakly stylized target for the background) and fuse them using segmentation masks.

G_blend_face = blend_models(G_low, G_high, swap_layer=swap_layer_face, blend_width=blend_width_face)
G_blend_bg = blend_models(G_low, G_high, swap_layer=swap_layer_bg, blend_width=blend_width_bg)

input  = G_low.synthesis(w, noise_mode="const")

target_face = G_blend_face.synthesis(w, noise_mode="const")
target_bg = G_blend_bg.synthesis(w, noise_mode="const")
target = target_face * mask + target_bg * (1 - mask)

Hope this helps!


Great job, and thanks for sharing!

I'd like to ask about some details:

  1. Since your face dataset is produced by fine-tuning StyleGAN, is the best way to use the pretrained 'Face Portrait v1' model to align faces in FFHQ mode?
  2. What is the input size for 'Face Portrait v1' during training, 1024?
  1. Yes, that's the reason for the ffhq-alignment in demo.ipynb
  2. It's 512
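Roughly, that preprocessing could look like the following, assuming the generator takes an FFHQ-aligned 512x512 RGB tensor scaled to [-1, 1] (the exact transform in demo.ipynb may differ):

from PIL import Image
import torchvision.transforms.functional as TF

def preprocess(aligned_face: Image.Image, size: int = 512):
    # aligned_face: FFHQ-aligned face crop; the [-1, 1] scaling is an assumption
    img = aligned_face.convert("RGB").resize((size, size), Image.LANCZOS)
    return (TF.to_tensor(img) * 2 - 1).unsqueeze(0)   # shape (1, 3, size, size)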

got it!
thanks~


Thanks for your helpful suggestions!

Now I've run into another question: how do you ensure expression similarity between the generated input and target? In my case, the facial expression generated by the blended models often differs from the one generated by the original FFHQ model. Do you freeze the mapping network parameters or some generator layers when finetuning? If so, would you mind sharing the details?

Thanks sincerely

@bryandlee I would be very grateful if I can get your reply ~

@Leocien That's a tricky part, but here are some tips that could help (a small sketch for points 1 and 3 follows after the list).

  1. Freezing does help. I freeze the mapping layer when finetuning for layer-swapping models.
  2. Use an attribute encoder to explicitly enforce attribute similarity between the original and translated images. This can be done in the stylegan finetuning stage or the pix2pix training stage.
  3. Use additional augmentations in the pix2pix training stage to help preserve low-level attributes. For example, I apply multiple corruptions to the source image for robustness, but apply the same color shift to both the source and target images to keep the colors consistent.
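A small sketch of points 1 and 3, assuming a stylegan2-ada-pytorch Generator (the actual finetuning and augmentation code may differ):

import torch
import torchvision.transforms as T

def freeze_mapping(G):
    # Point 1: freeze the mapping network so only the synthesis blocks are finetuned
    for p in G.mapping.parameters():
        p.requires_grad = False
    return [p for p in G.parameters() if p.requires_grad]  # params to hand to the optimizer

def shared_color_jitter(source, target, strength=0.2):
    # Point 3: torchvision samples the jitter factors once per call, so stacking
    # source and target applies exactly the same color shift to both images
    jitter = T.ColorJitter(strength, strength, strength, hue=0.05)
    pair = jitter(torch.stack([source, target]))   # (2, C, H, W)
    return pair[0], pair[1]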


Hi, when you train your pix2pix model, you use randomly generated StyleGAN data, i.e. fake data, and the trained model can then be applied to real pictures at test time. Do I understand that correctly?


Hi, @bryandlee! Thanks for your work!
I am trying to get qualitative results on my own synthesized dataset using your pipeline. I have a few questions about its second stage (training the pix2pixHD model):

  1. Did you replace both the generator and discriminator in the pix2pixHD pipeline with those from AnimeGANv2?
  2. Did you use the discriminator feature matching loss (G_GAN_Feat)?
  3. Did you use the VGG feature matching loss (G_VGG)?
  4. What weights did you use for your losses?
  5. Did you make any modifications to the default pix2pixHD pipeline (besides adding the two new losses, LPIPS and L2, and replacing G and D)?
  6. How large a synthesized dataset did you use?