williamyang1991 / VToonify

[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer


Why do E and Es use two different models, given that pSp can also extract multi-scale features?

seamoonlight-YBY opened this issue

Excuse me: in the "Collection-Based Portrait Video Style Transfer" setting, VToonify uses two different encoders. Is this a speed consideration or a performance advantage? The feature-pyramid structure in pSp also produces multi-scale feature maps before map2style, which seems similar to the downsampling in E.
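
For context, here is a minimal PyTorch sketch of the comparison being drawn (module names are hypothetical, not VToonify's actual code): the content encoder E is a plain strided-conv downsampler whose intermediate activations form multi-scale feature maps, much like the pyramid features a pSp-style encoder computes before its map2style heads.

```python
import torch
import torch.nn as nn

class DownsamplingEncoderE(nn.Module):
    """Toy stand-in for VToonify's content encoder E: plain strided convs."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),       # 1/2 res
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),      # 1/4 res
            nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1),  # 1/8 res
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = torch.relu(stage(x))
            feats.append(x)  # collect one feature map per scale
        return feats

frame = torch.randn(1, 3, 256, 256)
for f in DownsamplingEncoderE()(frame):
    print(f.shape)  # (1,32,128,128), (1,64,64,64), (1,128,32,32)
```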

We would like to use the good prior from the pretrained Es and G: pSp has been trained to extract valid style codes.
If we used it to also extract the content features, we would need to retrain pSp, which might hurt its performance on style extraction.
We also want a unified framework for both collection-based and exemplar-based toonification, so we use a new E in both settings.
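
A rough sketch of that division of labor (class names hypothetical; the real Es is pSp's pretrained encoder): Es stays frozen so its style-extraction prior is preserved, and only the new E is trained to supply content features to G.

```python
import torch
import torch.nn as nn

class ContentEncoderE(nn.Module):
    """Toy content encoder E, trained from scratch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class StyleEncoderEs(nn.Module):
    """Toy stand-in for the pretrained pSp encoder Es (outputs a W+ code)."""
    def __init__(self, n_styles=18, dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_styles * dim),
        )
        self.n_styles, self.dim = n_styles, dim
    def forward(self, x):
        return self.backbone(x).view(-1, self.n_styles, self.dim)

E, Es = ContentEncoderE(), StyleEncoderEs()
for p in Es.parameters():
    p.requires_grad_(False)  # freeze Es: keep its style prior intact

frame = torch.randn(1, 3, 256, 256)
content_feats = E(frame)      # trained: content features for G
with torch.no_grad():
    style_code = Es(frame)    # frozen: (1, 18, 512) W+ style code
# out = G(content_feats, style_code)  # G fuses the two streams
```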

We are working on a new project where we use pSp as both E and Es. There is some improvement, but the training time will be longer.
Stay tuned.
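
One purely illustrative reading of that single-encoder idea (a guess at the design, not the authors' code): a single pSp-like forward pass could expose both its intermediate pyramid features, reused as content, and its map2style output, used as the style code.

```python
import torch
import torch.nn as nn

class SharedPSPLikeEncoder(nn.Module):
    """Hypothetical shared encoder: one backbone serves as both E and Es."""
    def __init__(self, n_styles=18, dim=512):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 64, 3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.map2style = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, n_styles * dim),
        )
        self.n_styles, self.dim = n_styles, dim

    def forward(self, x):
        f1 = torch.relu(self.stage1(x))
        f2 = torch.relu(self.stage2(f1))
        content = [f1, f2]  # pyramid features reused as content (role of E)
        style = self.map2style(f2).view(-1, self.n_styles, self.dim)  # role of Es
        return content, style

enc = SharedPSPLikeEncoder()
content_feats, style_code = enc(torch.randn(1, 3, 256, 256))
```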

Thanks a lot for your detailed answers!