0D1Lee0 / stylegan-nada

Reimplemented in PyTorch


StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

Key points of the architecture

StyleGAN-NADA adapts the domain of a StyleGAN2 generator to a new domain described by text. It does so by minimizing the directional CLIP loss:

$$\Delta T = E_T(t_{target}) - E_T(t_{source}), \qquad \Delta I = E_I(G_{train}(w)) - E_I(G_{frozen}(w)),$$

$$\mathcal{L}_{direction} = 1 - \frac{\Delta I \cdot \Delta T}{\lVert \Delta I \rVert \, \lVert \Delta T \rVert}$$

where $E_T$ and $E_I$ are the text and image encoders provided by the CLIP model, and $t_{source}$ and $t_{target}$ are the source- and target-domain text prompts. $G_{train}$ is the new generator that StyleGAN-NADA produces, while $G_{frozen}$ is the original generator, which is kept frozen (not trained).
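As a concrete illustration, here is a minimal sketch of this loss in PyTorch using OpenAI's `clip` package. The function and variable names are illustrative (not this repo's exact API), and the input images are assumed to be already resized and normalized for CLIP:

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def directional_clip_loss(img_train, img_frozen, text_source, text_target):
    """1 - cosine similarity between the image direction and the text direction.

    img_train / img_frozen: image batches from G_train / G_frozen, already
    preprocessed for CLIP (resized to 224x224 and normalized).
    """
    # Text direction: Delta T = E_T(t_target) - E_T(t_source)
    tokens = clip.tokenize([text_source, text_target]).to(device)
    text_feats = clip_model.encode_text(tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    delta_t = text_feats[1] - text_feats[0]

    # Image direction: Delta I = E_I(G_train(w)) - E_I(G_frozen(w))
    feat_train = clip_model.encode_image(img_train)
    feat_frozen = clip_model.encode_image(img_frozen)
    feat_train = feat_train / feat_train.norm(dim=-1, keepdim=True)
    feat_frozen = feat_frozen / feat_frozen.norm(dim=-1, keepdim=True)
    delta_i = feat_train - feat_frozen

    # 1 - cos(Delta I, Delta T), averaged over the batch
    return (1.0 - F.cosine_similarity(delta_i, delta_t.unsqueeze(0), dim=-1)).mean()
```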

*(Figure: training architecture)*

Conceptually, it computes a direction in CLIP space from the source and target text prompts, and shifts the generator's outputs along that direction in CLIP space.

*(Figure: domain adaptation)*

Not all layers of the $G_{train}$ network are trained. At each step, a subset of layers is chosen based on how strongly they influence the output for the requested edit. This is called adaptive layer freezing.

*(Figure: adaptive layer freezing)*
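The selection step can be sketched roughly as follows. This is a hedged sketch, not this repo's exact code: `generator` is assumed to be a StyleGAN2 generator callable on W+ codes of shape `[batch, n_layers, 512]`, and `clip_direction_loss` is assumed to be a closure that scores the generated images against the frozen generator and the text direction (e.g. the loss above):

```python
import torch

def choose_trainable_layers(generator, w_codes, clip_direction_loss, k=8, opt_steps=5):
    """Rank layers by how much their W+ entries move when optimized toward
    the target, then return the indices of the k most-moved layers."""
    w_opt = w_codes.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w_opt], lr=0.01)
    for _ in range(opt_steps):
        optimizer.zero_grad()
        loss = clip_direction_loss(generator(w_opt))  # assumed callable
        loss.backward()
        optimizer.step()

    # Layers whose latent codes changed the most are the most relevant
    # to the requested edit; those are the ones to leave unfrozen.
    movement = (w_opt - w_codes).norm(dim=-1).mean(dim=0)  # one value per layer
    return torch.topk(movement, k).indices.tolist()
```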

For more details, the original paper is available here.

Run and train the network

To train and run the adapted network, a publicly accessible Colab notebook is available here. It lets you select a model to adapt, enter source and target domains, train the network, and use it to generate an arbitrary number of images.

Experiments and comparison

Some details of the implementation were changed. Here we present some results and a comparison with the original model.

Additional work

  • The adaptive layer freezing approach was made more scalable: instead of computing the best k layers to train at every iteration, they are recomputed only every auto_layer_interval iterations. In addition, every auto_layer_falloff iterations the number of trained layers decreases, allowing for finer tuning (see the sketch after this list).
  • The global CLIP loss was reintroduced: the total loss is now computed as a weighted sum of the directional and global CLIP losses. The weighting can be adjusted via a slider in the Colab.
  • The original paper augments the inserted prompts with a set of prompts generated from templates. I experimented without this feature and found no major differences, so it is disabled by default, but templates can still be used.
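To make the interaction between these options concrete, here is an illustrative training-loop skeleton. `auto_layer_interval` and `auto_layer_falloff` are the parameters described above; `choose_trainable_layers` is the sketch from the previous section, and the remaining helpers (`generator.freeze_all_but`, the loss closures, `initial_k`, `lambda_dir`) are hypothetical placeholders, not this repo's exact API:

```python
def train(generator, w_codes, optimizer, num_iterations,
          auto_layer_interval, auto_layer_falloff, initial_k,
          directional_loss_fn, global_loss_fn, lambda_dir=0.8):
    """Illustrative loop: scalable layer freezing + weighted CLIP loss."""
    num_kept_layers = initial_k
    for it in range(num_iterations):
        # Recompute the best layers only every auto_layer_interval iterations.
        if it % auto_layer_interval == 0:
            layers = choose_trainable_layers(generator, w_codes,
                                             directional_loss_fn, k=num_kept_layers)
            generator.freeze_all_but(layers)  # hypothetical helper

        # Every auto_layer_falloff iterations, train fewer layers (finer tuning).
        if it > 0 and it % auto_layer_falloff == 0:
            num_kept_layers = max(1, num_kept_layers - 1)

        # Weighted sum of directional and global CLIP losses;
        # lambda_dir corresponds to the Colab slider (e.g. 0.8 = 80% directional).
        imgs = generator(w_codes)
        loss = (lambda_dir * directional_loss_fn(imgs)
                + (1.0 - lambda_dir) * global_loss_fn(imgs))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```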

a photo of a dog -> a photo of a cute baby dog

~200 iterations

*(Images: Starting, Original, Ours.)* Improved: 5× training speed and fewer artifacts.

a photo of a dog -> a drawing of a dog

~200 iterations

*(Images: Starting, Original, Ours.)* Improved: 4× speedup.

a photo of a dog -> a photo of joker

~200 iterations

*(Images: Starting, Original, Ours.)* Improved: 3× speedup.

a photo of a church -> a church painted by van gogh

~300 iterations

*(Images: Starting, Original, Ours.)* Improved: 3× speedup.

a photo of a church -> a cubism painting of a church

~400 iterations

(Note the reduced artifacts!)

*(Images: Starting, Original, Ours.)* Improved: 5× speedup and better color preservation.

a photo of a person -> a drawing of a person

~200 iterations

(Note the reduced artifacts!)

*(Images: Starting, Original, Ours.)* Loss weighting: 80% directional, 20% global.
