Model architecture. The encoder of the UNet uses only standard ResNet blocks together with a SpatialTransformer, which guides the diffusion process with the style embedding obtained from Es. The middle block and the decoder use SPADEResBlock, as in SDM, to inject the semantic mask information. The mask attention mechanism is applied inside the SpatialTransformer, on the cross-attention map.
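The mask attention idea can be sketched as a cross-attention whose score matrix is masked so that each spatial position attends only to the style tokens allowed for its semantic region. This is a minimal illustrative sketch, not the repo's actual code; all function names, shapes, and the plain-list tensor representation are assumptions.

```python
import math

def softmax(scores):
    # Numerically stable softmax; masked entries (-inf) receive zero weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_cross_attention(queries, keys, values, mask):
    """Cross-attention where mask[i][j] = True lets spatial position i
    attend to style token j (e.g. only the token of its own semantic
    region). Pure-Python sketch: queries is a list of query vectors,
    keys/values are lists of style-token vectors."""
    d = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        # Scaled dot-product scores, with disallowed pairs forced to -inf.
        scores = [
            (sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
            if mask[i][j] else float("-inf")
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors.
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out
```

With a mask that restricts each position to a single token, the output reduces to that token's value vector, which is the behavior the masking is meant to enforce.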
[Towards Controllable Face Generation with Semantic Latent Diffusion Models]
Our model can generate images in three ways: (a) from a reference image; (b) from a reference image, but with a specific body part given a random style; (c) fully noise-based, without any reference.
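The three modes differ only in where each part's style code comes from: the reference image, or a random draw. A hedged sketch of that selection logic, where the part list, dictionary representation, and function name are all assumptions for illustration:

```python
import random

PARTS = ["skin", "eyes", "mouth", "hair"]  # illustrative part list

def build_style(reference=None, randomize=(), dim=4, rng=None):
    """Assemble a per-part style embedding dict.
    (a) reference given, randomize empty  -> full reference style
    (b) reference given, some parts named -> those parts get random codes
    (c) reference is None                 -> every part is random
    """
    rng = rng or random.Random(0)
    style = {}
    for part in PARTS:
        if reference is not None and part not in randomize:
            # Keep the style code extracted from the reference image.
            style[part] = list(reference[part])
        else:
            # Sample a random style code for this part.
            style[part] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return style
```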
Interpolation of eyes, mouth, hairstyle, and full style, going from the full target (left) to the full reference (right). Some details are highlighted to make the changes easier to observe.
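The interpolations above amount to linearly blending the per-part style embeddings between target and reference, optionally restricted to one region. A minimal sketch under that assumption (names and dict layout are illustrative):

```python
def lerp_styles(target, reference, alpha, parts=None):
    """Blend per-part style embeddings: alpha=0 -> full target style,
    alpha=1 -> full reference style. `parts` limits the blend to given
    regions (e.g. eyes), leaving the other parts at the target's style."""
    parts = set(parts) if parts is not None else set(target)
    out = {}
    for part, t_vec in target.items():
        r_vec = reference[part]
        if part in parts:
            # Linear interpolation between the two embeddings.
            out[part] = [(1 - alpha) * t + alpha * r
                         for t, r in zip(t_vec, r_vec)]
        else:
            out[part] = list(t_vec)
    return out
```

Sweeping alpha from 0 to 1 produces the left-to-right progression shown in the figure.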
Style transfer comparison between different methods and our model. The style of the reference image is applied to the target image. The overall consistency of the style swap is noticeably better than that of state-of-the-art methods.
A suitable conda environment named diffusion can be created and activated with:

conda env create -f environment.yaml
conda activate diffusion

The demo can then be launched with:

python gradio_img2img.py --dataset CELEBA_HQ_TEST_FOLDER
To use gradio_img2img.py, download the model from here and put it in the checkpoints folder; then download the VQ-F4 autoencoder (f=4, VQ, Z=8192, d=3; the first row in the table) from the LDM repo, following their instructions.