jiwoogit / StyleID

[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer


Usage

To run our code, please follow these steps:

  1. Setup
  2. Run StyleID
  3. Evaluation

A single GPU with more than 20 GB of memory may be required. The code was tested in the pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel Docker image.

**You can also refer to "diffusers_implementation/" for a StyleID implementation based on the diffusers library.**
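If you go the diffusers route, the core mechanism can be pictured as a custom attention processor that swaps the self-attention keys/values for features captured from the style image. The skeleton below is only an illustrative sketch (the class and the style_kv buffer are hypothetical names, and it skips details such as which decoder layers and timesteps receive injection); see "diffusers_implementation/" for the actual code.

# Illustrative sketch only: a diffusers attention processor that injects
# style keys/values into self-attention. Names are hypothetical; this is
# not the repository's implementation.
import torch

class StyleInjectionProcessor:
    def __init__(self, style_kv=None, T=1.5):
        self.style_kv = style_kv  # (key, value) captured from the style image's inversion
        self.T = T                # attention temperature (--T)

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        q = attn.to_q(hidden_states)
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states
        k, v = attn.to_k(context), attn.to_v(context)
        if self.style_kv is not None and encoder_hidden_states is None:
            k, v = self.style_kv  # replace self-attention K/V with style features
        q, k, v = (attn.head_to_batch_dim(t) for t in (q, k, v))
        # Scaling q by T scales the attention logits by T (temperature scaling).
        probs = attn.get_attention_scores(q * self.T, k, attention_mask)
        out = attn.batch_to_head_dim(torch.bmm(probs, v))
        out = attn.to_out[0](out)  # linear projection
        return attn.to_out[1](out)  # dropout

Installing it on every attention layer would be pipe.unet.set_attn_processor(StyleInjectionProcessor(...)); the actual implementation is more selective, injecting only into certain decoder self-attention layers.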

Setup

Our codebase builds on CompVis/stable-diffusion and MichalGeyer/plug-and-play, and shares their dependencies and model architecture.

Create a Conda Environment

conda env create -f environment.yaml
conda activate StyleID

Download Stable Diffusion Weights

Download the Stable Diffusion weights (the sd-v1-4.ckpt file) from the CompVis organization on Hugging Face, and link them:

ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 
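If the link is set up correctly, a quick sanity check like the following (not part of the repo) should succeed before you run StyleID:

# Optional sanity check: verify the symlinked checkpoint resolves and
# contains the expected 'state_dict' entry.
import os, torch

ckpt = "models/ldm/stable-diffusion-v1/model.ckpt"
assert os.path.exists(ckpt), "checkpoint symlink is broken"
sd = torch.load(ckpt, map_location="cpu")
print(list(sd.keys()))  # should include 'state_dict'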

Run StyleID

To run StyleID:

python run_styleid.py --cnt <content_img_dir> --sty <style_img_dir>

To run the preset configurations on the provided sample images:

python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.75 --T 1.5  # default
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.3 --T 1.5   # high style fidelity

To fine-tune the results, you can control the following aspects of the style transfer (a conceptual sketch of these operations follows the list):

  • Attention-based style injection can be disabled with the --without_attn_injection flag.
  • Query preservation is controlled by the --gamma parameter (a higher value enhances content fidelity but may reduce style fidelity).
  • Attention temperature scaling is controlled through the --T parameter.
  • Initial latent AdaIN can be disabled with the --without_init_adain flag.
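Conceptually, these knobs correspond to a few tensor operations on the attention inputs and on the initial latent. Below is a minimal PyTorch sketch of those operations with illustrative tensor names; it is a conceptual aid, not the repository's exact code (see run_styleid.py for that).

# Minimal sketch of the operations the flags above control (illustrative).
import torch
import torch.nn.functional as F

def query_preservation(q_content, q_stylized, gamma=0.75):
    # --gamma: blend the content image's inverted query into the stylized
    # query; a higher gamma keeps more content structure.
    return gamma * q_content + (1.0 - gamma) * q_stylized

def attention_with_style_injection(q, k_style, v_style, T=1.5):
    # Style injection: keys/values come from the style image's inversion.
    # --T: temperature scaling of the attention logits.
    d = q.shape[-1]
    logits = T * (q @ k_style.transpose(-2, -1)) / d**0.5
    return F.softmax(logits, dim=-1) @ v_style

def initial_latent_adain(z_content, z_style, eps=1e-5):
    # Initial latent AdaIN (disabled by --without_init_adain): match the
    # channel-wise mean/std of the content latent to the style latent.
    dims = (2, 3)
    mu_c, sigma_c = z_content.mean(dims, keepdim=True), z_content.std(dims, keepdim=True)
    mu_s, sigma_s = z_style.mean(dims, keepdim=True), z_style.std(dims, keepdim=True)
    return sigma_s * (z_content - mu_c) / (sigma_c + eps) + mu_s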

Save Precomputed Inversion Features

By default, the script creates a "precomputed_feats" directory and saves the DDIM inversion features of each input image. This avoids recomputing the two DDIM inversions on repeated runs but requires a significant amount of storage (over 3 GB per image). If you encounter a "no space left on device" error, set the "precomputed" parameter as follows:

python run_styleid.py --precomputed ""  # do not save DDIM inversion features

Evaluation

For quantitative evaluation, we include a set of randomly selected inputs from MS-COCO and WikiArt in the "./data" directory.

Before running the evaluation code, please duplicate the content and style images so that their counts match the number of stylized outputs (40 styles, 20 contents -> 800 style images, 800 content images).

run:

python util/copy_inputs.py --cnt data/cnt --sty data/sty
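The duplication simply pairs every content image with every style image; conceptually it amounts to something like the sketch below (the directory layout and file naming here are assumptions for illustration, and util/copy_inputs.py is the authoritative version).

# Conceptual sketch of the input duplication: pair every content image with
# every style image so the eval sets match the 800 stylized outputs.
import itertools, shutil
from pathlib import Path

cnts = sorted(Path("data/cnt").iterdir())
stys = sorted(Path("data/sty").iterdir())
Path("data/cnt_eval").mkdir(exist_ok=True)
Path("data/sty_eval").mkdir(exist_ok=True)
for c, s in itertools.product(cnts, stys):  # 20 contents x 40 styles = 800 pairs
    stem = f"{c.stem}_{s.stem}"
    shutil.copy(c, Path("data/cnt_eval") / f"{stem}{c.suffix}")
    shutil.copy(s, Path("data/sty_eval") / f"{stem}{s.suffix}")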

Our evaluation largely relies on matthias-wright/art-fid and mahmoudnafifi/HistoGAN.

ArtFID

run:

cd evaluation;
python eval_artfid.py --sty ../data/sty_eval --cnt ../data/cnt_eval --tar ../output
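For reference, ArtFID (Wright & Ommer, 2022) combines the two standard metrics as ArtFID = (1 + LPIPS) * (1 + FID), where LPIPS measures content preservation against the content images and FID measures style fidelity against the style images; lower is better for both.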

Histogram loss

run:

cd evaluation;
python eval_histogan.py --sty ../data/sty_eval --tar ../output
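The histogram loss measures how closely the colors of the stylized output match the style image. As a rough illustration only (eval_histogan.py uses HistoGAN's differentiable RGB-uv histograms, not this simplified version), one can think of it as a Hellinger distance between normalized color histograms:

# Simplified stand-in for a color-histogram distance (Hellinger), shown
# only to convey the idea; not the metric eval_histogan.py computes.
import torch

def hellinger_histogram_loss(img_a, img_b, bins=64):
    # img_*: float tensors in [0, 1], shape (3, H, W)
    h_a = torch.stack([torch.histc(img_a[c], bins=bins, min=0, max=1) for c in range(3)])
    h_b = torch.stack([torch.histc(img_b[c], bins=bins, min=0, max=1) for c in range(3)])
    h_a = h_a / h_a.sum(dim=1, keepdim=True)  # normalize per channel
    h_b = h_b / h_b.sum(dim=1, keepdim=True)
    return (1.0 / 2**0.5) * torch.norm(h_a.sqrt() - h_b.sqrt(), dim=1).mean()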

We also provide style and content images for qualitative comparison in the "./data_vis" directory.

Citation

If you find our work useful, please consider citing our paper and starring the repository:

@InProceedings{Chung_2024_CVPR,
    author    = {Chung, Jiwoo and Hyun, Sangeek and Heo, Jae-Pil},
    title     = {Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8795-8805}
}
