kousw/visual-style-prompting

About

This repository is an unofficial, experimental implementation of Visual Style Prompting.

Note

The method appears to extract the style of a reference image and apply it to the generated image by replacing the keys and values of self-attention with the keys and values computed from the reference image, in the layers after the 24th layer of the UpBlock in the UNet. In my experiments it reproduces some styles, but it can carry over color schemes too strongly or produce broken details. I may have missed details of the paper's implementation, so treat this as an experimental implementation only.
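The key/value swap described above can be illustrated with a minimal, dependency-free sketch. This is not the code in this repository: the function names, the `swap_from` parameter, and the choice to treat "after the 24th layer" as `layer_index >= swap_from` are all assumptions made for illustration, and real implementations operate on batched tensors inside the UNet's attention modules.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


def style_prompted_attention(q_gen, k_gen, v_gen, k_ref, v_ref, layer_index, swap_from=24):
    """Hypothetical helper: use the reference image's K/V in late layers.

    Assumption: layers with index >= swap_from are swapped; the exact
    boundary used by the paper or by this repository may differ.
    """
    if layer_index >= swap_from:
        # Queries still come from the generated image; keys and values
        # come from the reference image, so the output mixes reference values.
        return attention(q_gen, k_ref, v_ref)
    # Earlier layers run ordinary self-attention on the generated image.
    return attention(q_gen, k_gen, v_gen)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k_gen, v_gen = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
    k_ref, v_ref = [[0.5, 0.5], [2.0, -1.0]], [[1.0, 2.0], [1.0, 2.0]]
    print(style_prompted_attention(q, k_gen, v_gen, k_ref, v_ref, layer_index=30))
    print(style_prompted_attention(q, k_gen, v_gen, k_ref, v_ref, layer_index=3))
```

Because the queries are kept from the generated image, the spatial layout is still driven by the prompt, while the swapped values pull appearance from the reference. That may also be why strong color schemes can leak through.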

Reference and generated sample images (caption: "a cat sitting in a city")

Environment

Python 3.10.9, CUDA 12.2

Installation

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

Inference (with SDXL)

Command

python inference_sdxl.py --guidance_scale 7.0 --num_inference_steps 50 --reference_image sample/ref2.png --prompt "low-poly style cat, low-poly game art, polygon mesh, jagged blocky, wireframe edges, centered composition, simple" --resolution 768 --num_samples 5
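To try several prompts or reference images, the command above can be assembled from Python. The sketch below only builds the argument list using the flags shown in the command; the helper name `build_inference_command` and its default values are assumptions taken from that one example, not documented defaults of the script.

```python
import shlex


def build_inference_command(
    reference_image,
    prompt,
    guidance_scale=7.0,
    num_inference_steps=50,
    resolution=768,
    num_samples=5,
):
    """Hypothetical helper: build the argv list for inference_sdxl.py.

    Only flags that appear in the example command are used.
    """
    return [
        "python",
        "inference_sdxl.py",
        "--guidance_scale", str(guidance_scale),
        "--num_inference_steps", str(num_inference_steps),
        "--reference_image", reference_image,
        "--prompt", prompt,
        "--resolution", str(resolution),
        "--num_samples", str(num_samples),
    ]


if __name__ == "__main__":
    cmd = build_inference_command("sample/ref2.png", "a cat sitting in a city")
    # Print a shell-quoted version for inspection before running it.
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the repository root would run it; using a list instead of a single string avoids shell quoting problems with prompts that contain commas or quotes.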