DO NOT USE

The results look distorted despite my efforts to ensure that the inputs to the UNet match those in the diffusers reference code. I am still investigating the cause.

SDXL-finetune

Adapted from https://github.com/harubaru/waifu-diffusion/blob/main/trainer/diffusers_trainer.py with the following additions:

SDXL support
FP32 for VAE
Designed for booru tags

How to use

Prepare a dataset of images, each with a caption file ending in .txt, e.g. danbooru2021/0000/1000.jpg and danbooru2021/0000/1000.txt
The content of each .txt file looks something like this:

bad aesthetic,gen:panties,gen:oekaki,char:amano_misao_(battle_programmer_shirase),art:haganemaru_kennosuke,meta:lowres,gen:open_mouth,gen:panty_pull,gen:white_panties,gen:school_uniform,gen:1girl,copy:battle_programmer_shirase,gen:underwear,gen:blush,gen:jaggy_line,gen:long_hair,gen:solo

Tags are comma-separated, in the form <category>:<tag>. The category can be shortened to save space. Things like the aesthetic tags are taken from waifu diffusion.
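
To make the expected layout concrete, here is a minimal sketch of how the image/caption pairs and the tag format described above could be read. It is not taken from the trainer itself: the names parse_caption, find_pairs and IMAGE_EXTENSIONS are hypothetical, the list of image extensions is an assumption, and treating a tag without a colon (such as "bad aesthetic") as having no category is my reading of the example above.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Assumption: which file extensions count as images (not specified by the trainer).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def parse_caption(text: str) -> List[Tuple[Optional[str], str]]:
    """Split a comma-separated caption into (category, tag) pairs.

    A tag without a "<category>:" prefix gets category None.
    Only the first colon is treated as the separator.
    """
    tags = []
    for raw in text.split(","):
        raw = raw.strip()
        if not raw:
            continue  # skip empty entries, e.g. from a trailing comma
        category, sep, tag = raw.partition(":")
        if sep:
            tags.append((category, tag))
        else:
            tags.append((None, raw))
    return tags


def find_pairs(root: str) -> List[Tuple[Path, Path]]:
    """Return (image, caption) paths for every image with a sibling .txt file."""
    pairs = []
    for image in sorted(Path(root).rglob("*")):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        caption = image.with_suffix(".txt")
        if caption.is_file():
            pairs.append((image, caption))
    return pairs


if __name__ == "__main__":
    # Walk the dataset directory used in the example above and show the parsed tags.
    for image_path, caption_path in find_pairs("danbooru2021"):
        print(image_path, parse_caption(caption_path.read_text(encoding="utf-8")))
```

Images without a matching .txt file are skipped silently in this sketch; a real data loader might prefer to warn about them.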

Stats

44 GB of VRAM is used for a batch size of 2 at a bucket resolution of 896x896
