Kohya Trainer

Github Repository for kohya-ss/sd-scripts colab notebook implementation

Notebook Name	Description	Link	Old Commit
Kohya LoRA Dreambooth	LoRA Training (Dreambooth method)
Kohya LoRA Fine-Tuning	LoRA Training (Fine-tune method)
Kohya Trainer	Native Training
Kohya Dreambooth	Dreambooth Training
Fast Kohya Trainer `NEW`	Easy 1-click LoRA & Native Training
Cagliostro Colab UI `NEW`	A Customizable Stable Diffusion Web UI

Updates

2023

v14.1 (09/03):

What Changes?

Fix xformers version for all notebook to adapt Python 3.9.16
Added new network_module : lycoris.kohya. Read KohakuBlueleaf/LyCORIS
- Previously LoCon, now it's called LyCORIS, a Home for custom network module for kohya-ss/sd-scripts.
- Algo List as of now:
  - lora: Conventional Methods a.k.a LoCon
  - loha: Hadamard product representation introduced by FedPara
- For backward compatibility, locon.locon_kohya still exist, but you can train LoCon in the new lycoris.kohya module as well by specify ["algo=lora"] in the network_args
Added new condition to enable or disable generating sample every n epochs/steps, by disabling it, sample_every_n_type_value automatically set to int(999999)

v14 (07/03):

What Changes?

Refactoring (again)
- Moved support us button to separated and hidden section
- Added old commit link to all notebook
- Deleted clone sd-scripts option because too risky, small changes my break notebook if new updates contain syntax from python > 3.9
- Added wd-1.5-beta-2 and wd-1.5-beta-2-aesthetic as pretrained model for SDv2.1 768v model, please use --v2 and --v_parameterization if you wanna train with it.
- Removed folder naming scheme cell for colab dreambooth method, thanks to everyone who made this changes possible. Now you can set train_data_dir from gdrive path without worrying <repeats>_<token> class> ever again
Revamped V. Training Model section
- Now it has 6 major cell
  1. Model Config:
    - Specify pretrained model path, vae to use, your project name, outputh path and if you wanna train on v2 and or v_parameterization here.
  2. Dataset Config:
    - This cell will create dataset_config.toml file based on your input. And that .toml file will be used for training.
    - You can set class_token and num_repeats here instead of renaming your folder like before.
    - Limitation: even though --dataset_config is powerful, but I'm making the workflow to only fit one train_data_dir and reg_data_dir, so probably it's not compatible to train on multiple concepts anymore. But you can always tweaks .toml file.
    - For advanced users, please don't use markdown but instead tweak the python dictionaries yourself, click show code and you can add or remove variable, dataset, or dataset.subset from dict, especially if you want to train on multiple concepts.
  3. Sample Prompt Config
    - This cell will create sample_prompt.txt file based on your input. And that .txt file will be used for generating sample.
    - Specify sample_every_n_type if you want to generate sample every n epochs or every n steps.
    - The prompt weighting such as ( ) and [ ] are not working.
    - Limitation: Only support 1 line of prompt at a time
    - For advanced users, you can tweak sample_prompt.txt and add another prompt based on arguments below.
    - Supported prompt arguments:
      - --n : Negative Prompt
      - --w : Width
      - --h : Height
      - --d : Seed, set to -1 for using random seed
      - --l : CFG Scale
      - --s : Sampler steps
  4. Optimizer Config (LoRA and Optimizer Config)
    - Additional Networks Config:
      - Added support for LoRA in Convolutional Network a.k.a KohakuBlueleaf/LoCon training, please specify locon.locon_kohya in network_module
      - Revamped network_args, now you can specify more than 2 custom args, but you need to specify it inside a list, e.g. ["conv_dim=64","conv_alpha=32"]
      - network_args for LoCon training as follow: "conv_dim=RANK_FOR_CONV" "conv_alpha=ALPHA_FOR_CONV" "dropout=DROPOUT_RATE"
      - Remember conv_dim + network_dim, so if you specify both at 128, you probably will get 300mb filesize LoRA
      - Now you can specify if you want to train on both UNet and Text Encoder or just wanna train one of them.
    - Optimizer Config
      - Similar to network_args, now you can specify more than 2 custom args, but you need to specify it inside a list, e.g. for DAdaptation : ["decouple=true","weight_decay=0.6"]
      - Deleted lr_scheduler_args and added lr_scheduler_num_cycles and lr_scheduler_power back
      - Added Adafactor for lr_scheduler
  5. Training Config
    - This cell will create config_file.toml file based on your input. And that .toml file will be used for training.
    - Added num_epochs back to LoRA notebook and max_train_steps to dreambooth and native training
    - For advanced users, you can tweak training config without re-run specific training cell by editing config_file.toml
  6. Start Training
    - Set config path to start training.
      - sample_prompt.txt
      - config_file.toml
      - dataset_config.toml
    - You can also import training config from other source, but make sure you change all important variable such as what model and what vae did you use
Revamped VI. Testing section
- Deleted all wrong indentation
- Added Portable Web UI as an alternative to try your trained model and LoRA, make sure you still got more time.
Added new changes to upload config_file to huggingface.

Useful Links

Official repository : kohya-ss/sd-scripts
Gradio Web UI Implementation : bmaltais/kohya_ss
Automatic1111 Web UI extensions : dPn08/kohya-sd-scripts-webui

Overview

Fine tuning of Stable Diffusion's U-Net using Diffusers
Addressing improvements from the NovelAI article, such as using the output of the penultimate layer of CLIP (Text Encoder) instead of the last layer and learning at non-square resolutions with aspect ratio bucketing.
Extends token length from 75 to 225 and offers automatic caption and automatic tagging with BLIP, DeepDanbooru, and WD14Tagger
Supports hypernetwork learning and is compatible with Stable Diffusion v2.0 (base and 768/v)
By default, does not train Text Encoder for fine tuning of the entire model, but option to train Text Encoder is available.
Ability to make learning even more flexible than with DreamBooth by preparing a certain number of images (several hundred or more seems to be desirable).

Original post for each dedicated script:

Change Logs:

2023

v13 (25/02):

What Changes?

Of course refactoring, cleaning and make the code and cells more readable and easy to maintain.
- Moved Login to Huggingface Hub to Deployment section, in the same cell with defining repo.
- Merged Install Kohya Trainer, Install Dependencies, and Mount Drive cells
- Merged Dataset Cleaning and Convert RGB to RGBA cells
- Deleted Image Upscaler cell, because bucketing automatically upscale your dataset (converted to image latents) to min_bucket_reso value.
- Deleted Colab Ram Patch because now you can set --lowram in the training script.
- Revamped Unzip dataset cell to make it look simpler
Added xformers pre-compiled wheel for A100
Revamped Pretrained Model section
- Deleted some old pretrained model
- Added Anything V3.3, Chilloutmix, and Counterfeit V2.5 as new pretrained model for SD V1.x based model
- Added Replicant V1.0, WD 1.5 Beta and Illuminati Diffusion V1 as new pretrained model for SD V2.x 768v based model
- Changed Stable Diffusion 1.5 pretrained model to pruned one.
Changed Natural Language Captioning back from GIT to BLIP with beam_search enabled by default
Revamped Image Scraper from simple to advanced, added new feature such as:
- Added safebooru to booru list
- Added custom_url option, so you can copy and paste the url instead of specify which booru sites and tags to scrape
- Added user_agent field, because you can't access some image board with default user_agent
- Added limit_rate field to limit your count
- [Experimental] Added with_aria2c to scrape your dataset, not a wrapper, just a simple trick to extract urls with gallery-dl and download them with aria2c instead. Fast but seems igonoring --write-tags.
- All downloaded tags now saved with .txt format instead of .jpg.txt
- Added additional_arguments to make it more flexible if you want to try other args
Revamped Append Custom Tag cell
- Create new caption file for every image file based on extension provided (.txt/.caption) if you didn't want to use BLIP or WD Tagger
- Added --keep_tokens args to the cell
Revamped Training Model section.
- Revamped prettytable for easier maintenance and bug fixing
- Now it has 4 major cell:
  - Folder Config
    - To specify v2, v2_parameterization and all important folder and project_name
  - LoRA and Optimizer Config
    - Only Optimizer Config for notebook outside LoRA training
    - All about Optimizer, learning_rate and lr_scheduler goes here
    - Added new Optimizer from latest kohya-ss/sd-script, all available optimizer : `"AdamW", "AdamW8bit", "Lion", "SGDNesterov", "SGDNesterov8bit", "DAdaptation", "AdaFactor"
    - Currently you can't use DAdaptation if you're in Colab free tier because it need more VRAM
    - Added --optimizer_args for custom args, useful if you want to try adjusting weight decay, betas etc
  - Dataset Config
    - Only available for Dreambooth method notebook, it basically bucketing cell for Dreambooth.
    - Added caption dropout, you can drop your caption or tags by adjusting dropout rates.
    - Added --bucket_reso_steps and --bucket_no_upscale
  - Training Config
    - Added --noise_offset, read Diffusion With Offset Noise
    - Added --lowram to load the model in VRAM instead of CPU
Revamped Convert Diffusers to Checkpoint cell, now it's more readable.
Fixing bugs when output_dir located in google drive, it assert an error because of something like /content/drive/dreambooth_cmd.yaml which is forbidden, now instead of saved to {output_dir}, now training args history are saved to {training_dir}

News

I'm in burnout phase, so I'm sorry for the lame update.
Fast Kohya Trainer, an idea to merge all Kohya's training script into one cell. Please check it here.
- Please don't expect high, it just a secondary project and maintaining 1-click cell is hard. So I won't prioritized it.
Kohya Textual Inversion are cancelled for now, because maintaining 4 Colab Notebook already making me this tired.
- Please use this instead, not kohya script but everyone on WD server using this since last year:
  - stable-textual-inversion-cafe colab
  - stable-textual-inversion-cafe Colab - Lazy Edition
I wrote a Colab Notebook for #AUTOMATIC1111's #stablediffusion Web UI, with built-in Mikubill's #ControlNet extension. All Annotator and extracted ControlNet model are provided in the notebook. It's called Cagliostro Colab UI. Please try it.
- You can use new UI/UX from Anapnoe in the notebook. You can find the option in experimental section.

Training script changes:

Please read kohya-ss/sd-scripts for recent updates.

v12 (05/02):

What Changes?

Refactored the 4 notebooks (again)
Restored the --learning_rate function in kohya-LoRA-dreambooth.ipynb and kohya-LoRA-finetuner.ipynb #52
Fixed the cell for inputting custom tags #48 and added the --keep_tokens function to prevent custom tags from being shuffled.
Added a cell to check if all LoRA modules have been trained properly.
Added descriptions for each notebook and links to the relevant notebooks to prevent "training on the wrong notebook" from happening again.
Added a cell to check the metadata in the LoRA model.
Added a cell to change the transparent background in the train data.
Added a cell to upscale the train data using R-ESRGAN
Divided the Data Annotation section into two cells:
- Removed BLIP and replaced it with Microsoft/GIT as the auto-captioning for natural language (git-large-textcaps is the default model).
- Updated the Waifu Diffusion 1.4 Tagger to version v2 (SwinV2 is the default model).
  - The user can adjust the threshold for general tags. It is recommended to set the threshold higher (e.g. 0.85) if you are training on objects or characters, and lower the threshold (e.g. 0.35) for training on general, style, or environment.
  - The user can choose from three available models.
Added a field for uploading to the Huggingface organization account.
Added the --min_bucket_reso=320 and --max_bucket_reso=1280 functions for training resolutions above 512 (e.g. 640 and 768), Thanks Trauter!

Training script Changes(kohya_ss)

Please read Updates 3 Feb. 2023, 2023/2/3 for recent updates.

v11.5 (31/01):

What Changes?

Refactored the 4 notebooks, removing unhelpful comments and making some code more efficient.
Removed the download and generate regularization images function from kohya-dreambooth.ipynb and kohya-LoRA-dreambooth.ipynb.
Simplified cells to create the train_folder_directory and reg_folder_directory folders in kohya-dreambooth.ipynb and kohya-LoRA-dreambooth.ipynb.
Improved the download link function from outside huggingface using aria2c.
Set Anything V3.1 which has been improved CLIP and VAE models as the default pretrained model.
Fixed the parameter table and created the remaining tables for the dreambooth notebooks.
Added network_alpha as a supporting hyperparameter for network_dim in the LoRA notebook.
Added the lr_scheduler_num_cycles function for cosine_with_restarts and the lr_scheduler_power function for polynomial.
Removed the global syntax --learning_rate in each LoRA notebook because unet_lr and text_encoder_lr are already available.
Fixed the upload to hf_hub cell function.

Training script Changes(kohya_ss)

Please read release version 0.4.0 for recent updates.

v11 (19/01):

Reformat notebook,
- Added %store IPython magic command to store important variable
- Now you can change the active directory only by editing directory path in 1.1. Clone Kohya Trainer cell, and save it using %store magic command.
- Deleted unzip cell and adjust download zip cell to do auto unzip as well if it detect path startswith /content/
- Added --flip_aug to Buckets and Latents cell.
- Added --output_name (your-project) cell to save Trained Model with custom nam.
- Added ability to auto compress train_data_dir, last-state and training_logs before upload them to Huggingface
Added colab_ram_patch as temporary fix for newest version of Colab after Ubuntu update to load Stable Diffusion model in GPU instead of RAM

Training script Changes(kohya_ss)

Please read release version 0.3.0 for recent updates.

v10 (02/01) separate release

Added a function to automatically download the BLIP weight in make_caption.py
Added functions for LoRA training and generation
Fixed issue where text encoder training was not stopped
Fixed conversion error for v1 Diffusers->ckpt in convert_diffusers20_original_sd.py
Fixed npz file name for images with dots in prepare_buckets_latents.py

Colab UI changes:

Integrated the repository's format with kohya-ss/sd-script to facilitate merging
You can no longer choose older script versions in the clone cell because the new format does not support it
The requirements for both blip and wd tagger have been merged into one requirements.txt file
The blip cell has been simplified because make_caption.py will now automatically download the BLIP weight, as will the wd tagger
A list of sdv2 models has been added to the "download pretrained model" cell
The "v2" option has been added to the bucketing and training cells
An image generation cell using gen_img_diffusers.py has been added below the training cell

2022

v9 (17/12):

Added the save_model_as option to fine_tune.py, which allows you to save the model in any format.
Added the keep_tokens option to fine_tune.py, which allows you to fix the first n tokens of the caption and not shuffle them.
Added support for left-right flipping augmentation in prepare_buckets_latents.py and fine_tune.py with the flip_aug option.

v8 (13/12):

Added support for training with fp16 gradients (experimental feature). This allows training with 8GB VRAM on SD1.x. See "Training with fp16 gradients (experimental feature)" for details.
Updated WD14Tagger script to automatically download weights.

v7 (7/12):

Requires Diffusers 0.10.2 (0.10.0 or later will work, but there are reported issues with 0.10.0 so we recommend using 0.10.2). To update, run pip install -U diffusers[torch]==0.10.2 in your virtual environment.
Added support for Diffusers 0.10 (uses code in Diffusers for v-parameterization training and also supports safetensors).
Added support for accelerate 0.15.0.
Added support for multiple teacher data folders. For caption and tag preprocessing, use the --full_path option. The arguments for the cleaning script have also changed, see "Caption and Tag Preprocessing" for details.

v6 (6/12):

Temporary fix for an error when saving in the .safetensors format with some models. If you experienced this error with v5, please try v6.

v5 (5/12):

Added support for the .safetensors format. Install safetensors with pip install safetensors and specify the use_safetensors option when saving.
Added the log_prefix option.
The cleaning script can now be used even when one of the captions or tags is missing.

v4 (14/12):

The script name has changed to fine_tune.py.
Added the option --train_text_encoder to train the Text Encoder.
Added the option --save_precision to specify the data format of the saved checkpoint. Can be selected from float, fp16, or bf16.
Added the option --save_state to save the training state, including the optimizer. Can be resumed with the --resume option.

v3 (29/11):

Requires Diffusers 0.9.0. To update it, run pip install -U diffusers[torch]==0.9.0.
Supports Stable Diffusion v2.0. Use the --v2 option when training (and when pre-acquiring latents). If you are using 768-v-ema.ckpt or stable-diffusion-2 instead of stable-diffusion-v2-base, also use the --v_parameterization option when training.
Added options to specify the minimum and maximum resolutions of the bucket when pre-acquiring latents.
Modified the loss calculation formula.
Added options for the learning rate scheduler.
Added support for downloading Diffusers models directly from Hugging Face and for saving during training.
The cleaning script can now be used even when only one of the captions or tags is missing.
Added options for the learning rate scheduler.

v2 (23/11):

Implemented Waifu Diffusion 1.4 Tagger for alternative DeepDanbooru for auto-tagging
Added a tagging script using WD14Tagger.
Fixed a bug that caused data to be shuffled twice.
Corrected spelling mistakes in the options for each script.

Conclusion

While Stable Diffusion fine tuning is typically based on CompVis, using Diffusers as a base allows for efficient and fast fine tuning with less memory usage. We have also added support for the features proposed by Novel AI, so we hope this article will be useful for those who want to fine tune their models.

— kohya_ss

Credit

Kohya | Lopho for prune script | Just for my part

normalclone / kohya-trainer

Kohya Trainer

Github Repository for kohya-ss/sd-scripts colab notebook implementation

Updates

2023

v14.1 (09/03):

v14 (07/03):

Useful Links

Overview

Original post for each dedicated script:

Change Logs:

2023

v13 (25/02):

v12 (05/02):

v11.5 (31/01):

v11 (19/01):

v10 (02/01) separate release

2022

v9 (17/12):

v8 (13/12):

v7 (7/12):

v6 (6/12):

v5 (5/12):

v4 (14/12):

v3 (29/11):

v2 (23/11):

Conclusion

Credit

About

Languages