Akegarasu / lora-scripts

LoRA & Dreambooth training scripts & GUI using kohya-ss's trainer, for diffusion models.


Training load noticeably increased after updating to version 1.8.5

JackOwlfelino opened this issue · comments

The parameters, model, and training set are all identical to what I used a few days ago (on 1.8.3), but convergence and training speed are noticeably slower, and VRAM usage has also increased considerably.

```toml
model_train_type = "sdxl-lora"
pretrained_model_name_or_path = "D:/Program Files/Stable_Diffusion_LoRATrainer/sd-models/animagine-xl-3.0.safetensors"
vae = "D:/Program Files/Stable_Diffusion_LoRATrainer/sd-models/sdxl_vae.safetensors"
v2 = false
train_data_dir = "D:/Program Files/Stable_Diffusion_LoRATrainer/train/TEST"
prior_loss_weight = 1
resolution = "960,1440"
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 1920
bucket_reso_steps = 32
output_name = "TEST_Animagine_XL"
output_dir = "./output"
save_model_as = "safetensors"
save_precision = "fp16"
save_every_n_epochs = 16
max_train_epochs = 16
train_batch_size = 1
gradient_checkpointing = false
network_train_unet_only = false
network_train_text_encoder_only = false
learning_rate = 0.00003
unet_lr = 0.00003
text_encoder_lr = 0.000003
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 0
lr_scheduler_num_cycles = 1
optimizer_type = "Lion8bit"
network_module = "lycoris.kohya"
network_weights = "D:/Program Files/Stable_Diffusion_LoRATrainer/sd-models/TEST-v1.1.safetensors"
network_dim = 32
network_alpha = 32
train_norm = false
log_with = "tensorboard"
logging_dir = "./logs"
caption_extension = ".txt"
shuffle_caption = true
keep_tokens = 6
max_token_length = 255
seed = 1337
mixed_precision = "fp16"
xformers = true
lowram = false
cache_latents = true
cache_latents_to_disk = true
persistent_data_loader_workers = true
gpu_ids = [ "0" ]
network_args = [ "conv_dim=4", "conv_alpha=1", "dropout=0", "algo=locon" ]
```

(To add: TEST-v1.1.safetensors is an SD 1.5 LoRA. Previously, training an XL model on top of an existing 1.5 LoRA didn't seem to use this many resources, but after the update convergence is far slower. Was some new feature added?)