LoRA inference gets a size mismatch
MyLifeisGettingBetter opened this issue · comments
I trained a LoRA on a single RTX 4090 (24GB) with the config below:
```yaml
experiment_id: lora_liang_V1
checkpoint_path: /data/py_project/StableCascade/models
output_path: /data/py_project/StableCascade/models
model_version: 1B

# WandB
wandb_project: StableCascade
wandb_entity: wandb_username

# TRAINING PARAMS
lr: 1.0e-4
batch_size: 4
image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
grad_accum_steps: 4
updates: 10000
backup_every: 1000
save_every: 100
warmup_updates: 1
# use_fsdp: True -> FSDP doesn't work at the moment for LoRA
use_fsdp: False

# GDF
adaptive_loss_weight: True

# LoRA specific
module_filters: ['.attn']
rank: 4
train_tokens:
  - ['^snail', null] # token starts with "snail" -> "snail" & "snails", don't need to be reinitialized
  - ['[liang]', '^girl'] # custom token [snail], initialize as avg of snail & snails
ema_start_iters: 5000
ema_iters: 100
ema_beta: 0.9

webdataset_path: file:/data/py_project/StableCascade/liang.tar
effnet_checkpoint_path: models/effnet_encoder.safetensors
previewer_checkpoint_path: models/previewer.safetensors
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors
```
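For anyone hitting the same mismatch: the inference config has to target the same model size the LoRA was trained against. A minimal sketch of the relevant overrides — the key names other than `model_version` are taken from the training config above and may not match what `configs/inference/lora_c_3b.yaml` actually uses:

```yaml
# configs/inference/lora_c_3b.yaml (sketch, relevant keys only)
model_version: 1B  # must match the model_version the LoRA was trained with
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors  # the 1B ("lite") Stage C weights
```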
I got the LoRA output directory below:
I've changed model_version to 1B in configs/inference/lora_c_3b.yaml, and I got an error like:
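The size mismatch is what you'd expect when LoRA factors trained against one backbone size are applied to a differently sized one: the low-rank matrices are shaped to the original layer's dimensions. A minimal sketch in plain PyTorch (not StableCascade code; the 1536/2048 widths are hypothetical stand-ins for the 1B vs 3.6B attention dims):

```python
import torch
import torch.nn as nn

rank = 4
small = nn.Linear(1536, 1536)   # hypothetical 1B-model attention projection
large = nn.Linear(2048, 2048)   # hypothetical 3.6B-model attention projection

# LoRA A/B factors created for the small model's layer shape
lora_a = torch.zeros(rank, small.in_features)
lora_b = torch.zeros(small.out_features, rank)

# Merging into the small model works: (1536, 4) @ (4, 1536) -> (1536, 1536)
small.weight.data += lora_b @ lora_a

# Merging into the large model is a shape mismatch: (1536, 1536) vs (2048, 2048)
try:
    large.weight.data += lora_b @ lora_a
except RuntimeError as e:
    print("size mismatch:", e)
```

So the fix is on the config side (make inference load the same-size model), not in the LoRA weights themselves.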
Problem solved!!!