finetuning ldm decoder: noisy output
rahimentezari commented
Hi
I want to fine-tune the LDM decoder and have a few problems:
- Some parameters from https://justpaste.it/cse0x are missing in `finetune_ldm_decoder`, for example `"lambda_mse": 0.5` and `"lambda_lpips": 1`. Should we remove them from the param list?
- As training progresses, even after 6K iterations (you use only 100 iterations, right?), I get noisy outputs:
[attached images: `6000_train_d0`, `6000_train_orig`, `6000_train_w`]
Here are my configs:
```
python finetune_ldm_decoder.py --num_keys 1 \
    --ldm_config configs/v2-inference.yaml \
    --ldm_ckpt v2-1_512-ema-pruned.ckpt \
    --msg_decoder_path dec_48b.pth \
    --decoder_depth 8 \
    --decoder_channels 64 \
    --loss_i "watson-vgg" \
    --loss_w "bce" \
    --lambda_i 0.2 \
    --lambda_w 1.0 \
    --optimizer "AdamW,lr=5e-4" \
    --train_dir coco2014/train2014 \
    --val_dir coco2014/test2014 \
    --steps 10000 \
    --warmup_steps 100 \
    --batch_size 16
```
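For reference, a minimal sketch of the objective these flags imply (plain MSE stands in for the Watson-VGG perceptual term to keep it self-contained; all names are illustrative, not the repo's exact code):

```python
# Sketch of the loss the flags above describe:
#   loss = lambda_w * BCE(extracted bits, key) + lambda_i * perceptual(x_w, x_orig)
# MSE replaces the Watson-VGG perceptual term here for self-containment.
import torch
import torch.nn.functional as F

def finetune_loss(decoded_logits: torch.Tensor,  # extractor output, (B, 48)
                  key: torch.Tensor,             # target bits in {0,1}, (B, 48)
                  x_w: torch.Tensor,             # image from fine-tuned decoder
                  x_orig: torch.Tensor,          # image from original decoder
                  lambda_i: float = 0.2,
                  lambda_w: float = 1.0) -> torch.Tensor:
    loss_w = F.binary_cross_entropy_with_logits(decoded_logits, key)  # --loss_w "bce"
    loss_i = F.mse_loss(x_w, x_orig)  # stand-in for --loss_i "watson-vgg"
    return lambda_w * loss_w + lambda_i * loss_i
```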
- If I want to change the decoder to another one, can I still use the same trained HiDDeN extractor? I gave it a try with another decoder with `z_channels=8` and I am getting noisy `train_w` images (purple images):
```
Train [ 760/1000]  eta: 0:01:45  iteration: 750.000000 (380.000000)  loss: 0.190190 (0.414728)  loss_w: 0.051896 (0.189214)  loss_i: 0.692773 (1.127573)  psnr: 25.542080 (inf)  bit_acc_avg: 1.000000 (0.927698)  word_acc_avg: 1.000000 (0.459921)  lr: 0.000076 (0.000321)  time: 0.428403  data: 0.000091  max mem: 42627
```
Pierre Fernandez commented
Hi,
- Yes, you can remove them from the param list.
- The `6000_train_d0` images are decoded with the original decoder (D_o), which is not changed during the optimization, so I would say the issue is not in the fine-tuning. Does it only happen after some fine-tuning steps? Can you try to encode/decode an image and see what it looks like? (A minimal round-trip sketch is below.)
- Yes, you should be able to switch decoders, as they are independent of the extractor (in the paper, we fine-tuned other decoders, such as the ones used for inpainting or super-resolution, which differ from the original one).
You can also share the full logs and code to reproduce.
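A minimal sketch of such an encode/decode round trip, assuming the standard `ldm` codebase is importable; the paths, image file, and resolution are placeholders to adapt to your setup:

```python
# Round-trip an image through the LDM autoencoder to check whether the
# noise comes from encode/decode itself, independently of any fine-tuning.
import numpy as np
import torch
from omegaconf import OmegaConf
from PIL import Image
from ldm.util import instantiate_from_config

config = OmegaConf.load("configs/v2-inference.yaml")
state = torch.load("v2-1_512-ema-pruned.ckpt", map_location="cpu")
model = instantiate_from_config(config.model)
model.load_state_dict(state["state_dict"], strict=False)
model.eval().cuda()

# Load an image, scale to [-1, 1], NCHW
img = Image.open("sample.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.asarray(img)).float() / 127.5 - 1.0
x = x.permute(2, 0, 1).unsqueeze(0).cuda()

with torch.no_grad():
    posterior = model.encode_first_stage(x)
    z = model.get_first_stage_encoding(posterior)  # sample latent + scale
    x_rec = model.decode_first_stage(z)            # back to pixel space

rec = ((x_rec.clamp(-1, 1) + 1) * 127.5).squeeze(0)
rec = rec.permute(1, 2, 0).cpu().numpy().astype(np.uint8)
Image.fromarray(rec).save("sample_rec.png")  # compare visually with the input
```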