jolibrain / joliGEN

Generative AI Image Toolset with GANs and Diffusion for Real-World Applications

Home Page: https://www.joligen.com

inference unetref generator

concrete13377 opened this issue

Trying to run with a unetref generator checkpoint trained with the example config:

```
python3 scripts/gen_single_image_diffusion.py \
  --model-in-file latest_net_G_A.pth \
  --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
  --mask-in viton_bbox_ref/testA/ref/00006_00.jpg \
  --dir-out checkpoints/viton_bbox_ref/inference_output \
  --img-width 128 \
  --img-height 128
```

Getting the following error:

```
  warnings.warn(
Dual U-Net: number of ref blocks:  15
sampling loop time step:   0%|                                                                                                                                                                               | 0/1000 [00:00<?, ?it/s]
  0%|                                                                                                                                                                                                           | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/joliGEN/scripts/gen_single_image_diffusion.py", line 808, in <module>
    frame, lmodel, lopt = generate(**vars(args))
                          ^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/scripts/gen_single_image_diffusion.py", line 563, in generate
    out_tensor, visu = model.restoration(
                       ^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 95, in restoration
    return self.restoration_ddpm(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 149, in restoration_ddpm
    y_t = self.p_sample(
          ^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 253, in p_sample
    model_mean, model_log_variance = self.p_mean_variance(
                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/diffusion_generator.py", line 219, in p_mean_variance
    noise=self.denoise_fn(
          ^^^^^^^^^^^^^^^^
  File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/palette_denoise_fn.py", line 109, in forward
    out = self.model(input, embedding, ref)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/venv_joli/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1605, in forward
    h, hs, emb, h_ref, hs_ref = self.compute_feats(
                                ^^^^^^^^^^^^^^^^^^^
  File "/joliGEN/models/modules/unet_generator_attn/unet_generator_attn.py", line 1595, in compute_feats
    h, _ = module(h, emb, qkv_ref=qkv_list.pop(0))
                                  ^^^^^^^^
UnboundLocalError: cannot access local variable 'qkv_list' where it is not associated with a value
```

Hi @concrete13377, thanks for reporting this, I can reproduce it. A flag and an input are needed; I'll come back with a fix.

See #569

The PR allows you to generate an image with a reference input:

```
python3 gen_single_image_diffusion.py \
  --model-in-file /path/to/model/latest_net_G_A.pth \
  --img-in viton_bbox_ref/testA/imgs/00006_00.jpg \
  --bbox-in viton_bbox_ref/testA/bbox/00006_00.txt \
  --ref-in viton_bbox_ref/testA/ref/00006_00.jpg \
  --dir-out /path/to/out/ \
  --img-width 128 \
  --img-height 128
```

You want to look at the result /path/to/out/img_0_generated_crop.png. (The img_0_generated.png image is incorrect in this case, since the model from the documentation is trained on 512x512 crops that contain the garment bbox, so the model never sees heads, etc.)

Thank you so much for your work!

What do you mean by "the model never sees heads, etc."? How can I train the model to get a correct generated image?

Adding --data_online_creation_load_size_A 768 1024 would load the images at full size at training time.
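For reference, a sketch of what such a training run could look like, assuming the usual joliGEN train.py entry point; the dataroot, checkpoints directory, and model name are placeholders, and --data_online_creation_load_size_A 768 1024 is the only option specific to this discussion:

```
python3 train.py \
  --dataroot /path/to/viton_bbox_ref \
  --checkpoints_dir checkpoints \
  --name viton_bbox_ref \
  --data_online_creation_load_size_A 768 1024
```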

For a typical output during training:
[image: typical training output]

Images are square, but you can easily resize them afterwards.
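For instance, with ImageMagick (a sketch, assuming it is installed; the file names and the 768x1024 target size are placeholders, and the ! tells ImageMagick to force the exact size regardless of aspect ratio):

```
convert img_0_generated_crop.png -resize '768x1024!' resized.png
```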

So there's no way to use a model trained with the example config, since it's wrong about the resolution? Or can I just run it with other options so that it generates correct images?

The example model lacks the full context. You can try to hack a crop at inference, but I don't see how this would help much.

However, you can finetune your existing model with the --data_online_creation_load_size_A 768 1024 and --train_continue options. This would avoid retraining from scratch.
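A minimal sketch of such a fine-tuning run, under the same assumptions as the training sketch above; reuse your existing --name and --checkpoints_dir so the latest checkpoint is picked up:

```
python3 train.py \
  --dataroot /path/to/viton_bbox_ref \
  --checkpoints_dir checkpoints \
  --name viton_bbox_ref \
  --train_continue \
  --data_online_creation_load_size_A 768 1024
```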