lyndonzheng / TFill

[CVPR 2022]: Bridging Global Context Interactions for High-Fidelity Image Completion


Why are the refined results worse than the coarse results?

cats-food opened this issue

Hi, Thank you very much for your great work!

I am testing your pretrained model on the example images you provided, but at a resolution of 256x256 (the original size is 512x512, but I want to test at 256x256). Here are my args:

python test.py --name celeba --img_file ./examples/celeba/img/ --mask_file ./examples/celeba/mask/ --results_dir ./results --model tc --coarse_or_refine refine --gpu_id 0 --no_shuffle --batch_size 1 --preprocess scale_shortside --mask_type 3 --load_size 256 --fine_size 256 --attn_G --add_noise

And here are the output results (all the images below are 256x256):

[image: coarse vs. refined completion results]

These are just 2 examples. I didn't modify any part of the code, so I wonder why the refined results are worse than the coarse results. Did I use the wrong args? I would be grateful if you could reply to my question; thanks in advance :)

Hi @Yang-Shiyuan Thanks for pointing out this issue. I have not tested the model at 256x256 resolution. Since our refined model is trained on 512x512 images, the attention-aware layer is tuned on high-resolution features, which it never sees with 256x256 inputs. Therefore, the refined model may perform worse at that resolution. You can directly evaluate the model at 512x512 and then downsample the refined results to 256x256.
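For reference, one way to follow this suggestion is to rerun test.py with --load_size 512 --fine_size 512 and then resize the saved outputs. Here is a minimal sketch (not part of the TFill codebase): the ./results path matches the --results_dir flag above, but the *.png filename pattern and the ./results_256 output folder are assumptions to adjust for your setup.

```python
# Minimal sketch (not from TFill): downsample 512x512 refined outputs
# to 256x256 after running test.py at the model's native resolution.
from pathlib import Path
from PIL import Image

src_dir = Path("./results")       # 512x512 refined outputs from test.py
dst_dir = Path("./results_256")   # where the downsampled copies go
dst_dir.mkdir(exist_ok=True)

for img_path in src_dir.glob("*.png"):  # assumed filename pattern
    img = Image.open(img_path)
    # Lanczos resampling for a clean downscale
    img.resize((256, 256), Image.LANCZOS).save(dst_dir / img_path.name)
```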

@lyndonzheng Thank you for your swift reply! Yes, I have also noticed the implementation details in your paper: the refined model is trained on 512x512 images, so working at the 512 size should yield much more reasonable results. Thank you again for your suggestions :)