TencentARC / MasaCtrl

[ICCV 2023] Consistent Image Synthesis and Editing

Home Page: https://ljzycmd.github.io/projects/MasaCtrl/


Suggestion: Using Target Prompt for Improved Real Image Editing Results

phymhan opened this issue

Hi there,

Thank you for the amazing work! I thoroughly enjoyed reading your paper. I have a suggestion for potentially improving real image editing results. I noticed that in some cases, using the target prompt for DDIM inversion seems to yield better editing results compared to using the source prompt (as shown in Figure 3). Here are two examples (input image):
Using source prompt:
all_step4_layer10 (3)

Using target prompt:
all_step4_layer10 (2)

Using source prompt:
all_step4_layer10 (1)

Using target prompt:
all_step4_layer10

I used the commands from here. With the target prompt, the car's pose is better aligned with the original input image; I've observed similar behavior in my own experiments. I guess this shares some similarities with the idea behind Imagic. While I'm not certain this would be universally beneficial, I think it's worth exploring further. Once again, thank you and congratulations on the fantastic work!
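For context, the suggestion boils down to which prompt embedding conditions the noise predictor during DDIM inversion. Here is a minimal sketch of the deterministic inversion/denoising steps, with a stand-in `eps_model` in place of the real UNet; the function names, the `alphas` schedule, and the prompt handling are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def ddim_invert_step(x_t, eps, a_t, a_next):
    """One deterministic DDIM inversion step: x_t -> x_{t+1} (noisier)."""
    # Predicted clean latent from the current noisy latent.
    x0 = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Re-noise toward the next (noisier) timestep.
    return np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps

def ddim_denoise_step(x_t, eps, a_t, a_prev):
    """One deterministic DDIM denoising step: x_t -> x_{t-1} (cleaner)."""
    x0 = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps

def ddim_inversion(x0, eps_model, prompt_emb, alphas):
    """Invert a clean latent x0 to noise, conditioning eps_model on prompt_emb.

    Passing the *target* prompt embedding here instead of the source one
    is exactly the change discussed in this issue.
    """
    x = x0
    for i in range(len(alphas) - 1):
        eps = eps_model(x, i, prompt_emb)  # alphas decreases from ~1 to ~0
        x = ddim_invert_step(x, eps, alphas[i], alphas[i + 1])
    return x
```

Because DDIM is deterministic, inverting and then denoising under the same conditioning retraces the trajectory; inverting with one prompt and denoising with another makes the round trip drift, which is the trade-off being discussed here.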

Hi, many thanks for your insightful suggestion! The results are quite promising. I ran a quick test on another image with the command:

python playground.py --model_path runwayml/stable-diffusion-v1-5  --image_real corgi.jpg --inv_scale 1 --scale 5 --prompt1 "a photo of a corgi" --prompt2 "a photo of a corgi in lego style" --inv_prompt tar

and the results are:
all_step4_layer10
The reconstructed image differs significantly from the source image. I guess that, in some cases, this idea helps a lot thanks to the spatial information encoded in the inverted noise map. Thanks again for your insightful suggestion; I will explore it further with more real-image tests. 😊
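For readers following along: the `--inv_prompt` switch in the command above presumably just selects which of the two prompts conditions the inversion. A hypothetical sketch of that gating (the flag names mirror the command in this thread; the rest is a guess, not the actual `playground.py`):

```python
import argparse

def build_parser():
    # Flag names mirror the command quoted in this thread; defaults are guesses.
    p = argparse.ArgumentParser()
    p.add_argument("--prompt1", required=True, help="source prompt")
    p.add_argument("--prompt2", required=True, help="target/edit prompt")
    p.add_argument("--inv_prompt", choices=["src", "tar"], default="src",
                   help="which prompt conditions DDIM inversion")
    return p

def inversion_prompt(args):
    # 'tar' inverts with the target prompt, as suggested in this issue.
    return args.prompt1 if args.inv_prompt == "src" else args.prompt2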

Hi @ljzycmd, thanks for your feedback and for conducting a quick test! Looking forward to seeing future developments in the awesome project!


Do you mind sharing playground.py?

@lavenderrz, you can find playground.py here: https://github.com/phymhan/MasaCtrl