TencentARC / MasaCtrl

[ICCV 2023] Consistent Image Synthesis and Editing

Home Page: https://ljzycmd.github.io/projects/MasaCtrl/


Suggestion: Using Target Prompt for Improved Real Image Editing Results

phymhan opened this issue

Hi there,

Thank you for the amazing work! I thoroughly enjoyed reading your paper. I have a suggestion for potentially improving real image editing results. I noticed that in some cases, using the target prompt for DDIM inversion seems to yield better editing results compared to using the source prompt (as shown in Figure 3). Here are two examples (input image):
Using source prompt:
all_step4_layer10 (3)

Using target prompt:
all_step4_layer10 (2)

Using source prompt:
all_step4_layer10 (1)

Using target prompt:
all_step4_layer10

I used the commands from here. With the target prompt, the car's pose is better aligned with the original input image; I've observed similar behavior in my own experiments. I guess this shares some similarities with the idea behind Imagic. While I'm not certain this would be universally beneficial, I think it's worth exploring further. Once again, thank you and congratulations on the fantastic work!
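For context, the suggestion boils down to which prompt embedding conditions the noise predictor during DDIM inversion. Here is a minimal sketch of the deterministic inversion/denoising steps, with a stand-in `eps_model` in place of the real UNet; the function names, the `alphas` schedule, and the prompt handling are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def ddim_invert_step(x_t, eps, a_t, a_next):
    """One deterministic DDIM inversion step: x_t -> x_{t+1} (noisier)."""
    # Predicted clean latent from the current noisy latent.
    x0 = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Re-noise toward the next (noisier) timestep.
    return np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps

def ddim_denoise_step(x_t, eps, a_t, a_prev):
    """One deterministic DDIM denoising step: x_t -> x_{t-1} (cleaner)."""
    x0 = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps

def ddim_inversion(x0, eps_model, prompt_emb, alphas):
    """Invert a clean latent x0 to noise, conditioning eps_model on prompt_emb.

    Passing the *target* prompt embedding here instead of the source one
    is exactly the change discussed in this issue.
    """
    x = x0
    for i in range(len(alphas) - 1):
        eps = eps_model(x, i, prompt_emb)  # alphas decreases from ~1 to ~0
        x = ddim_invert_step(x, eps, alphas[i], alphas[i + 1])
    return x
```

Because DDIM is deterministic, inverting and then denoising under the same conditioning retraces the trajectory; inverting with one prompt and denoising with another makes the round trip drift, which is the trade-off being discussed here.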

Hi, many thanks for your insightful suggestion! The results are quite promising. I ran a quick test on another image with the command:

python playground.py --model_path runwayml/stable-diffusion-v1-5  --image_real corgi.jpg --inv_scale 1 --scale 5 --prompt1 "a photo of a corgi" --prompt2 "a photo of a corgi in lego style" --inv_prompt tar

and the results are:
all_step4_layer10
The reconstructed image differs significantly from the source image. I guess that, in some cases, this idea helps a lot thanks to the spatial information encoded in the inverted noise map. Thanks again for your insightful suggestion; I will explore it further with more real-image tests. 😊
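For readers following along: the `--inv_prompt` switch in the command above presumably just selects which of the two prompts conditions the inversion. A hypothetical sketch of that gating (the flag names mirror the command in this thread; the rest is a guess, not the actual `playground.py`):

```python
import argparse

def build_parser():
    # Flag names mirror the command quoted in this thread; defaults are guesses.
    p = argparse.ArgumentParser()
    p.add_argument("--prompt1", required=True, help="source prompt")
    p.add_argument("--prompt2", required=True, help="target/edit prompt")
    p.add_argument("--inv_prompt", choices=["src", "tar"], default="src",
                   help="which prompt conditions DDIM inversion")
    return p

def inversion_prompt(args):
    # 'tar' inverts with the target prompt, as suggested in this issue.
    return args.prompt1 if args.inv_prompt == "src" else args.prompt2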

Hi @ljzycmd, thanks for your feedback and for conducting a quick test! Looking forward to seeing future developments in the awesome project!


Do you mind sharing playground.py?

@lavenderrz, you can find playground.py here: https://github.com/phymhan/MasaCtrl