AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: Weight for the negative prompt

art926 opened this issue · comments

commented

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

I need to control the strength of the negative prompt. Simply wrapping it in (myprompt:0.1) doesn't always achieve the desired result. Even with a weight of zero in the (..:x) construction, the prompt still affects the image too much.

Proposed workflow

Maybe a simple slider or multiplier in the UI that could reduce or increase the weight of the whole negative prompt? Or support for the :0.5 syntax on the whole line (as for the normal prompt).

Additional information

No response

I don't think you understand what prompts are and how they are interpreted. What you are asking for is not an interface thing.

commented

Of course it would need adjustments to the code. If it was simple, I'd do it myself)

Using (negprompt:0.0) won't work because it only affects attention. What you're looking for, I think, is to interpolate between the 'uncond' created from the negative prompt and one created from an empty prompt, and use that for guidance. I think an extension is probably your best hope though. I'm kinda tempted to go and try that myself. In the meantime you can get a similar effect using the AND operator with a negative weight, but it will make your inference slower.
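For illustration, the interpolation described above can be sketched as follows. This is only a minimal sketch under stated assumptions, not the code of any actual extension: the conditionings are modelled as plain lists of floats (in the web UI they would be tensors), and `blend_uncond`, `empty_cond`, `neg_cond` and `weight` are hypothetical names.

```python
def blend_uncond(empty_cond, neg_cond, weight):
    """Linearly interpolate from the empty-prompt conditioning to the negative-prompt one.

    weight = 0.0 -> pure empty prompt (the negative prompt has no effect)
    weight = 1.0 -> pure negative prompt (the usual behaviour)
    weight > 1.0 -> extrapolates past the negative prompt
    """
    if len(empty_cond) != len(neg_cond):
        raise ValueError("conditionings must have the same length")
    return [e + weight * (n - e) for e, n in zip(empty_cond, neg_cond)]


# Toy 3-element "embeddings" standing in for real conditioning tensors (assumption).
empty_cond = [0.0, 0.0, 0.0]
neg_cond = [1.0, -2.0, 4.0]

half_strength = blend_uncond(empty_cond, neg_cond, 0.5)
print(half_strength)  # [0.5, -1.0, 2.0]
```

The blended result would then take the place of the usual 'uncond' during guidance, which is why a weight of 0 removes the negative prompt entirely instead of only lowering its attention.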

A weight less than one will reduce the effect of the prompt inside the parentheses (be that in the positive or negative prompt). If you want to increase the weight of a negative prompt just put what you want (or the whole prompt) with a weight greater than one.

Maybe try using the | character.

When Stable Diffusion encounters a bar like that, it alternates between the prompts before and after the bar for each sample...

Example: I asked for a beachball with the negative prompt (blue| | | ). It removed most of the blue stripes but still left some blue. In comparison, the negative prompt (blue:0.1) removes all the blue. With the bar version, blue is avoided in 1 sample out of 4, but the other 3 samples are still allowed to generate blue.
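A rough sketch of why the bar trick gives roughly quarter strength, assuming alternation cycles through the options one per sampling step (the web UI documents this feature with square brackets, e.g. [a|b]). The function name `active_option` and the step count are made up for illustration.

```python
def active_option(options, step):
    """Return the option used at a given 0-based sampling step, cycling in order."""
    return options[step % len(options)]


# "blue" followed by three empty options, as in (blue| | | ).
options = ["blue", "", "", ""]
steps = 8  # hypothetical number of sampling steps

schedule = [active_option(options, s) for s in range(steps)]
blue_steps = sum(1 for option in schedule if option == "blue")

print(schedule)
print(blue_steps / steps)  # 0.25
```

Under this assumption "blue" is only in the negative prompt on one step out of four, so its strength can only be set in coarse increments of the number of bars.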

Then I tried a negative number in the negative prompt, and it works too, e.g. a beachball with (blue:-1.5) in the negative prompt. Used like that, it removes all the blue up to -1, then at -1.XX it starts adding some blue back. Interesting... It is not the behavior I was expecting, but it is going to be useful.

This could be built into the interface to control the strength of the negative prompt without changes to the Stable Diffusion code. There could be an input that just passes down a number of | | | ... But this is limited by the number of samples. It is probably better to do it by hand, I guess.

You can do more advanced stuff with this, like splitting entirely different prompts with | to get particularly interesting results, or to trigger really subtle changes, like (small nose|perfect nose|cute nose|big nose|tiny nose): add one more word at a time to get closer and closer to the goal. Words with opposite meanings can help partially cancel a previous prompt. For example, adding big nose will partly cancel small nose, which can help get exactly the expected size.

Hope this finally stops people from saying reducing emphasis does the same thing.

[image: Comparison]

@art926, looks like you can do it right now using the prompt blending script on this page:
https://github.com/amotile/stable-diffusion-backend/tree/master/src/process/implementations/automatic1111_scripts

It works with TI embeddings (which IMO is where it's most handy), but I think it probably breaks "prompt editing" and "alternating words".
(edit: yes it does)

OK, I made an extension that does only this, in a nice and simple way, so it should work fine with everything else. It also writes the value to PNGinfo and supports xyz plot. So it shouldn't really be necessary to have this as a core feature of the main program. But I still think we should, cuz not everyone's gonna go and get an extension, but everyone can really benefit from this.

You can get it here while I write a readme:
https://github.com/muerrilla/stable-diffusion-NPW

Meanwhile, here are some more examples comparing this method (top row) with attention/emphasis, i.e. the bottom row uses the (Negative Prompt: 0), (Negative Prompt: 0.25), etc. syntax.

Base:
[image: a close up portrait of a cyberpunk knight-0]

Prompts: a close up portrait of a cyberpunk [knight|lobster], [lobster| ] armour, cyberpunk!, fantasy, elegant, digital painting, artstation, concept art, matte, sharp focus, art by josan gonzalez
Params: Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 10, Seed: 6, Size: 512x640, Model: deliberate_v2

Negative Prompt: red
[image: a close up portrait of a cyberpunk knight-2-red]

Negative Prompt: samurai pink cg
[image: a close up portrait of a cyberpunk knight-25-samurai pink cg]

Negative Prompt: custom TI embedding
[image: a close up portrait of a cyberpunk knight-42]

commented

This looks great, thank you! That's exactly the behavior I wanted to see in the app, would be nice to have it in the main branch. Btw, I also tried the "AND" trick (like, ....... AND red:-0.5), and it works just fine, but slower.
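For context, here is a minimal sketch of how an AND sub-prompt with a negative weight can push the result away from a concept, assuming composable-diffusion style guidance where each sub-prompt's prediction is offset from the unconditional one and scaled by its weight. The function `combine_predictions` and the toy two-element "predictions" are hypothetical; this is not the web UI's actual code.

```python
def combine_predictions(uncond_pred, cond_preds, weights, cfg_scale):
    """Combine per-sub-prompt predictions:

    result = uncond + sum_i cfg_scale * w_i * (cond_i - uncond)
    """
    result = list(uncond_pred)
    for pred, w in zip(cond_preds, weights):
        for k in range(len(result)):
            result[k] += cfg_scale * w * (pred[k] - uncond_pred[k])
    return result


# Toy 2-element predictions standing in for real noise predictions (assumption).
uncond_pred = [0.0, 0.0]
main_pred = [1.0, 0.0]  # prediction for the main prompt
red_pred = [0.0, 1.0]   # prediction for the "red" sub-prompt

out = combine_predictions(uncond_pred, [main_pred, red_pred], [1.0, -0.5], cfg_scale=2.0)
print(out)  # [2.0, -1.0]
```

Each sub-prompt needs its own model prediction at every step, which is consistent with the AND trick being slower than a plain negative prompt.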

A little bit off topic, but since you're so familiar with the code... I'd like to try changing the script a bit so that it uses different CFG values on different steps, something like what Dynamic Thresholding (CFG Scale Fix) does, but with my own approach. Instead of messing with the extension, I'd like to just change the original code a bit and see how it behaves. What function should I edit for that? I'm especially interested in modifying the CFG for the steps of the SDE sampler (I'm seeing that sampler produce amazing intermediate results in the first few steps in combination with a very high CFG (~30), but then it either gets burned colors quickly when the Dynamic Thresholding extension is not used, or just falls apart completely when the extension is used).
Thank you!
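As a rough illustration of the per-step CFG idea in the question, here is a sketch of a schedule that starts high and decays linearly. The function `cfg_for_step`, the end value of 7.0 and the step count are assumptions made up for this example; where such a schedule would hook into the web UI's sampling code is not something this thread answers.

```python
def cfg_for_step(step, total_steps, cfg_start=30.0, cfg_end=7.0):
    """Linearly decay the CFG scale from cfg_start at step 0 to cfg_end at the last step."""
    if total_steps < 2:
        return cfg_start
    t = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return cfg_start + t * (cfg_end - cfg_start)


# Hypothetical 5-step run, just to show the shape of the schedule.
schedule = [cfg_for_step(s, 5) for s in range(5)]
print(schedule)  # [30.0, 24.25, 18.5, 12.75, 7.0]
```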

@art926 No problem, mate! I always wanted this feature but was too lazy to do it myself, you gave me the incentive. 😄

As for your question, although high CFG scales are indeed where the fun is at, I haven't played around with manipulating the CFG scale, so I don't really know. What I have done is play around with the latent, and I can tell you this: I believe Dynamic Thresholding and CFG Scale scheduling (they are two different things IIRC) are red herrings! I have discovered yet another "naive" method which yields far superior results in that department, but for that you'll have to wait for the main extension I'm working on. Coming very soon™ (hopefully before this whole technology expires in a few weeks).

Teaser:

[images: Before / After (quick test without enough tweaking): architectural portrait of cyberpunk pirate trooper by shaun-122]
Steps: 20, Sampler: DPM++ SDE Karras, **CFG scale: 30**, Seed: 1795838409, Size: 512x640, Model: deliberate_v2
commented

Oh my! Can you tell more about your approach and show some comparisons? I definitely don't want to miss that and would love to test the code, if you need a tester.

Check out my comments here, replace darkening/brightening with lowering contrast, and you get an idea of what's going on. The coding and testing are done, but the bottleneck is writing some documentation so that people can actually use it as intended. Having proper docs for this is crucial, as it is not a one-click solution, but rather along the lines of doing precision color grading in Photoshop, but with latents instead of actual images. I'll hit you up when it's ready.

Huh! I just discovered something interesting. It's also pretty good at boosting the negative prompt, again better than emphasis (at least with simple prompts that I tested). Look at this:

Prompt is "Character art of cardinal", with a custom model.
[images: boost, boost2]