cloneofsimo / paint-with-words-sd

Implementation of Paint-with-words with Stable Diffusion : method from eDiff-I that let you generate image from text-labeled segmentation map.

Conflict between the A1111 PwW+ControlNet extension and the original ControlNet extension

lwchen6309 opened this issue · comments

The current implementation conflicts with the original ControlNet extension, as reported by @mykeehu, here.

The conflict arises because both extensions define the same arguments in preload.py:

def preload(parser):
    parser.add_argument("--controlnet-dir", type=str, help="Path to directory with ControlNet models", default=None)
    parser.add_argument("--no-half-controlnet", action='store_true', help="do not switch the ControlNet models to 16-bit floats (only needed without --no-half)", default=None)

I'm trying to fix this bug, and I'm opening this issue for further discussion.

Just to update the progress.
Currently this extension is not compatible with the original ControlNet extension due to the argument conflict.

This error can be fixed by changing the script to

def preload(parser):
    if not any(arg.dest == 'controlnet_dir' for arg in parser._actions):
        parser.add_argument("--controlnet-dir", type=str, help="Path to directory with ControlNet models", default=None)
    if not any(arg.dest == 'no_half_controlnet' for arg in parser._actions):
        parser.add_argument("--no-half-controlnet", action='store_true', help="do not switch the ControlNet models to 16-bit floats (only needed without --no-half)", default=None)

However, with this change the UI takes an unreasonably long time to load.
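A variant of the same guard that avoids the private `parser._actions` attribute is sketched below. It relies on `argparse` raising `argparse.ArgumentError` when an option string is registered twice. `add_argument_once` is a hypothetical helper name, not part of either extension, and the help texts are shortened.

```python
import argparse


def add_argument_once(parser, *flags, **kwargs):
    """Add an option unless another extension already registered it.

    Hypothetical helper: returns True if the option was added and
    False if argparse reported a conflicting option string.
    """
    try:
        parser.add_argument(*flags, **kwargs)
        return True
    except argparse.ArgumentError:
        return False


def preload(parser):
    add_argument_once(parser, "--controlnet-dir", type=str, default=None,
                      help="Path to directory with ControlNet models")
    add_argument_once(parser, "--no-half-controlnet", action="store_true",
                      help="do not switch the ControlNet models to 16-bit floats")


parser = argparse.ArgumentParser()
preload(parser)  # first extension registers the options
preload(parser)  # second extension: duplicates are skipped, no error
args = parser.parse_args(["--controlnet-dir", "models"])
print(args.controlnet_dir, args.no_half_controlnet)
```

This only addresses the duplicate-argument error at startup; it says nothing about the slow UI load discussed in the rest of the thread.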

Why does it load so slowly when I apply this fix to preload.py? Is it reloading things multiple times?
Edit: Actually it doesn't load for me at all, even after running for several minutes; it fills up my 16GB of RAM completely.

I have updated preload.py to avoid the argument conflict.

However, it turns out that when both repos are loaded, the program gets stuck updating the UI at line 845 of scripts/controlnet.py.
Commenting out this line solves the problem but makes the UI much more complicated.
The loaded UI looks something like this:
screencapture-1cc398fb-f325-421d-gradio-live-2023-03-19-21_57_13

Please see the explanation here

Just a hint: if there is an sd-webui-controlnet folder inside the extensions folder, line 845 could be skipped automatically. That would be better, because if I comment it out manually, updating with git pull is no longer possible.

After disabling line 845 and ControlNet, the UI comes up, but it never finishes loading; it's completely broken:
image

It may also conflict with one of the other extensions. I have these installed:
image

It seems to be a different conflict, since commenting out line 845 works in my case, which I tested on Windows, Linux, and macOS.

Just to confirm, are these extensions compatible with Mikubill/sd-webui-controlnet?
I would expect sd-webui-controlnet to conflict with your other extensions in the same way, since the ControlNet+PwW extension simply adapts it by adding PwW.

Yes, Controlnet works with these and they do not cause problems with each other.

Hi @mykeehu , thanks a lot for your information.

It's kind of weird to me, since the two repos are basically doing the same thing.
I can't tell exactly which extension ControlNet+PwW conflicts with, due to the large number of extensions in your case.

However, I guess this bug might come from the duplicated UI, which is created when I inherit the ControlNet UI to create the UI for ControlNet+PwW.
To validate this, I created a simplified version of ControlNet+PwW that does not create a new UI for PwW but adds it directly to the original ControlNet UI in scripts/controlnet.py. I also made it compatible with Mikubill/sd-webui-controlnet.
Please try this version

git clone -b merge_control git@github.com:lwchen6309/sd-webui-controlnet-pww.git

and let me know if it works or not.

If it still doesn't work, please disable Mikubill/sd-webui-controlnet and uncomment line 845, which is now line 965 of controlnet.py:
script_callbacks.on_after_component(img2img_tab_tracker.on_after_component_callback)

Thank you so much for your report :)

Very interesting...
I moved all the extensions to another folder, leaving yours alone. I got this on load, which I didn't get before:

H:\Stable-Diffusion-Automatic\stable-diffusion-webui\venv\lib\site-packages\gradio\deprecation.py:43: UserWarning: You have unused kwarg parameters in Image, please remove them: {'image': array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)}
  warnings.warn(
H:\Stable-Diffusion-Automatic\stable-diffusion-webui\venv\lib\site-packages\gradio\deprecation.py:43: UserWarning: You have unused kwarg parameters in Tab, please remove them: {'open': False}
  warnings.warn(

and the UI still does not load...

Now I deleted the folder and reinstalled it with this command in the extensions folder, but it did not help:
git clone https://github.com/lwchen6309/sd-webui-controlnet-pww

Hmm... it's indeed interesting, but it also makes things clearer.
At least we don't have to deal with other extensions for the moment.
The gradio warning is not clear to me, but we can still start by gradually disabling the PwW functions.

(a) Let's try a slightly simpler version,

git clone -b dummy git@github.com:lwchen6309/sd-webui-controlnet-pww.git

where I added a dummy flag to disable all the PwW modules and UI.

https://github.com/lwchen6309/sd-webui-controlnet-pww/blob/647febe0f89ba5e7d58a8f9a11f91814d8423d2e/scripts/controlnet.py#L27

PwW is disabled when the flag is set to True. In this case, this repo should behave exactly the same as Mikubill/sd-webui-controlnet.

(b) If that does not work for you, then try this version

git clone -b totally_dummy git@github.com:lwchen6309/sd-webui-controlnet-pww.git

where I commented out all the code associated with PwW so that it exactly matches the code of Mikubill/sd-webui-controlnet; it should work for you, as you mentioned previously.

Note that the Mikubill/sd-webui-controlnet I forked is the latest version (commit dbdd6b1); please also check whether this version works for you. Also, please check that you have the latest A1111 webui installed, and let me know your gradio version.

I'd assume at least (b) should work, as it is exactly the code of Mikubill/sd-webui-controlnet.
If (b) works while (a) does not, then the only three candidates for the bug are scripts/pww_utils.py, scripts/hook_pww.py, and scripts/controlnet.py.
To validate the first two, simply comment out all the code in each of these files in turn and see whether the UI works.
If neither of these helps, then the problem comes from scripts/controlnet.py, most likely from the PwW UI, judging by the warning message you showed. If that happens, we will probably have to split out the PwW UI later.

Lastly, thank you so much for your patience in figuring out the bug with me :)

The UI worked fine with the first version, but I got an interesting message on the console after the embeddings were loaded:

Model loaded in 10.9s (load weights from disk: 2.2s, create model: 1.3s, apply weights to model: 0.9s, apply half(): 3.2s, load VAE: 0.5s, move model to device: 1.1s, load textual inversion embeddings: 1.6s).
H:\Stable-Diffusion-Automatic\stable-diffusion-webui\venv\lib\site-packages\gradio\deprecation.py:43: UserWarning: You have unused kwarg parameters in Tab, please remove them: {'open': False}
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 47.7s (import gradio: 3.4s, import ldm: 4.1s, other imports: 2.4s, list extensions: 0.2s, list SD models: 0.2s, setup codeformer: 0.1s, load scripts: 3.1s, load SD checkpoint: 11.1s, create ui: 22.5s, gradio launch: 0.6s).

I'll try version (b) in a moment. This is how I retrieved it from git, because I get an error with your command:

H:\Stable-Diffusion-Automatic\stable-diffusion-webui\extensions>git clone -b dummy git@github.com:lwchen6309/sd-webui-controlnet-pww.git
Cloning into 'sd-webui-controlnet-pww'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

H:\Stable-Diffusion-Automatic\stable-diffusion-webui\extensions>git clone -b dummy https://github.com/lwchen6309/sd-webui-controlnet-pww.git

The second version is also tested, the UI works there too. No other extensions were installed when I tested it.
Without the debug command, the previously described problem remains.

It's great that at least the dummy version works, except for the message on the console after the embeddings are loaded (we can come back to this issue after making the PwW UI work for you).

The PwW code in "scripts/controlnet.py" basically does 3 things, which we will check one by one in the remaining debugging steps:
(1) Import the PwW core functions from scripts/pww_utils.py and the hook that hijacks the diffusion UNet from scripts/hook_pww.py.
(2) Create UI
(3) Apply the hook and Inject CrossAttention forward function
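
One way to gate these three parts independently while bisecting is sketched below. All names (`PwwStage`, `active_parts`) are hypothetical and do not come from the repository; the sketch only illustrates enabling one more part per test run.

```python
from enum import Flag, auto


class PwwStage(Flag):
    """Hypothetical switches for the three parts listed above."""
    NONE = 0
    IMPORT = auto()   # (1) import pww_utils and hook_pww
    UI = auto()       # (2) create the PwW UI
    HOOK = auto()     # (3) apply the hook / inject CrossAttention forward


def active_parts(stages):
    """Return the names of the parts that are switched on, in order."""
    return [s.name for s in (PwwStage.IMPORT, PwwStage.UI, PwwStage.HOOK)
            if s in stages]


# Enable one more part per test run; the first run that breaks the UI
# points at the part that was just added.
for stages in (PwwStage.NONE,
               PwwStage.IMPORT,
               PwwStage.IMPORT | PwwStage.UI,
               PwwStage.IMPORT | PwwStage.UI | PwwStage.HOOK):
    print(active_parts(stages))
```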

Steps:
(a) Please try this version, which enables (1): importing functions from scripts/pww_utils.py and scripts/hook_pww.py

git clone -b dummy_importFunction https://github.com/lwchen6309/sd-webui-controlnet-pww.git

(b) If (a) works, please try this version, which enables (1) and (2)

git clone -b dummy_createUI https://github.com/lwchen6309/sd-webui-controlnet-pww.git

I'd expect some error at this stage, given the gradio warning message you showed a moment ago.
Also, judging by that gradio message, the problem might come from line 419 of scripts/controlnet.py

def create_canvas(h, w):
    return np.zeros(shape=(h, w, 3), dtype=np.uint8) + 255

If you encounter the same gradio warning, try this version

git clone -b dummy_createUI_pil https://github.com/lwchen6309/sd-webui-controlnet-pww.git

in which I disabled some of the UI initialization that uses "create_canvas".

Thank you so much for your effort and information :)

Here are the final results:

  • the first version worked fine
  • the second version caused the usual error in the console
  • the third version did not give any error in the console, but the UI stopped loading at the Loading prompt, as before.

Thanks a lot for the results.

(a) Please try this version, in which I additionally commented out the "Color context option" UI that might lead to the problem.

git clone -b dummy_createUI_pil https://github.com/lwchen6309/sd-webui-controlnet-pww.git

The rest of the UI consists of simple gradio components, like those in the ControlNet UI, and should be safe.

(b) If this version works, please set the dummy flag to False to enable the PwW function. This does not change the UI but activates (3) Apply the hook and Inject CrossAttention forward function.
Hopefully you can then use PwW following the instructions in the repo.

I hope we are reaching the end of debugging, thanks again for your patience and help :)

Good news! Everything now loads, including ControlNet and PwW, even before deactivating dummy mode!
The canvas is very slow to colour, and it crashed on the first try. It is impossible to draw.
image

It still works after turning off the dummy flag, so that's fine. :) Apart from the slowness of drawing, Extract and Generate do nothing for me.
I loaded a picture, but the two buttons still don't work. What do you mean by the following, and what should I do?
"Apply the hook and Inject CrossAttention forward function."
Is this the one in Settings? "Allow other script to control this extension"

Great! thank god. We're finally here. It seems the problem comes from the gradio version.

Let me first summarize your results; I'll explain the conclusion afterwards:

  • The blank canvas responds slowly. I sometimes encounter this slow response as well, and currently I have no idea what causes it, since I used nearly the same function as ControlNet's to create the blank canvas. I'll open another issue to solve this.

  • "Apply the hook and Inject CrossAttention forward function." This is enabled by setting the dummy flag to False, so it already works in your case.

if not dummy_mode:
    # Retrieve PwW arguments
    pww_enabled, color_context, weight_function_scale, color_map_image = pww_args
    if pww_enabled:
        color_map_image = Image.fromarray(color_map_image).resize((p.width, p.height))
        color_context = ast.literal_eval(color_context)
        pww_cross_attention_weight = encode_text_color_inputs(p, color_map_image, color_context)
        pww_cross_attention_weight.update({
            "WEIGHT_FUNCTION": lambda w, sigma, qk: float(weight_function_scale) * w * math.log(sigma + 1.0) * qk.max()})
        self.latest_network.p = p
        self.latest_network.pww_cross_attention_weight.update(pww_cross_attention_weight)
        hijack_CrossAttn(p)
  • The Extract and Generate buttons are indeed disabled now, since they are tied to the UI I commented out; see the following part, which I commented out in the last version.
with gr.Accordion('Color context option', open=False):
    prompts = []
    strengths = []
    color_maps = []
    colors = [gr.Textbox(value="", visible=False) for i in range(MAX_NUM_COLORS)]
    for n in range(MAX_NUM_COLORS):
        with gr.Row():
            # color_maps.append(gr.Image(image=create_canvas(15,3), interactive=False, type='numpy'))
            color_maps.append(gr.Image(interactive=False, type='numpy'))
            with gr.Column():
                prompts.append(gr.Textbox(label="Prompt", interactive=True))
            with gr.Column():
                strengths.append(gr.Textbox(label="Strength", interactive=True))

extract_color_boxes_button.click(fn=extract_color_textboxes, inputs=[segmentation_input_image], outputs=[*color_maps, *prompts, *strengths, *colors])
generate_color_boxes_button.click(fn=collect_color_content, inputs=[*colors, *prompts, *strengths], outputs=[color_context])

This part corresponds to the UI here
image

As you can see from the script above, the callbacks of the two buttons are bound to the inputs retrieved from the "Color context option". Also, in the script, "color_maps", "prompts", and "strengths" are lists that are passed to the callbacks as inputs or outputs. This is the problem, since some gradio versions might not support this.
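
To make the list handling concrete: with `inputs=[*colors, *prompts, *strengths]`, the callback receives one flat sequence of positional values and has to split it back into three groups. The sketch below is a gradio-free stand-in, not the repository's `collect_color_content`. The function name, the value of `MAX_NUM_COLORS`, and the `{color: "prompt,strength"}` format are assumptions for illustration. It also shows the `ast.literal_eval` round trip that the earlier snippet applies to `color_context`.

```python
import ast

MAX_NUM_COLORS = 3  # assumed value for this sketch


def collect_color_content_sketch(*values):
    """Split flat callback inputs into colors, prompts and strengths.

    Hypothetical stand-in for a callback bound with
    inputs=[*colors, *prompts, *strengths]; returns a dict literal as text.
    """
    n = MAX_NUM_COLORS
    colors, prompts, strengths = values[:n], values[n:2 * n], values[2 * n:3 * n]
    context = {}
    for color, prompt, strength in zip(colors, prompts, strengths):
        if color and prompt:  # skip unused rows
            # Assumed format: "(r, g, b)" text key -> "prompt,strength"
            context[ast.literal_eval(color)] = f"{prompt},{float(strength)}"
    return str(context)


text = collect_color_content_sketch(
    "(255, 0, 0)", "(0, 255, 0)", "",      # colors
    "man", "dog", "",                      # prompts
    "1.0", "2", "",                        # strengths
)
parsed = ast.literal_eval(text)  # turn the text back into a dict
print(parsed)
```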

Please check your gradio version of A1111 webui by

path/stable-diffusion-webui/venv/bin/pip list

to see whether it matches gradio==3.16.2, which is the gradio version I worked with.
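
The same check can be done from Python, which sidesteps platform differences in the venv layout (for example `venv\Scripts` on Windows instead of `venv/bin`). This is a small sketch using the standard library's `importlib.metadata`. `installed_version` is a hypothetical helper, and the script has to be run with the webui's venv interpreter to see the webui's packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


expected = "3.16.2"  # the gradio version mentioned in this thread
found = installed_version("gradio")
if found is None:
    print("gradio is not installed in this environment")
elif found == expected:
    print(f"gradio {found} matches")
else:
    print(f"gradio {found} differs from {expected}")
```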

So now, in your case, nearly all of PwW works except for the "Color context option", which depends only on the gradio version. I'd assume that your version is different from mine. If so, please try installing gradio==3.16.2, and then we are ready to try this branch

git clone -b merge_control https://github.com/lwchen6309/sd-webui-controlnet-pww.git

and the main branch

git clone https://github.com/lwchen6309/sd-webui-controlnet-pww.git

Hope it works and you can enjoy PwW :)

My gradio is 3.16.2, this is written in the footer of Automatic:
image

The git clone again pulls the wrong version, so I'm no further ahead. After I put the previous version back and commented out two lines, most of it works:
image

This is what the code looks like now:

with gr.Accordion('Color context option', open=False):
            prompts = []
            strengths = []
            color_maps = []
            colors = [gr.Textbox(value="", visible=False) for i in range(MAX_NUM_COLORS)]
            for n in range(MAX_NUM_COLORS):
                with gr.Row():
                    # color_maps.append(gr.Image(image=create_canvas(15,3), interactive=False, type='numpy'))
                    color_maps.append(gr.Image(interactive=False, type='numpy'))
                    with gr.Column():
                        prompts.append(gr.Textbox(label="Prompt", interactive=True))
                    # with gr.Column():
                    #     strengths.append(gr.Textbox(label="Strength", interactive=True))

        extract_color_boxes_button.click(fn=extract_color_textboxes, inputs=[segmentation_input_image], outputs=[*color_maps, *prompts, *strengths, *colors])
        generate_color_boxes_button.click(fn=collect_color_content, inputs=[*colors, *prompts, *strengths], outputs=[color_context])
        ctrls = (pww_enabled, color_context, weight_function, segmentation_input_image)
        return ctrls

Hmm... I'm not sure what causes the error here.
If I understand correctly, commenting out

            with gr.Column():
                strengths.append(gr.Textbox(label="Strength", interactive=True))

makes it work for you.

If this is the case, let's try this,

with gr.Accordion('Color context option', open=False):
    prompts = []
    strengths = []
    color_maps = []
    colors = [gr.Textbox(value="", visible=False) for i in range(MAX_NUM_COLORS)]
    for n in range(MAX_NUM_COLORS):
        with gr.Row():
            # color_maps.append(gr.Image(image=create_canvas(15,3), interactive=False, type='numpy'))
            color_maps.append(gr.Image(interactive=False, type='numpy'))
            with gr.Column():
                prompts.append(gr.Textbox(label="Prompt", interactive=True))
                strengths.append(gr.Textbox(label="Strength", interactive=True))
            # with gr.Column():
            #     strengths.append(gr.Textbox(label="Strength", interactive=True))

which simply removes the 'with gr.Column()' for "strengths". It might also be worthwhile to try other UI arrangements here.

Hold on tight! The problem was caused by the label. In this line
strengths.append(gr.Textbox(label="Strength", interactive=True))
I deleted the letter "h" from the label and started it like this:
strengths.append(gr.Textbox(label="Strengt", interactive=True))

And it works! But with the "h" it dies. I'm testing it a bit now.

I've renamed it as follows, and it works that way:
strengths.append(gr.Textbox(label="Prompt Strength", interactive=True))

The functions work, although unfortunately the intended composition is not produced:
image

Although the composition is still not the same, the ETA noise multiplier has an effect. In the previous image it was set to 0.7, which I use most of the time; in the image below I changed it to 1, so all three figures are now in the picture, but the colours are not affected.
image

Good to know it finally works!!

The parameters that affect PwW are as follows (maybe I should add this to the README):

  • weight function scale: 0.1~0.4
  • eta: 0.5~1.0
  • (prompt) strength: 0.3~2.0

I typically set eta=1.0 for inpainting and eta=0.5 for txt2img.
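
To see how the weight function scale and the noise level interact, the `WEIGHT_FUNCTION` lambda quoted earlier in the thread can be evaluated with plain numbers. This is only an arithmetic sketch: `pww_weight` is a hypothetical name, `qk` is a plain list standing in for the attention-score tensor, and the sample values are made up.

```python
import math


def pww_weight(w, sigma, qk, scale):
    """Mirror of the quoted lambda: scale * w * log(sigma + 1) * max(qk)."""
    return float(scale) * w * math.log(sigma + 1.0) * max(qk)


qk = [0.2, 1.5, 3.0]            # made-up attention scores
for scale in (0.1, 0.4):        # ends of the suggested range
    for sigma in (10.0, 0.5):   # high noise level vs low noise level
        print(scale, sigma, round(pww_weight(1.0, sigma, qk, scale), 3))
```

Because of the log(sigma + 1) factor, the added weight is largest at high noise levels and fades to zero as sigma approaches 0.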

And one more piece of good news for today: I put the other extensions back and they work
image

And you can fix the double ControlNet effect by checking for the folder and checking the config (whether it is disabled), and using the original if it is available. This is just a hint; I was thinking about how to make PwW more flexible.
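
The folder-and-config check suggested here could look roughly like the sketch below. It assumes the original extension lives in `extensions/sd-webui-controlnet` and that the webui's `config.json` lists disabled extensions under a `disabled_extensions` key. Both are assumptions to verify against the installed webui, and `original_controlnet_active` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def original_controlnet_active(webui_root, name="sd-webui-controlnet"):
    """Return True if the original extension folder exists and is not disabled.

    Assumes <webui_root>/extensions/<name> and a config.json whose
    "disabled_extensions" list names disabled extension folders.
    """
    root = Path(webui_root)
    if not (root / "extensions" / name).is_dir():
        return False
    config_path = root / "config.json"
    if not config_path.is_file():
        return True  # no config found: treat the extension as enabled
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return True  # unreadable config: fall back to "enabled"
    if not isinstance(config, dict):
        return True
    return name not in config.get("disabled_extensions", [])


# Demo on a throwaway directory tree instead of a real webui install.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "extensions" / "sd-webui-controlnet").mkdir(parents=True)
    enabled = original_controlnet_active(tmp)
    (Path(tmp) / "config.json").write_text(
        json.dumps({"disabled_extensions": ["sd-webui-controlnet"]}),
        encoding="utf-8")
    disabled = original_controlnet_active(tmp)
print(enabled, disabled)
```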

I'm still playing with the settings, thanks for the tips! So far it doesn't want to cache the content in the right place, but once I find out what the reason might be, I'll write it. Thanks for all the help so far!

One more usage tip: when the Extract color content button is pressed, the Color context option block should open automatically, because it is closed by default, so it may not be clear that the settings need to be adjusted there.

Based on my first tests:

  • ETA should be 1 under txt2img too, otherwise it really generates the composition randomly
  • the Weight function scale didn't matter much for me; I didn't notice any changes within 0.1-0.4.
  • I set the Prompt strength to 2 for objects for a more accurate composition, but even then the objects are sometimes swapped (man, woman, dog)
  • in the Prompt I have to type the complete sentence (for example: a closeup photo of a brown haired man with big dog and blonde woman walking in the forest), and in the color prompt only the objects (man, dog, woman), I don't type properties here, it takes them from the main prompt.

Offhand, that's all I have found so far.

Thanks for the great suggestion!

I have updated the main branch to rename "strength" to "prompt strength". Please give it a try.

git@github.com:lwchen6309/sd-webui-controlnet-pww.git
https://github.com/lwchen6309/sd-webui-controlnet-pww.git

I'm not sure which one works for you.
I'll first make sure the main branch works for you and then update the "Extract color content" callback as you suggested. Please note that it is still incompatible with Mikubill/sd-webui-controlnet at the moment.

By the way, could you please share your branch of the PwW repo with me?

I reinstalled it from the master link and it works, so that's a tick; it's fine. I had disabled ControlNet before, so no problem.
Thanks for the quick help, and I hope others don't have problems anymore.
I'll be curious to see how the color weights are resolved.
I have forked your extension and shared it with you.

Otherwise, your solution is similar to the Latent Couple extension, only that one generates without colors, for fixed areas. What's good about your extension is that it could generate freely, anywhere, into any shape, plus you don't need to use AND to generate, because you have the prompts for the colors.
Here is the Latent Couple extension page; you might also want to adopt its end step option for better merging:
https://github.com/opparco/stable-diffusion-webui-two-shot

The Latent Couple extension also has a fork which includes the use of a color mask: https://github.com/ashen-sensored/stable-diffusion-webui-two-shot (check the readme).

@nistvan86 Thank you, that's exactly what I thought, and it turns out it exists. I will try it!

The concept and idea are perfect, but unfortunately it doesn't work in my case :(
image

@mykeehu I've experimented with the add-on and it's hit and miss sometimes. It seems to work if you apply it to a few things, but it falls apart very quickly when the layout is too complex.
But this does seem to be the case with paint-with-words as well.
I would love to know what is causing this. Maybe the control resolution of these solutions is too low, or it's very easy to make the full prompt unbalanced.

I believe it's better to incorporate the mask-associated prompts into the generic prompt as well, and also to describe the whole scene with extra details. (I didn't do this in the linked example, though.) So, e.g., in your case, maybe you should mention in the generic prompt that there's a dog between them.

Latent Couple works very well by default; I made a pretty complex picture with it, i.e. a man and two women cuddling, with ControlNet, LoRA and TI.
The basics are there for the colours as well, as far as I can see, but it blends too soon, and there is no option to choose at which step blending starts (end step option).

The Latent Couple is amazing, for both its implementation and results!

See the benchmark on one of the examples with the prompt

A turtle with a shell made of cake and chocolate material, its texture and grain need to be particularly emphasized.

ControlNet + PwW
image

ControlNet + LatentCouple
image

The results are OK, and the rest is just a matter of cherry-picking and parameter tuning (e.g. alpha blend, weight).

So far, I don't see the advantages of PwW compared to this Latent Couple implementation. The additional "AND" doesn't seem to bother users.
The way they control the composition is a bit different.
For each denoising step:

  • LatentCouple (LC) simply applies a text mask before the denoising UNet.
  • PwW applies the text mask in each CrossAttention module in the UNet.

In this sense, PwW should offer finer control than LC, though I don't see much improvement in this case.
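
The PwW side of this comparison can be illustrated for a single image position attending over a few text tokens. The sketch below is plain Python, not the extension's code: it adds a bias to the pre-softmax attention scores of the token painted at that position, which is the kind of cross-attention modification PwW makes. All numbers and names are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(scores, mask, weight):
    """Softmax of scores after adding weight * mask to each token's score."""
    return softmax([s + weight * m for s, m in zip(scores, mask)])


scores = [1.0, 0.8, 1.2]  # made-up QK scores for tokens: man, dog, woman
mask = [0.0, 1.0, 0.0]    # this position lies in the region painted for "dog"

plain = softmax(scores)
boosted = biased_attention(scores, mask, weight=2.0)
print([round(p, 3) for p in plain])
print([round(p, 3) for p in boosted])
```

In this toy case the masked token's share of attention rises while the others fall, and the same shift would be applied in every cross-attention layer at every denoising step.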