UrbanArchitect / UrbanArchitect

The official repository of our paper: "Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior"


Controlnet

YuzhiChen001 opened this issue · comments

Nice work!
I'd like to know: what is the difference between the ControlNet you trained and directly using a multi-ControlNet with semantic segmentation + depth conditions?
Any answers would be greatly appreciated!

I have some questions about the ControlNet too. Did you train your ControlNets from scratch, or fine-tune them from some pretrained pixel-level semantic ControlNets?

Hi, thanks for your interest.
Our pretrained ControlNet model is trained specifically on the KITTI-360 dataset, with layout-based semantic/depth maps as conditioning signals. It therefore mainly covers urban-scene semantics (e.g., car, building, road).

@FanLu97 Thanks for your reply. So your 2D condition images are rendered from the 3D layout, right?

I'm very curious how you trained your ControlNets. Did you use varied prompts (generated by an existing captioning network, e.g., BLIP), or how did you create the prompts for ControlNet fine-tuning? And at generation time, do you give a prompt that defines the style? I noticed your editing results stay consistent with the layout while changing the whole style; in my case, I couldn't achieve that.


@OrangeSodahub
Hi, thanks for your interest.

  1. Yes, our 2D condition images are rendered from the 3D layouts.
  2. No, we do not use different prompts for ControlNet training. Our ControlNet is trained on the KITTI-360 dataset only, with an empty prompt. At generation time, we use different text prompts to change the style. The text-based control is the inherent power of pretrained Stable Diffusion: even though the ControlNet is trained only on KITTI-360, it can still adapt to different styles. In the city-transfer experiment, we found that the model can even generate boats rather than cars when the prompt is set to "in Venice", which is exciting.
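The "inherent power" of the text prompt mentioned above comes from classifier-free guidance in Stable Diffusion: the denoiser is run with and without the text conditioning, and the two noise predictions are blended. A minimal numpy sketch of that blending step (this illustrates the general mechanism, not the paper's actual code):

```python
import numpy as np

def cfg_combine(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the noise prediction away from the
    unconditional one, in the direction suggested by the text prompt."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Toy noise predictions: the prompt-conditioned and unconditional branches.
uncond = np.zeros(4)
cond = np.ones(4)

# With scale 1.0 the result is just the conditioned prediction;
# larger scales (e.g., 7.5, the common Stable Diffusion default)
# amplify the prompt's influence on every denoising step.
print(cfg_combine(uncond, cond, 1.0))
print(cfg_combine(uncond, cond, 7.5))
```

This is why a style prompt like "in Venice" can restyle the output even though the ControlNet itself never saw text during training: the guidance term acts on the base model's text conditioning, while the ControlNet only constrains the layout.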

@FanLu97 Very helpful information. So your exact prompt is ""? I believe there must be a prompt embedding input

@OrangeSodahub Yes, the prompt for training ControlNet is "".

So what values of controlnet_conditioning_scale and guidance_scale did you use? I guess guidance_scale might be 7.5?

BTW, is the condition input to the ControlNet an RGB image, or raw label ids?

@OrangeSodahub

  1. The controlnet_conditioning_scale is set to 1. However, if you use a text prompt to control the style, controlnet_conditioning_scale should be set to a smaller value, e.g., 0.5. guidance_scale is set to the default of 7.5.
  2. We map label ids to RGB. You may refer to our dataloader for more details.
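A minimal sketch of the label-id-to-RGB conversion described in point 2. The specific ids and colors below are assumptions (KITTI-360 follows a Cityscapes-style palette); the authoritative mapping is in the repository's dataloader:

```python
import numpy as np

# Hypothetical palette: label id -> RGB color (Cityscapes-style values,
# assumed here for illustration; check the repo's dataloader for the
# exact mapping used in the paper).
PALETTE = {
    7: (128, 64, 128),   # road
    11: (70, 70, 70),    # building
    26: (0, 0, 142),     # car
}

def label_to_rgb(label_map, palette=PALETTE):
    """Convert an (H, W) integer label map into an (H, W, 3) uint8 RGB
    condition image; unknown ids stay black."""
    rgb = np.zeros((*label_map.shape, 3), dtype=np.uint8)
    for label_id, color in palette.items():
        rgb[label_map == label_id] = color
    return rgb

labels = np.array([[7, 26],
                   [11, 0]])
print(label_to_rgb(labels))  # pixel (0, 1) gets the car color
```

The resulting RGB image is what gets fed to the ControlNet as the condition input, in place of the raw label-id map.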