UrbanArchitect / UrbanArchitect

The official repository of our paper: "Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior"


Controlnet

YuzhiChen001 opened this issue · comments

Nice work!
I'd like to know: what is the difference between the ControlNet you trained and directly using a multi-ControlNet with semantic segmentation + depth conditions?
Any answers would be greatly appreciated!

I have some questions about the ControlNet too. Did you train your ControlNets from scratch, or fine-tune them from some pretrained pixel-level semantic ControlNets?

Hi, thanks for your interest.
Our pretrained ControlNet model is trained specifically on the KITTI-360 dataset, with layout-based semantic/depth maps as conditioning signals. It therefore mainly covers urban-scene semantics (e.g., car, building, road).

@FanLu97 Thanks for your reply. So your 2D condition images are rendered from the 3D layout, right?

I'm very curious how you trained your ControlNets. Did you use varied prompts (generated by an existing captioning network, e.g., BLIP), or how did you create the prompts for ControlNet fine-tuning? And at generation time, do you give a prompt that defines the style? I noticed your editing results stay consistent with the layout while changing the whole style; in my case, I couldn't achieve that.


@OrangeSodahub
Hi, thanks for your interest.

  1. Yes, our 2D condition images are rendered from the 3D layouts.
  2. No, we do not use different prompts for ControlNet training. Our ControlNet is trained on the KITTI-360 dataset only, with an empty prompt. At generation time, we use different text prompts to change the style. The text-based control is the inherent power of pretrained Stable Diffusion: even though the ControlNet is trained only on KITTI-360, it can still adapt to different styles. In the city-transfer experiment, we found that the model can even generate boats rather than cars when the prompt is set to "in Venice", which is exciting.
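The "inherent power" of the text prompt mentioned above comes from classifier-free guidance in Stable Diffusion: the denoiser is run with and without the text conditioning, and the two noise predictions are blended. A minimal numpy sketch of that blending step (this illustrates the general mechanism, not the paper's actual code):

```python
import numpy as np

def cfg_combine(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the noise prediction away from the
    unconditional one, in the direction suggested by the text prompt."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Toy noise predictions: the prompt-conditioned and unconditional branches.
uncond = np.zeros(4)
cond = np.ones(4)

# With scale 1.0 the result is just the conditioned prediction;
# larger scales (e.g., 7.5, the common Stable Diffusion default)
# amplify the prompt's influence on every denoising step.
print(cfg_combine(uncond, cond, 1.0))
print(cfg_combine(uncond, cond, 7.5))
```

This is why a style prompt like "in Venice" can restyle the output even though the ControlNet itself never saw text during training: the guidance term acts on the base model's text conditioning, while the ControlNet only constrains the layout.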

@FanLu97 Very helpful information. So your exact prompt is ""? I believe there must be a prompt embedding input

@OrangeSodahub Yes, the prompt for training ControlNet is "".

So what values of controlnet_conditioning_scale and guidance_scale did you use? I guess guidance_scale might be 7.5?

BTW, is the condition input to the ControlNet an RGB image, or raw label ids?

@OrangeSodahub

  1. The controlnet_conditioning_scale is set to 1. However, if you use a text prompt to control the style, controlnet_conditioning_scale should be set to a smaller value, e.g., 0.5. guidance_scale is set to the default of 7.5.
  2. We map label ids to RGB. You may refer to our dataloader for more details.
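A minimal sketch of the label-id-to-RGB conversion described in point 2. The specific ids and colors below are assumptions (KITTI-360 follows a Cityscapes-style palette); the authoritative mapping is in the repository's dataloader:

```python
import numpy as np

# Hypothetical palette: label id -> RGB color (Cityscapes-style values,
# assumed here for illustration; check the repo's dataloader for the
# exact mapping used in the paper).
PALETTE = {
    7: (128, 64, 128),   # road
    11: (70, 70, 70),    # building
    26: (0, 0, 142),     # car
}

def label_to_rgb(label_map, palette=PALETTE):
    """Convert an (H, W) integer label map into an (H, W, 3) uint8 RGB
    condition image; unknown ids stay black."""
    rgb = np.zeros((*label_map.shape, 3), dtype=np.uint8)
    for label_id, color in palette.items():
        rgb[label_map == label_id] = color
    return rgb

labels = np.array([[7, 26],
                   [11, 0]])
print(label_to_rgb(labels))  # pixel (0, 1) gets the car color
```

The resulting RGB image is what gets fed to the ControlNet as the condition input, in place of the raw label-id map.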