frank-xwang / InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Home Page: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/


Other ways to train the model

66ling66 opened this issue · comments

[Screenshot 2024-03-18 171430]
Could you provide new instructions, or some examples of the training data?

Can you provide the full command you used for running the experiments and the error message? Thanks.

Thank you. My GPU memory is not enough, so I just changed my training strategy.

grounding_input = self.grounding_tokenizer_input.prepare(batch, return_att_masks=self.config.use_masked_att)

If I don't want to add this when training the UNet model, what should I do? Is there a simple way to do it?

Sorry for the late reply. I am a little confused: do you mean you want to train the model without any grounding inputs? Our model training needs instance/part-level location and caption inputs; otherwise, it is equivalent to directly fine-tuning the Stable Diffusion model.

Yes, I want to train the model without any grounding inputs. I just train a LoRA model, and when sampling images I add the grounding inputs; it seems to work a little.

Oh, I see. Maybe you can provide zero tensors as placeholders for the bboxes, masks, and instance caption embeddings. The easiest way might be to call `self.grounding_tokenizer_input.get_null_input()`. You will need to manually set `self.set = True` and provide `self.device`, `self.dtype`, `self.max_box`, etc.

You can find more details on this function in `grounding_input/text_grounding_tokinzer_input.py`.
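
For reference, a minimal sketch of the zero-tensor placeholders described above. The dict keys, tensor shapes, and the `max_box`/`in_dim` defaults are assumptions based on the GLIGEN-style text grounding tokenizer; your local copy of the file may expect different keys or dimensions, so please check it first:

```python
import torch

def make_null_grounding_input(batch_size, max_box=30, in_dim=768,
                              device="cpu", dtype=torch.float32):
    """Zero-tensor placeholders for bboxes, masks, and instance caption embeddings.

    Assumed shapes (GLIGEN-style tokenizer; verify against your local file):
      boxes                (B, max_box, 4)       -- one box per instance slot
      masks                (B, max_box)          -- 0 means the slot is inactive
      positive_embeddings  (B, max_box, in_dim)  -- instance caption embeddings
    """
    boxes = torch.zeros(batch_size, max_box, 4, device=device, dtype=dtype)
    masks = torch.zeros(batch_size, max_box, device=device, dtype=dtype)
    positive_embeddings = torch.zeros(batch_size, max_box, in_dim,
                                      device=device, dtype=dtype)
    return {"boxes": boxes, "masks": masks,
            "positive_embeddings": positive_embeddings}

# Hypothetical usage inside the training step, replacing the prepare() call
# (the batch["image"] key is an assumption about your dataloader):
# grounding_input = make_null_grounding_input(batch["image"].shape[0],
#                                             device=batch["image"].device)
```

Because all mask entries are zero, every instance slot should be treated as inactive, which matches what `get_null_input()` produces after `prepare()` has set the bookkeeping fields.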