frank-xwang / InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Home Page: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/

Some questions about the code related to instance masks

ShunyuYao opened this issue

Thanks for your excellent work!

I have some questions about the code that handles instance masks. In the following code, it seems that the value of visual_token_masks depends only on self_att_ind_objs, and that self_att_all_objs does not affect its final value. So what is the meaning of visual_token_masks = self_att_all_objs + self_att_ind_objs?

https://github.com/frank-xwang/InstanceDiffusion/blob/dadf0e3b09c2de82bf35b24e3424a14197a29906/ldm/modules/attention.py#L233C1-L240C88

# get the masks for avoiding information leakage between object patches
visual_token_masks = self_att_all_objs + self_att_ind_objs

# avoid the communications between objects and background
visual_token_masks[self_att_ind_objs < 1.0] = 0.0 # objects-background can not communicate
visual_token_masks[self_att_ind_objs >= 1.0] = 1.0 # binary mask

att_masks_[:,:,:w_h,:w_h] = visual_token_masks.view(B, 1, w_h, w_h)
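
To make the observation concrete, here is a minimal sketch that replays the three quoted assignments on toy data. It uses NumPy arrays as a stand-in for the torch tensors (boolean-mask assignment behaves the same way for this purpose). The helper name build_mask, the size w_h = 4, and the random values are all hypothetical and are not taken from the repository. Two different choices of self_att_all_objs should give the same result, because every element satisfies either `< 1.0` or `>= 1.0` and is therefore overwritten.

```python
import numpy as np


def build_mask(self_att_all_objs, self_att_ind_objs):
    """Replay the quoted lines; NumPy stands in for torch (an assumption)."""
    # The sum allocates a new array, so the inputs are not modified.
    visual_token_masks = self_att_all_objs + self_att_ind_objs
    # Every element falls into exactly one of the two conditions below,
    # so each value of the sum is overwritten with 0.0 or 1.0.
    visual_token_masks[self_att_ind_objs < 1.0] = 0.0
    visual_token_masks[self_att_ind_objs >= 1.0] = 1.0
    return visual_token_masks


rng = np.random.default_rng(0)
w_h = 4  # hypothetical number of visual tokens

# Toy stand-in for self_att_ind_objs: small non-negative counts.
ind = rng.integers(0, 3, size=(w_h, w_h)).astype(np.float32)

# Two unrelated stand-ins for self_att_all_objs.
all_a = rng.random((w_h, w_h), dtype=np.float32)
all_b = rng.random((w_h, w_h), dtype=np.float32) * 10.0

mask_a = build_mask(all_a, ind)
mask_b = build_mask(all_b, ind)

# The same mask can be written directly from self_att_ind_objs.
expected = (ind >= 1.0).astype(np.float32)

same = np.array_equal(mask_a, mask_b) and np.array_equal(mask_a, expected)
print(same)
```

Under these assumptions, the sum only supplies the shape and dtype of the result. The one exception would be NaN entries in self_att_ind_objs, which match neither condition and would keep their summed value. Whether the addition has another purpose in the actual code path is the open question for the authors.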