oxai / debias-vision-lang

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning [AACL 2022]

Home Page: https://arxiv.org/abs/2203.11933

Issues running the text encoder

joniali opened this issue

Thanks a lot for making your code public. It's very useful.

I found a couple of mistakes: 1) I had to change "from measuring_bias" to "from .measuring_bias", otherwise I was getting an import error. 2) I had to move debias_tokens to the GPU explicitly with self.debias_tokens = self.debias_tokens.to(text.device).
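
For context, a tiny standalone illustration of why the second change is needed (plain PyTorch, not the repo's code; nn.Embedding(2, 512) is just a stand-in for debias_tokens):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
text = torch.randint(0, 49408, (1, 77), device=device)  # tokenized text, possibly on the GPU
debias_tokens = torch.nn.Embedding(2, 512)               # stand-in for debias_tokens, created on the CPU

# Mixing CPU-resident parameters with GPU inputs raises
# "Expected all tensors to be on the same device", so move them to the
# device of the incoming text before using them:
debias_tokens = debias_tokens.to(text.device)
```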

Now I am getting the following error in the encode_text function:

text_features = self.clip.transformer(text_features)

RuntimeError: mat1 and mat2 must have the same dtype

I have tried changing the dtype variable (model.py line 168) to float32 and float64, and I have also tried explicitly casting text_features to self.dtype before calling clip.transformer (model.py line 257), but no dice. Could you perhaps help?
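
For anyone else hitting this, here is a minimal standalone reproduction of the dtype clash (plain PyTorch, not the repo's code): on GPU, clip.load() returns a float16 model, so float32 activations end up multiplied against half-precision weights inside the transformer.

```python
import torch

layer = torch.nn.Linear(8, 8).half()   # float16 weights, like CLIP's on GPU
x = torch.randn(2, 8)                  # float32 activations
try:
    layer(x)
except RuntimeError as e:
    print(e)  # typically "mat1 and mat2 must have the same dtype"

# Making the two dtypes agree removes the error, e.g. by casting the
# weights up to float32:
print(layer.float()(x).dtype)  # torch.float32
```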

@Drummersbrother @smhall97 can you have a look at this?

Had the same errors.

Managed to fix it on my end by making the following changes in debias/model/model.py (a standalone sketch of the same idea follows the list):

1) Around line 170, just after: self.clip: ClipLike = clip_model
add: self.clip.transformer = self.clip.transformer.float()

2) Around line 208, just before: if self.num_prompts_tokz > 0:
add: self.debias_tokens = self.debias_tokens.to("cuda" if torch.cuda.is_available() else "cpu")

3) Around line 265, change: @ self.clip.text_projection
to: @ self.clip.text_projection.float()

4) Around line 276, change: image_features = self.clip.encode_image(image)
to: image_features = self.clip.encode_image(image).float()
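
For reference, roughly the same changes applied to a plain OpenAI CLIP model, to show where each cast/move goes (a sketch only; the repo's wrapper class and exact line numbers differ, and nn.Embedding(2, 512) is again just a stand-in for debias_tokens):

```python
import torch
import clip  # OpenAI CLIP, which the repo wraps

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # float16 weights on GPU

# Fix 1: cast the text transformer to float32 so float32 prompt embeddings
# can be pushed through it.
model.transformer = model.transformer.float()

# Fix 2: keep the learned debias/prompt tokens on the same device as the text.
debias_tokens = torch.nn.Embedding(2, 512).to(device)

# Fix 3: when projecting text features, cast the projection matrix too, i.e.
#   text_features @ model.text_projection.float()
# instead of
#   text_features @ model.text_projection

# Fix 4: cast image features up to float32 after encoding:
#   image_features = model.encode_image(image).float()
```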

Seems like all the errors were caused by CLIP's weights being in float16. Also, if you run the model on a single image, it needs an extra batch dimension, since the model expects batches of images; see the sketch below.
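
For the single-image case, something like this works (again a sketch against plain CLIP; the blank PIL image just stands in for a real input):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# preprocess() returns a single image of shape [3, 224, 224]; encode_image
# expects a batch, so add a leading batch dimension and cast the (possibly
# float16) output up to float32.
image = preprocess(Image.new("RGB", (224, 224))).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image).float()
print(image_features.shape)  # torch.Size([1, 512])
```

Hope this helps.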