Issues running the text encoder
joniali opened this issue
Thanks a lot for making your code public. It's very useful.
I found a couple of issues: 1) I had to change "from measuring_bias" to "from .measuring_bias", otherwise I got an import error. 2) I had to move debias_tokens to the GPU explicitly with self.debias_tokens = self.debias_tokens.to(text.device).
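For anyone hitting the same thing, the device fix can be sketched like this (with stand-in tensors, since the real debias_tokens and text come from the model):

```python
import torch

# Stand-ins for the model's attributes: in the real code, `text` is the
# tokenized input batch and `debias_tokens` holds the learned prompt tokens.
text = torch.arange(6).reshape(2, 3)
debias_tokens = torch.zeros(2, 512)

# Move the tokens to whatever device the text tensor is on before using them,
# so they end up on the GPU whenever the input does.
debias_tokens = debias_tokens.to(text.device)
```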
Now I am getting the following error in the encode_text function:
text_features = self.clip.transformer(text_features)
RuntimeError: mat1 and mat2 must have the same dtype
I have tried changing the dtype variable (model.py line 168) to float32 and float64, and I have also tried explicitly casting text_features to self.dtype before calling clip.transformer (model.py line 257), but no dice. Could you perhaps help?
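For context, this error is not specific to the model: any PyTorch matmul whose operands have different dtypes raises it. A minimal illustrative snippet (not the repo's actual code):

```python
import torch

a = torch.randn(2, 3, dtype=torch.float16)  # like CLIP's float16 weights
b = torch.randn(3, 4, dtype=torch.float32)  # like float32 activations

try:
    a @ b  # mixed-dtype matmul
    raised = False
except RuntimeError:
    raised = True  # "mat1 and mat2 must have the same dtype" (message may vary)

# Casting one operand so both sides agree fixes it.
out = a.float() @ b
```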
@Drummersbrother @smhall97 can you have a look at this
Had the same errors.
Managed to fix it on my end by making the following changes:
Fixes:
In debias\model\model.py:

Around line 170, just after self.clip: ClipLike = clip_model, add:
self.clip.transformer = self.clip.transformer.float()

Around line 208, just before if self.num_prompts_tokz > 0:, add:
self.debias_tokens = self.debias_tokens.to("cuda" if torch.cuda.is_available() else "cpu")

Around line 265, change:
@ self.clip.text_projection
to:
@ self.clip.text_projection.float()

Around line 276, change:
image_features = self.clip.encode_image(image)
to:
image_features = self.clip.encode_image(image).float()
It seems all the errors were caused by CLIP's weights being in float16. Also, if you run the model on a single image, it needs an extra batch dimension, since the model expects batches of images. Hope this helps.
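For the single-image case, adding the batch dimension is a one-liner (a generic example, assuming CLIP's usual 224x224 preprocessing):

```python
import torch

img = torch.randn(3, 224, 224)  # one preprocessed image, CHW layout
batch = img.unsqueeze(0)        # model expects a batch: (1, 3, 224, 224)
```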