FlagAI-Open / FlagAI

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.


[Question]: AltCLIP inference: why use the inner product instead of cosine similarity?

MobiusDai opened this issue · comments

Description

https://github.com/FlagAI-Open/FlagAI/blob/master/examples/AltCLIP/README.md

    image = Image.open("./dog.jpeg")
    image = transform(image)
    image = torch.tensor(image["pixel_values"]).to(device)
    tokenizer_out = tokenizer(["a rat", "a dog", "a cat"], 
                                padding=True,
                                truncation=True,
                                max_length=77,
                                return_tensors='pt')
    text = tokenizer_out["input_ids"].to(device)
    attention_mask = tokenizer_out["attention_mask"].to(device)
    with torch.no_grad():
        image_features = model.get_image_features(image)
        text_features = model.get_text_features(text, attention_mask=attention_mask)
        text_probs = (image_features @ text_features.T).softmax(dim=-1)

Here, `text_probs` is computed by taking the raw inner product of `image_features` and `text_features` and then applying softmax? Shouldn't `text_probs` be computed by first taking the cosine similarity between `image_features` and `text_features` and then applying softmax?

Alternatives

No response

Either approach should work here; cosine similarity is also fine. You can refer to open_clip's official code: https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/loss.py#L102 — it also uses the inner product at that point.
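To make the relationship concrete: cosine similarity is just the inner product of L2-normalized vectors, so the two scoring rules differ only by per-vector scaling before the softmax. A minimal sketch, using random tensors in place of the actual model outputs (shapes chosen to match the three-caption example above):

```python
import torch

# Stand-ins for model.get_image_features / model.get_text_features outputs.
image_features = torch.randn(1, 768)   # 1 image
text_features = torch.randn(3, 768)    # 3 candidate captions

# Raw inner product + softmax, as in the README snippet.
raw_probs = (image_features @ text_features.T).softmax(dim=-1)

# Cosine similarity: L2-normalize each feature vector first,
# then the same inner product becomes cosine similarity.
image_norm = image_features / image_features.norm(dim=-1, keepdim=True)
text_norm = text_features / text_features.norm(dim=-1, keepdim=True)
cos_sim = image_norm @ text_norm.T     # values in [-1, 1]
cos_probs = cos_sim.softmax(dim=-1)

print(raw_probs)
print(cos_probs)
```

Both produce a valid probability distribution over the captions; normalization mainly changes how peaked the distribution is, since unnormalized feature magnitudes scale the logits.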

Yes, that's right: adding normalize is a bit better. That said, even without it, inference works normally and there are no obvious inference errors.

Thanks for the patient reply.