FlagAI-Open / FlagAI

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.


[Question]: AltCLIP inference: why use the inner product instead of cosine similarity?

MobiusDai opened this issue · comments

Description

https://github.com/FlagAI-Open/FlagAI/blob/master/examples/AltCLIP/README.md

    image = Image.open("./dog.jpeg")
    image = transform(image)
    image = torch.tensor(image["pixel_values"]).to(device)
    tokenizer_out = tokenizer(["a rat", "a dog", "a cat"], 
                                padding=True,
                                truncation=True,
                                max_length=77,
                                return_tensors='pt')
    text = tokenizer_out["input_ids"].to(device)
    attention_mask = tokenizer_out["attention_mask"].to(device)
    with torch.no_grad():
        image_features = model.get_image_features(image)
        text_features = model.get_text_features(text, attention_mask=attention_mask)
        text_probs = (image_features @ text_features.T).softmax(dim=-1)

Here, `text_probs` is computed by taking the raw inner product of `image_features` and `text_features` and then applying softmax? Shouldn't `text_probs` be computed by first taking the cosine similarity between `image_features` and `text_features` and then applying softmax?

Alternatives

No response

Either approach should work here; cosine similarity is also fine. You can refer to open_clip's official code: https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/loss.py#L102 — it also uses the inner product at that point.
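To make the relationship concrete: cosine similarity is just the inner product of L2-normalized vectors, so the two scoring rules differ only by per-vector scaling before the softmax. A minimal sketch, using random tensors in place of the actual model outputs (shapes chosen to match the three-caption example above):

```python
import torch

# Stand-ins for model.get_image_features / model.get_text_features outputs.
image_features = torch.randn(1, 768)   # 1 image
text_features = torch.randn(3, 768)    # 3 candidate captions

# Raw inner product + softmax, as in the README snippet.
raw_probs = (image_features @ text_features.T).softmax(dim=-1)

# Cosine similarity: L2-normalize each feature vector first,
# then the same inner product becomes cosine similarity.
image_norm = image_features / image_features.norm(dim=-1, keepdim=True)
text_norm = text_features / text_features.norm(dim=-1, keepdim=True)
cos_sim = image_norm @ text_norm.T     # values in [-1, 1]
cos_probs = cos_sim.softmax(dim=-1)

print(raw_probs)
print(cos_probs)
```

Both produce a valid probability distribution over the captions; normalization mainly changes how peaked the distribution is, since unnormalized feature magnitudes scale the logits.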

Yes, that's right: adding normalize is a bit better. That said, even without it, inference works normally and there are no obvious inference errors.

Thanks for the patient reply.