beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"


Error in "how to use" code

yg-smile opened this issue · comments

Dear authors,

Thank you so much for your work. I tried to run the sample code in the Usage/how to use section, but encountered the error:

Traceback (most recent call last):
...
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: CLIP.forward() missing 2 required positional arguments: 'text_short' and 'rank'

I did not change the sample code, except for replacing the image file with my own test image.

from long_clip.model import longclip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = longclip.load("./checkpoints/longclip-L.pt", device=device)

text = longclip.tokenize(["A man is crossing the street with a red car parked nearby.",
                          "A man is driving a car in an urban scene."]).to(device)
image = preprocess(Image.open("./doc/tmp_data/page=16_imgid=0.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)

Could you please help me? Thank you!

This is a duplicate of another closed issue, so I'm closing it for now. Thank you.
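Note for readers hitting the same error: the `model(image, text)` call fails because this repository's `CLIP.forward()` expects extra training-time arguments (`text_short` and `rank`) that the original OpenAI CLIP inference snippet does not pass. A common workaround is to skip `forward()` entirely and compute the logits directly from the encoded features, which is what CLIP's forward pass does internally. The sketch below uses hypothetical placeholder tensors in place of the real outputs of `model.encode_image(image)` and `model.encode_text(text)`, and assumes the usual scaled cosine-similarity formulation; the `logit_scale` value is illustrative, not taken from a checkpoint.

```python
import torch

# Hypothetical stand-ins for the encoded features: 1 image, 2 texts, dim 4.
# In practice these come from model.encode_image(image) / model.encode_text(text).
image_features = torch.tensor([[1.0, 0.0, 0.0, 0.0]])
text_features = torch.tensor([[0.9, 0.1, 0.0, 0.0],
                              [0.0, 1.0, 0.0, 0.0]])
logit_scale = torch.tensor(100.0)  # illustrative; CLIP uses model.logit_scale.exp()

# L2-normalize the features, then compute scaled cosine similarities --
# the same quantity model(image, text) returns as logits_per_image.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

logits_per_image = logit_scale * image_features @ text_features.t()
logits_per_text = logits_per_image.t()
probs = logits_per_image.softmax(dim=-1)
```

Here the first text is nearly parallel to the image feature, so `probs` concentrates almost all mass on it. Swapping this computation in for the `model(image, text)` line in the sample code avoids the missing-argument error without touching the model.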