microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"

Question on zero-shot inference with a ViT-based model

hanguniverse opened this issue

Hello, I am trying to run zero-shot detection with ground truth using a ViT-based model, but I couldn't find any instructions on how to use ViT. The framework seems to support only ResNet models, even on the zero-shot branch. Could you help me look into this? Thank you.