microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"


Do you also maintain the base / novel splits during pretraining?

mlzxy opened this issue · comments

Just out of curiosity. Thanks!

During pre-training, there are no so-called "base" or "novel" categories. Our model is pre-trained with a diverse set of object concepts that are parsed from image captions.
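To make the "concepts parsed from image captions" step concrete, here is a minimal, hypothetical sketch of pulling candidate object concepts out of caption text. It uses spaCy noun chunks purely for illustration; the parser actually used by RegionCLIP is not reproduced here, and the function name `extract_concepts` is invented for this example.

```python
# Hypothetical sketch: extract candidate object concepts (noun chunks) from
# image captions. Not RegionCLIP's actual parser; spaCy is used for illustration.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_concepts(captions):
    """Return a frequency count of noun-phrase concepts found in the captions."""
    concepts = Counter()
    for doc in nlp.pipe(captions):
        for chunk in doc.noun_chunks:
            # Keep the head-noun lemma as the concept name, e.g. "a brown dog" -> "dog"
            concepts[chunk.root.lemma_.lower()] += 1
    return concepts

if __name__ == "__main__":
    captions = [
        "A brown dog catching a frisbee in the park.",
        "Two people riding bicycles near the beach.",
    ]
    print(extract_concepts(captions).most_common(5))
```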

Thanks for the clarification. What about the localizer used in pre-training? Is it an RPN trained on base images, or a sliding-window-based proposal generator?

By default, we use an RPN trained with the boxes in the LVIS dataset (see the implementation details in the paper). Note that, for the RPN, there are no base/novel images, since we didn't use the categorical labels of these boxes. Also, random boxes perform comparably to the RPN (see the ablation study in the paper).
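For reference, a minimal sketch of what a "random boxes" baseline (the alternative to the RPN mentioned above) could look like. The box format and the function `random_proposals` are assumptions made for this illustration, not code from the repository.

```python
# Hypothetical sketch: sample random box proposals as a drop-in substitute for
# an RPN, mirroring the ablation mentioned above. Boxes are (x1, y1, x2, y2)
# in pixel coordinates; all names here are illustrative.
import torch

def random_proposals(image_h, image_w, num_boxes=300, min_size=32, generator=None):
    """Sample axis-aligned boxes with random centers and sizes inside the image."""
    cx = torch.rand(num_boxes, generator=generator) * image_w
    cy = torch.rand(num_boxes, generator=generator) * image_h
    w = min_size + torch.rand(num_boxes, generator=generator) * (image_w - min_size)
    h = min_size + torch.rand(num_boxes, generator=generator) * (image_h - min_size)
    boxes = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
    # Clip to the image bounds so every proposal is a valid region.
    boxes[:, 0::2] = boxes[:, 0::2].clamp(0, image_w)
    boxes[:, 1::2] = boxes[:, 1::2].clamp(0, image_h)
    return boxes

if __name__ == "__main__":
    print(random_proposals(480, 640, num_boxes=5))
```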