zengyan-97 / X2-VLM

All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)

Patch features to region feature

kimihailv opened this issue

Hello. Could you please explain how patch features from ViT are aggregated to one specific region feature? This point is confusing, because a region doesn't necessarily contain one or several whole patches.

Hi,
sorry for the late reply.
I use image_atts to indicate a region:
https://github.com/zengyan-97/X-VLM/blob/master/models/model_pretrain.py#L14
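One way to read this reply: the region is not cut out of whole patches. Every patch the box touches is marked in a 0/1 mask, and that mask is then used as image_atts. Below is a minimal sketch of building such a mask, not the repository's code. The function name box_to_patch_mask, the 16-pixel patch size, the 4x4 grid, the outward rounding of the box, and the always-kept [CLS] slot at position 0 are all assumptions made for illustration.

```python
import math


def box_to_patch_mask(x, y, w, h, patch_size=16, num_patch=4):
    """Return a 0/1 list of length 1 + num_patch**2 marking patches a box touches.

    Hypothetical helper: (x, y) is the top-left corner of the box in pixels,
    (w, h) its size, and the image is assumed to be a num_patch x num_patch grid.
    """
    # Round the box outward to the patch grid, so partially covered
    # patches are included rather than dropped.
    col_min = min(math.floor(x / patch_size), num_patch - 1)
    col_max = max(col_min + 1, min(math.ceil((x + w) / patch_size), num_patch))
    row_min = min(math.floor(y / patch_size), num_patch - 1)
    row_max = max(row_min + 1, min(math.ceil((y + h) / patch_size), num_patch))

    mask = [0] * (1 + num_patch ** 2)
    mask[0] = 1  # assumption: position 0 is the [CLS] token and is always kept
    for row in range(row_min, row_max):
        for col in range(col_min, col_max):
            # +1 skips the [CLS] slot; patches are laid out row by row.
            mask[1 + row * num_patch + col] = 1
    return mask


# A 30x10 box starting at (10, 20) on a 64x64 image with 16-pixel patches.
mask = box_to_patch_mask(x=10, y=20, w=30, h=10)
active = [i for i, v in enumerate(mask) if v]
print(active)
```

With a mask like this, a box that covers only part of a patch still selects the whole patch, so the region is approximated at patch granularity. How the masked patch features are then combined into a single region feature depends on the model code linked above.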