Patch features to region feature
kimihailv opened this issue
Hello. Could you please explain how patch features from ViT are aggregated to one specific region feature? This point is confusing, because a region doesn't necessarily contain one or several whole patches.
Hi,
Sorry for the late reply.
I use image_atts to indicate a region:
https://github.com/zengyan-97/X-VLM/blob/master/models/model_pretrain.py#L14
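
To make the idea concrete, here is a minimal sketch of how a region could be turned into a patch-level mask even when its box does not line up with patch borders. This is an illustration, not the code from the repository: the function name `region_patch_mask`, its parameters, the leading [CLS] slot, and the rule that any patch the box touches counts as part of the region are all assumptions.

```python
import math


def region_patch_mask(x, y, w, h, image_size, patch_size):
    """Build a flat 0/1 mask over [CLS] + ViT patches for a pixel-space box.

    Hypothetical helper: (x, y) is the top-left corner of the box and
    (w, h) its width and height, all in pixels of the resized image.
    """
    num_patch = image_size // patch_size  # patches per side

    # Floor the start and ceil the end, so a patch that is only partly
    # covered by the box is still included (assumed rule).
    x_min = min(math.floor(x / patch_size), num_patch - 1)
    x_max = max(x_min + 1, min(math.ceil((x + w) / patch_size), num_patch))
    y_min = min(math.floor(y / patch_size), num_patch - 1)
    y_max = max(y_min + 1, min(math.ceil((y + h) / patch_size), num_patch))

    mask = [0] * (1 + num_patch * num_patch)
    mask[0] = 1  # assumed: the [CLS] slot is always kept visible
    for row in range(y_min, y_max):
        for col in range(x_min, x_max):
            # Patches are flattened row by row, offset by 1 for [CLS].
            mask[1 + row * num_patch + col] = 1
    return mask
```

For a 64x64 image with 32x32 patches, `region_patch_mask(20, 0, 20, 10, image_size=64, patch_size=32)` returns `[1, 1, 1, 0, 0]`: the box spans both top patches, so both are marked even though neither is fully covered. A mask like this would then be passed wherever the model expects an attention mask over image tokens, so that only the marked patches contribute to the region feature.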