microsoft / Oscar

Oscar and VinVL

Why are all rows of the attention map the same (precision=3)?

panmianzhi opened this issue · comments

I use RoI image features extracted in the same way as ROSITA, and I directly concatenate the 2048-d image features with the 6-d box features (x_min, y_min, x_max, y_max, height, width) to form the input image features. Setting output_attention=True and using the pretrained Oscar model (base-vg-labels), I found that every row of the attention map from the last layer is identical (after summing the attention scores over all heads), like the following:
[screenshot: last-layer attention map with identical rows]
This result is strange. Can someone explain this behavior?
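For reference, here is a minimal sketch of the setup described above. The shapes, the number of regions, and the placeholder tensors are assumptions for illustration; it does not call the actual Oscar model, it only mirrors the feature concatenation and the per-head attention summation I am doing:

```python
import torch

# Assumed shapes: 36 RoI regions, 2048-d visual features plus
# 6-d box features (x_min, y_min, x_max, y_max, height, width).
roi_feats = torch.randn(36, 2048)
box_feats = torch.randn(36, 6)

# Concatenate along the feature dimension -> (36, 2054) image input.
img_feats = torch.cat([roi_feats, box_feats], dim=-1)

# Stand-in for the attentions returned with output_attention=True;
# the last element would have shape (batch, num_heads, seq_len, seq_len).
attentions = [torch.randn(1, 12, 70, 70).softmax(dim=-1)]
last_layer = attentions[-1]

# Sum the attention scores over all heads, as in my experiment.
summed = last_layer.sum(dim=1)[0]        # (seq_len, seq_len)

# Check whether every row is identical at roughly 3-decimal precision.
rows_equal = torch.allclose(summed, summed[0].expand_as(summed), atol=1e-3)
print(rows_equal)
```

With the real model and my inputs, the check at the end returns True, which is the behavior I am asking about.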