bfshi / TOAST

Official code for "TOAST: Transfer Learning via Attention Steering"


Motivation for top-down input added to value matrix

Kuan-Pang opened this issue · comments

Hi - thanks for the amazing work!

In the second feedforward pass (step iv), the value matrix receives the top-down input to steer the attention map. I was wondering what the motivation is for this design decision, e.g. why does the value matrix specifically receive this signal rather than the query/key matrices?

Hi, thanks for your interest in our work.

This design follows our previous paper on top-down attention (https://arxiv.org/pdf/2303.13043.pdf). The intuition is that Q and K decide which pixels belong to the same object and should be grouped together (reflected in the attention matrix QK^T), while V decides the grouped feature of each object. We keep Q and K the same so that the grouping of each object stays the same (e.g., if there's a cat and a dog in the image, two pixels on the cat belong to the same object and should be grouped together, no matter whether we are looking at the cat or the dog), and we add the top-down feature to V to change/enhance the feature of the specific object we are focusing on.
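The split described above can be sketched as a single-head attention pass where the top-down signal enters only the value path. This is a minimal illustration, not the paper's exact implementation: the additive form `x + td` on the value path and all variable names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_topdown(x, td, Wq, Wk, Wv):
    """Single-head self-attention for the second feedforward pass.

    `td` is the top-down input; it is added only on the value path,
    so the attention matrix QK^T (the grouping of pixels into objects)
    is identical to the bottom-up pass, while the grouped features
    carried by V are steered toward the attended object.
    """
    Q = x @ Wq
    K = x @ Wk
    V = (x + td) @ Wv  # top-down signal steers V only (assumed additive form)
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # grouping unchanged by td
    return A @ V
```

Because Q and K never see `td`, the attention weights are exactly those of the plain pass; only the mixed-in features change.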

Thanks for the reply!