What is the meaning of 'flat' variable in vit_rollout.py?
yojayc opened this issue · comments
The code doesn't use the variable after reevaluating it in line 27
Line 27 in 15a81d3
@yojayc
This is for discarding the lowest attention weights. flat is created as a view into attention_heads_fused, so modifying flat in line 27 also modifies attention_heads_fused. You can learn more about views here.
Note also that the indices equal to zero are filtered out. This is done because the attention weights accounting for the CLS token are kept by default.
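
To make the view behaviour concrete, here is a minimal sketch. It is not the exact code from vit_rollout.py: the tensor values and the discard_ratio are made up for illustration, and only the variable names follow the discussion above. It shows that writing into flat changes attention_heads_fused, and that filtering out index 0 leaves that entry untouched even when it is among the smallest values.

```python
import torch

# Toy stand-in for the fused attention map (batch of 1, 3x3 tokens).
# The values are invented; entry [0, 0] is deliberately the smallest.
attention_heads_fused = torch.tensor([[[0.05, 0.5, 0.1],
                                       [0.2,  0.8, 0.3],
                                       [0.4,  0.6, 0.7]]])
discard_ratio = 0.5  # hypothetical value, chosen for the example

# view() does not copy: flat shares storage with attention_heads_fused.
flat = attention_heads_fused.view(attention_heads_fused.size(0), -1)
shares_memory = flat.data_ptr() == attention_heads_fused.data_ptr()

# Indices of the k smallest entries (largest=False).
k = int(flat.size(-1) * discard_ratio)
_, indices = flat.topk(k, dim=-1, largest=False)

# Drop flat index 0 from the list so that entry is never zeroed.
indices = indices[indices != 0]

# In-place write through the view; the original tensor changes too.
flat[0, indices] = 0

print(shares_memory)
print(attention_heads_fused)
```

After the assignment, the smallest entries of attention_heads_fused are zero except the one at flat index 0, even though flat itself is never read again. That is why the variable looks unused.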
I hope this helps.