rstrudel / segmenter

[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation


How to obtain cls_emb?

haozhi1817 opened this issue · comments

cls_emb is a learnable parameter, and Q, K, V are computed by nn.Linear in TransformerBlock. This means part of Q (or K, or V) is derived from two learnable parameters: cls_emb and W (the weight of nn.Linear). I think it is not easy to obtain a reasonable cls_emb and a reasonable nn.Linear.weight through gradient descent at the same time.

The use of a CLS token is very standard. It's a way to aggregate and pool global information into a single token. You can check BERT and the vision transformer paper for instance :)
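A minimal sketch of the idea being discussed (hypothetical names and dimensions, not the repo's exact code): the class embeddings are an nn.Parameter appended to the patch tokens, and the same nn.Linear projection then produces Q, K, V for the whole sequence, so both the embedding and the projection weight are trained jointly by gradient descent.

```python
import torch
import torch.nn as nn

class MaskTransformerSketch(nn.Module):
    """Sketch: learnable class embeddings joined with patch tokens
    before a standard self-attention step (hypothetical dims/names)."""

    def __init__(self, n_cls=21, d_model=192):
        super().__init__()
        # one learnable embedding per class, optimized like any other parameter
        self.cls_emb = nn.Parameter(torch.randn(1, n_cls, d_model))
        # Q, K, V all come from a learnable linear projection
        self.qkv = nn.Linear(d_model, 3 * d_model)

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, d_model) from the encoder
        B = patch_tokens.size(0)
        cls_emb = self.cls_emb.expand(B, -1, -1)
        # class embeddings are appended to the patch sequence, so a slice of
        # Q/K/V is indeed cls_emb passed through nn.Linear
        x = torch.cat([patch_tokens, cls_emb], dim=1)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * (q.size(-1) ** -0.5)
        return attn.softmax(dim=-1) @ v
```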


Thank you for your response. When I was reading the code of DETR and MaskFormer, I noticed that they set tgt (which is the same as cls_emb) as the query. I thought that Q was equal to tgt, rather than Q being equal to a learnable weight multiplied by tgt. However, I now realize that I was mistaken. Once again, thank you.
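For illustration, a small sketch of the point resolved above (assuming a generic DETR/MaskFormer-like decoder layer, not either project's actual code): even when a learnable tgt is fed in directly as the query input, the attention module still applies its own learnable input projection, so effectively Q = W_q · tgt rather than Q = tgt.

```python
import torch
import torch.nn as nn

d_model, n_queries, n_tokens = 192, 100, 196

# learnable query embeddings, analogous to tgt / cls_emb
tgt = nn.Parameter(torch.randn(1, n_queries, d_model))
memory = torch.randn(1, n_tokens, d_model)  # dummy encoder features

# tgt is passed as the "query" input, but MultiheadAttention still applies
# its own learnable projection to it before computing attention
cross_attn = nn.MultiheadAttention(d_model, num_heads=3, batch_first=True)
out, _ = cross_attn(query=tgt, key=memory, value=memory)
print(out.shape)  # torch.Size([1, 100, 192])
```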