How to obtain cls_emb?
haozhi1817 opened this issue · comments
cls_emb is a learnable parameter, and we compute Q, K, and V with nn.Linear in TransFormerBlock. This means that part of Q (or K or V) is derived from two learnable parameters: cls_emb itself and W, the weight of nn.Linear. I think it is not easy to obtain a reasonable cls_emb and a reasonable nn.Linear.weight through gradient descent at the same time.
The use of a CLS token is very standard. It's a way to aggregate and pool global information into a single token. You can check BERT and the vision transformer paper for instance :)
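To make the joint training concrete, here is a minimal sketch, assuming PyTorch and a single-head attention block. The class `TinyClsAttention`, its dimensions, and the random toy data are hypothetical and are not taken from this repository. It only illustrates that one backward pass sends gradients to both the CLS embedding and the `nn.Linear` weight, so one optimizer step updates both.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyClsAttention(nn.Module):
    """Hypothetical single-head attention block with a learnable CLS token."""

    def __init__(self, dim=16, num_classes=2):
        super().__init__()
        # Learnable CLS embedding, shared across the batch.
        self.cls_emb = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        # One linear layer producing Q, K and V for every token (CLS included).
        self.qkv = nn.Linear(dim, dim * 3)
        self.head = nn.Linear(dim, num_classes)
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (batch, tokens, dim)
        cls = self.cls_emb.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)  # prepend CLS at position 0
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = attn @ v
        return self.head(out[:, 0])  # classify from the CLS position only


model = TinyClsAttention()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Random toy data, purely illustrative.
x = torch.randn(8, 5, 16)
y = torch.randint(0, 2, (8,))

cls_before = model.cls_emb.detach().clone()
w_before = model.qkv.weight.detach().clone()

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # gradients reach cls_emb and qkv.weight in the same pass
opt.step()       # both parameters are updated together

cls_changed = not torch.equal(cls_before, model.cls_emb.detach())
w_changed = not torch.equal(w_before, model.qkv.weight.detach())
print(cls_changed, w_changed)
```

The product of a learned embedding and a learned projection is the same situation as any word or patch embedding followed by a linear layer, which is why it trains without special handling.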
Thank you for your response. When I was reading the code of DETR and MaskFormer, I noticed that they set tgt (which is the same as cls_emb) as the query. I thought that Q was equal to tgt, rather than Q being equal to a learnable weight multiplied by tgt. However, I now realize that I was mistaken. Once again, thank you.
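For readers with the same confusion, the point can be checked directly on `nn.MultiheadAttention`: whatever is passed as `query` is first multiplied by the module's own input projection. The sketch below assumes PyTorch's default packed projection (key and value dimensions equal to `embed_dim`). The sizes and the `tgt` embedding are illustrative stand-ins, not code from DETR or MaskFormer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

embed_dim, num_queries = 8, 4
mha = nn.MultiheadAttention(embed_dim, num_heads=2)

# Illustrative learnable queries, standing in for tgt / cls_emb.
tgt = nn.Embedding(num_queries, embed_dim).weight  # (num_queries, embed_dim)

# in_proj_weight stacks the Q, K and V projections along dim 0;
# the first embed_dim rows are the query projection.
w_q = mha.in_proj_weight[:embed_dim]
b_q = mha.in_proj_bias[:embed_dim]

# This is the Q that attention actually uses: a learned weight applied to tgt.
q = F.linear(tgt, w_q, b_q)

q_is_tgt = torch.allclose(q, tgt)
print(q.shape, q_is_tgt)
```

So even when the calling code passes `tgt` as the query argument, Q inside the attention is a learned projection of `tgt`, not `tgt` itself.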