yfeng95 / PoseGPT

Clarification on obtaining the embedding related to the <POSE> token

AndrejHafner opened this issue · comments

Hello! First of all, thank you for the great article. I have a question about how you obtain the embedding related to the <POSE> token, which is then projected and used for human pose reconstruction. If I understand correctly, when the model outputs a <POSE> token, you take the logits from the last layer of the LLM (to which softmax was applied and from whose resulting distribution the token was sampled) and use those as the embedding?

commented

I think it's the last-layer hidden state (hidden_states, before the logits) corresponding to the <POSE> token, not the logits themselves. You can reference LISA: https://github.com/dvlab-research/LISA.
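For illustration, here is a minimal sketch of that pattern using HuggingFace transformers, in the spirit of LISA. The model name is a placeholder, and the projection head at the end is hypothetical; this is not the authors' code, just what "take the last-layer hidden state at the <POSE> token position" typically looks like:

```python
# Minimal sketch (not the PoseGPT implementation): extract the final-layer
# hidden state for a special <POSE> token from a HuggingFace causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some/llm"  # placeholder, not the actual PoseGPT backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.add_special_tokens({"additional_special_tokens": ["<POSE>"]})
model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tokenizer))  # account for the new token

pose_token_id = tokenizer.convert_tokens_to_ids("<POSE>")

inputs = tokenizer("Describe the person's pose: <POSE>", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[-1] is the output of the final transformer layer, i.e.
# before the LM head / logits / softmax: shape (batch, seq_len, hidden_dim).
last_hidden = outputs.hidden_states[-1]

# Select the position(s) where the <POSE> token appears.
pose_mask = inputs["input_ids"] == pose_token_id
pose_embedding = last_hidden[pose_mask]  # (num_pose_tokens, hidden_dim)

# A hypothetical projection head (as in LISA-style pipelines) would then
# map this embedding into the pose-regression space:
# pose_params = projection_mlp(pose_embedding)
```

Note the distinction from the question above: the logits are the output of the LM head and are only used for sampling the next token; the embedding fed to the projection is the hidden state one step earlier.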