rese1f / MovieChat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Home Page: https://rese1f.github.io/MovieChat/

Questions about extracted features

Hou9612 opened this issue

Hello,

Thank you for this wonderful work!

The provided features have shape [64, 257, 1408]. I have the following questions about them:

(1) What do 257 and 1408 mean? Is 257 the number of tokens per frame, and 1408 the feature dimension?
(2) Can I use only the CLS token feature of each frame when training the model and evaluating its performance? The complete features take up about 16 TB, and I don't have enough storage space to store them.
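
To make question (2) concrete, here is a minimal sketch of the reduction I have in mind. It assumes the layout is [frames, tokens, dim] and that the CLS token sits at index 0 of the token axis, which is the usual ViT convention but is not confirmed for these features. The helper name `cls_only`, the `cls_index` parameter, and the float16 dtype are placeholders for illustration, not part of MovieChat.

```python
import numpy as np


def cls_only(features: np.ndarray, cls_index: int = 0) -> np.ndarray:
    """Keep one token per frame: [frames, tokens, dim] -> [frames, dim].

    cls_index=0 is an assumption (common ViT convention), not a confirmed
    property of the provided features.
    """
    if features.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {features.shape}")
    # Copy so the small array does not keep the full feature buffer alive.
    return features[:, cls_index, :].copy()


# Stand-in array with the shape reported above; dtype is a placeholder.
full = np.zeros((64, 257, 1408), dtype=np.float16)
cls = cls_only(full)

print(cls.shape)                   # (64, 1408)
print(full.nbytes // cls.nbytes)   # 257
```

Keeping one token out of 257 shrinks each file by a factor of 257, so a 16 TB set would come down to roughly 60 GB. Whether the model still trains and evaluates well on CLS-only features is exactly what I am asking.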