The meaning of num_extra_tokens = 1 + self.image_encoder.distilled
Y-NY opened this issue
Yang Ningyuan commented
In class Segmenter, in the forward function:
num_extra_tokens = 1 + self.image_encoder.distilled
x = x[:, num_extra_tokens:]
What do these two lines of code mean? And how should I define 'distilled' myself?
rstrudel commented
In DeiT's case, the vision backbone has two extra tokens: one CLS token, used to perform classification, and one distillation token, used to match a teacher's output. You can see it in Figure 2 of the DeiT paper.
The line x[:, num_extra_tokens:] means we're keeping all of the patch tokens, with the CLS and distillation tokens excluded.
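To make the slicing concrete, here is a minimal sketch in plain Python. It assumes that distilled is a boolean-like flag (True for a DeiT-style backbone with a distillation token, False for a plain ViT) and that the extra tokens sit at the start of the sequence. The helper drop_extra_tokens and the string labels are hypothetical and only for illustration. Nested lists stand in for the tensor, so the list comprehension plays the role of x[:, num_extra_tokens:].

```python
def drop_extra_tokens(tokens, distilled):
    """Drop the leading CLS (and, if present, distillation) token from each sequence.

    tokens: list of sequences, one per batch element (stand-in for a tensor
            of shape [batch, num_tokens, dim]).
    distilled: assumed to be a boolean-like flag on the encoder.
    """
    # 1 CLS token, plus 1 distillation token when distilled is truthy.
    num_extra_tokens = 1 + int(distilled)
    # Equivalent of x[:, num_extra_tokens:] on a tensor.
    return [seq[num_extra_tokens:] for seq in tokens]


# Plain ViT: only a CLS token precedes the patch tokens.
vit_batch = [["cls", "patch0", "patch1", "patch2"]]
# DeiT-style: CLS and distillation tokens precede the patch tokens.
deit_batch = [["cls", "dist", "patch0", "patch1", "patch2"]]

vit_patches = drop_extra_tokens(vit_batch, distilled=False)
deit_patches = drop_extra_tokens(deit_batch, distilled=True)
```

In both cases only the patch tokens remain, which is what the decoder needs to produce a per-patch segmentation.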