isl-org / lang-seg

Hi, thanks for the great work.
I have a question about the implementation of the label set vectors (T). As the paper points out, the text encoder embeds the set of N potential labels into a continuous vector space. As far as I can see, the code below is where that happens, but it seems that only the feature at the EOS token is selected after the label set is tokenized.

lang-seg/modules/models/lseg_net.py

Line 183 in 9d063b1

text_features = self.clip_pretrained.encode_text(text)

Shouldn't it extract the embedding from each label token?
Or is that handled in another part of the code?
Thanks
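
For context, here is a minimal sketch of the pooling being asked about. It is a toy illustration, not the LSeg or CLIP source: `pool_at_eot`, the hidden-state values, and the non-special token ids are made up. It assumes the behaviour of OpenAI CLIP's text encoder, where the end-of-text token has the highest id in the vocabulary, so an argmax over the token ids finds its position, and one vector per label is read from that position.

```python
# Toy sketch (hypothetical helper, not the LSeg/CLIP code): pool one vector
# per label by picking the hidden state at the end-of-text (EOT) position.

def pool_at_eot(token_ids, hidden_states):
    """token_ids: N lists of L ints; hidden_states: N lists of L vectors.
    Returns N vectors, one per label."""
    pooled = []
    for ids, states in zip(token_ids, hidden_states):
        # EOT has the largest id, so argmax over ids gives its position.
        eot_pos = max(range(len(ids)), key=lambda i: ids[i])
        pooled.append(states[eot_pos])
    return pooled

# Two labels padded to length 4. 49406/49407 are CLIP's start/end token ids;
# the ids in between are made up, and 0 is padding.
token_ids = [
    [49406, 320, 49407, 0],
    [49406, 1929, 2368, 49407],
]
# Made-up 2-dimensional hidden states for each token position.
hidden_states = [
    [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5], [1.6, 1.7]],
]

label_vectors = pool_at_eot(token_ids, hidden_states)
print(label_vectors)
```

Because CLIP's text transformer uses a causal attention mask, the EOT position is the only one that has attended to every token of the label, which is why a single vector taken there can stand in for the whole label.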

Hi @jihwanp ,

Thanks for your interest in LSeg!

The text input is processed here. You can use a predefined label set, or supply any label set of arbitrary length, order, and content.
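
As a rough illustration of why the label set can vary freely, the sketch below assumes (as the paper describes) that per-pixel scores are dot products between pixel embeddings and label embeddings. The function names and the toy values are hypothetical and are not taken from the repository.

```python
# Toy sketch: the same scoring code works for any number of labels N,
# because N only appears as the length of the label_vectors list.

def correlate(pixel_embeddings, label_vectors):
    """pixel_embeddings: H x W grid of C-dim vectors; label_vectors: N C-dim vectors.
    Returns an H x W grid holding N dot-product scores per pixel."""
    return [
        [[sum(p * t for p, t in zip(pixel, label)) for label in label_vectors]
         for pixel in row]
        for row in pixel_embeddings
    ]

def predict(scores):
    """Index of the best-scoring label at each pixel."""
    return [[max(range(len(s)), key=s.__getitem__) for s in row] for row in scores]

pixels = [[[1.0, 0.0], [0.0, 1.0]]]                   # 1 x 2 image, C = 2
two_labels = [[1.0, 0.0], [0.0, 1.0]]                 # N = 2
three_labels = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]   # N = 3, different order

scores_2 = correlate(pixels, two_labels)     # shape 1 x 2 x 2
scores_3 = correlate(pixels, three_labels)   # shape 1 x 2 x 3
print(predict(scores_2), predict(scores_3))
```

Changing the length or order of the label list only changes the number and order of score channels; the image side is untouched.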

Hope this helps!

Hi, I want to know: if the dimension N is not fixed, how can we design the spatial regularization structure that follows?

Question of label set vectors