About adding the quantized image tokens to the pretrained language tokenizer.

Question


Jiushanhuadao opened this issue 5 months ago · comments

I checked the prediction code and the paper. It seems you added the quantized image tokens to the pretrained language tokenizer. In other papers, some people keep the language and image tokenizers separate: the image features are passed through a linear layer and then concatenated with the language embeddings. Have you tried this method?
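To make the alternative concrete, here is a minimal sketch of the "separate tokenizer plus linear layer" idea in plain Python. It is not taken from this repository or from any specific paper; the dimensions, the random weights, and the helper names `linear` and `build_inputs` are all assumptions chosen for illustration.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to one feature vector (plain lists, no framework)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def build_inputs(image_features, text_embeddings, weight, bias):
    """Project each image feature to the LLM width, then prepend it to the text."""
    projected = [linear(f, weight, bias) for f in image_features]
    return projected + text_embeddings

random.seed(0)
VISION_DIM, LLM_DIM = 3, 4  # hypothetical sizes, far smaller than a real model

# Hypothetical projection weights; in practice these would be learned.
weight = [[random.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(LLM_DIM)]
bias = [0.0] * LLM_DIM

image_features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]    # 2 continuous image features
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]  # 3 text token embeddings

# One sequence of LLM_DIM-wide vectors: 2 image positions followed by 3 text positions.
inputs = build_inputs(image_features, text_embeddings, weight, bias)
```

In this setup the image side never receives discrete token ids; it only contributes continuous vectors at the embedding level.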

Yuying Ge · Answer 1 · Mon Feb 26 2024 11:54:28 GMT+0800 (China Standard Time)

We added the quantized image tokens to the pretrained language tokenizer to unify the representation of image and text tokens, and the LLM is trained to optimize the visual embeddings. We did not try the latter method.
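As a rough illustration of what a unified vocabulary could look like, the sketch below appends image codebook indices after the text vocabulary so that both share one id space. It is only a sketch under stated assumptions, not the actual implementation: `TEXT_VOCAB_SIZE`, `CODEBOOK_SIZE`, and the helper functions are hypothetical.

```python
TEXT_VOCAB_SIZE = 32000  # hypothetical size of the pretrained text vocabulary
CODEBOOK_SIZE = 8192     # hypothetical number of quantized image codes

def image_code_to_token_id(code):
    """Map a codebook index to a token id placed after the text vocabulary."""
    if not 0 <= code < CODEBOOK_SIZE:
        raise ValueError(f"image code out of range: {code}")
    return TEXT_VOCAB_SIZE + code

def token_id_to_image_code(token_id):
    """Return the codebook index for an image token id, or None for a text token."""
    if TEXT_VOCAB_SIZE <= token_id < TEXT_VOCAB_SIZE + CODEBOOK_SIZE:
        return token_id - TEXT_VOCAB_SIZE
    return None

def build_sequence(text_ids, image_codes):
    """Concatenate text ids and image codes into one sequence of token ids."""
    return list(text_ids) + [image_code_to_token_id(c) for c in image_codes]

# Three text token ids followed by three quantized image codes.
sequence = build_sequence([1, 15043, 3186], [0, 17, 8191])
print(sequence)  # [1, 15043, 3186, 32000, 32017, 40191]
```

Under this assumption the embedding table would have `TEXT_VOCAB_SIZE + CODEBOOK_SIZE` rows, and the rows for the image ids are the visual embeddings that get optimized during training.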