About adding the quantized image tokens to the pretrained language tokenizer
Jiushanhuadao opened this issue
Ray Yarman commented
I checked the prediction code and the paper. It seems you added the quantized image tokens to the pretrained language tokenizer. In other papers, some authors keep the language and image tokenizers separate: the image features are passed through a linear layer and then concatenated with the language embeddings. Have you tried this method?
Yuying Ge commented
We added the quantized image tokens to the pretrained language tokenizer to unify the representation of image and text tokens, and the LLM is trained to optimize the visual embeddings. We did not try the latter method.
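
For readers comparing the two designs discussed in this thread, here is a minimal, dependency-free sketch of each. It is not the repository's code: the vocabulary sizes, the function names, and the choice to place text before image tokens are all hypothetical and only illustrate the idea.

```python
# Illustrative sketch only; all sizes and names below are assumptions.
TEXT_VOCAB_SIZE = 8       # hypothetical size of the pretrained text vocabulary
IMAGE_CODEBOOK_SIZE = 4   # hypothetical number of quantized image codes


# Design 1: unified vocabulary.
# Each quantized image code gets its own id appended after the text vocabulary,
# so the LLM sees one token sequence and learns the new embeddings itself.
def image_code_to_token_id(code):
    if not 0 <= code < IMAGE_CODEBOOK_SIZE:
        raise ValueError(f"image code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def build_unified_sequence(text_ids, image_codes):
    # Text first, then image tokens (the ordering is an assumption).
    return list(text_ids) + [image_code_to_token_id(c) for c in image_codes]


# Design 2: separate tokenizers.
# Continuous image features are projected by a linear layer into the
# text embedding space and concatenated with the text embeddings.
def linear(x, weight, bias):
    # weight has shape (out_dim, in_dim); returns a vector of length out_dim.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def build_concatenated_embeddings(text_ids, image_features,
                                  embedding_table, weight, bias):
    text_embeds = [embedding_table[i] for i in text_ids]
    image_embeds = [linear(f, weight, bias) for f in image_features]
    return text_embeds + image_embeds
```

In the first design the model input is a list of integer ids drawn from one enlarged vocabulary; in the second it is a list of embedding vectors, and the image part never passes through the text tokenizer.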