Code for Tokenization?
s4lome opened this issue
Thank you for sharing this most exciting work!
I would like to know: is the code for tokenizing the different modalities not released yet, or am I missing where in the code the tokenization happens?
I would like to use Meta-Transformer on a custom dataset with image and text inputs.
As far as I understood the workflow would be:
```python
token_text, token_image = tokenize(text), tokenize(image)
embedding_text = pretrained_encoder(token_text)    # as described in demo
embedding_image = pretrained_encoder(token_image)  # as described in demo
downstream_task(embedding_text, embedding_image)
```
Is this correct on a very high level?
Thanks in advance!
Thank you for your interest in Meta-Transformer. The tokenization part will be released in 1-2 days; I've worked on it for about 10 days, and I hope it will be easy to use. For your custom dataset, your pseudocode is accurate.
If you have additional questions, please feel free to let me know, and I'm happy to help~
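For readers landing here before the tokenization code is released, the high-level workflow from the question can be sketched as below. This is a minimal placeholder, not the Meta-Transformer implementation: the tokenizers, the projection, and the mean-pool "encoder" are all stand-ins with made-up names and a tiny embedding dimension, just to show how the pieces fit together.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # shared token dimension (placeholder; the real model is much larger)

# Hypothetical text tokenizer: one embedding-table lookup per whitespace word.
def tokenize_text(text, vocab, table):
    return np.stack([table[vocab.get(w, 0)] for w in text.split()])

# Hypothetical image tokenizer: split an HxW image into non-overlapping
# patches, flatten each patch, and project it into the shared D-dim space.
def tokenize_image(image, patch=4):
    H, W = image.shape
    patches = [image[i:i + patch, j:j + patch].ravel()
               for i in range(0, H, patch) for j in range(0, W, patch)]
    return np.stack(patches) @ proj

# Placeholder for the frozen pretrained encoder: mean-pool the token
# sequence into a single embedding vector per modality.
def pretrained_encoder(tokens):
    return tokens.mean(axis=0)

vocab = {"hello": 1, "world": 2}          # toy vocabulary (index 0 = unknown)
table = rng.normal(size=(3, D))           # toy text embedding table
proj = rng.normal(size=(16, D))           # 4x4 patch pixels -> D dims

token_text = tokenize_text("hello world", vocab, table)   # shape (2, D)
token_image = tokenize_image(rng.normal(size=(8, 8)))     # shape (4, D)

embedding_text = pretrained_encoder(token_text)
embedding_image = pretrained_encoder(token_image)

# Downstream task: e.g. concatenate both modality embeddings for a head.
fused = np.concatenate([embedding_text, embedding_image])
print(fused.shape)  # (16,)
```

The key point the sketch illustrates is that both modalities end up as sequences of tokens in the same D-dimensional space, so the same frozen encoder can process either one before the downstream task fuses them.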