kyegomez / CM3Leon

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

Home Page:https://discord.gg/qUtxnK2NMf

Is CM3Leon not gonna be open source?

BoghosDavtyan opened this issue · comments

Hey @BoghosDavtyan, nope, it isn't. Unfortunately, it's up to us to implement it.

@kyegomez What do you mean by "implement it"? How are we supposed to implement something that we don't have access to?

Based on the research paper which we do have access to.

Yeah, of course. The paper has quite a lot of detail that we can use to implement it, which is great.

But why are you using a ViT to encode the images? In the original CM3Leon paper, they used a VQGAN to tokenize the images. I also didn't see where you use the tokenizer you built in the cm3 directory.

I would like to contribute, but there are some things that I don't understand yet.
It seems that train.py is not tested.

@Al-aminI You're right, they did use a VQGAN as the encoder, but I just used a ViT for simplicity. I'm very open to refactoring this to use a VQGAN if you can help me.
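
For anyone following the ViT vs. VQGAN point: a rough sketch of what "tokenizing" an image means in the VQGAN sense, as opposed to producing continuous ViT embeddings. This is not code from this repo or from the paper. The function names (`quantize`, `to_sequence`), the toy codebook, and the idea of offsetting image ids past the text vocabulary are all assumptions made for illustration; a real VQGAN learns its encoder and codebook jointly.

```python
# Illustrative sketch only: not code from this repo or from the CM3Leon paper.
# A ViT-style encoder yields continuous patch embeddings. A VQGAN-style
# tokenizer instead maps each patch latent to the index of its nearest
# codebook vector, giving discrete ids a decoder-only model can predict
# the same way it predicts text tokens.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Return, for each latent, the index of its nearest codebook vector."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]

def to_sequence(text_ids, image_ids, text_vocab_size):
    """Hypothetical layout: shift image ids past the text vocabulary so
    text and image tokens share one id space."""
    return text_ids + [text_vocab_size + i for i in image_ids]

# Toy, made-up values: 3 codebook entries, 4 patch latents of dimension 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.9], [0.4, 0.3]]

image_ids = quantize(latents, codebook)
sequence = to_sequence([5, 17], image_ids, text_vocab_size=100)
print(image_ids)  # [0, 1, 2, 0]
print(sequence)   # [5, 17, 100, 101, 102, 100]
```

With discrete ids like these, the image side can be trained with the same next-token objective as the text side, which is the main reason a swap from ViT embeddings to a VQGAN tokenizer would matter for a decoder-only model.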