dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Congratulations on the best LLaVA-derived models!

deepbeepmeep opened this issue

I have been playing with most of the multimodal models based on LLaVA, and I can say that Mini-Gemini (the 13B version) is one of the best, if not the best, for its size.

Keep up the good work, and hopefully you can go even further using Llama 3 or Phi-3 as a base model.

Hi, thanks for your response and suggestions! We released the LLaMA3-based models. You are welcome to try the MGM-8B and MGM-8B-HD.