FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

https://groma-mllm.github.io/

Repository on GitHub: https://github.com/FoundationVision/Groma
