FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

https://groma-mllm.github.io/

FoundationVision/Groma Issues

Clarify the bounding box format
Closed a month ago, 2 comments
About pretrain checkpoint
Closed a month ago, 1 comment
Batch size setting in the evaluation process
Closed a month ago, 4 comments
Inference errors with the 8-bit and 4-bit quantized versions
Closed a month ago, 2 comments
About grounding output
Closed a month ago, 1 comment
Tested some images and felt that the grounding ability was much weaker than the original DINO?
Closed a month ago, 1 comment
Could you share the prompts used to instruct GPT-4V to create Groma Instruct?
Closed 2 months ago, 1 comment
Finetuning and dataset formatting guidelines
Closed 2 months ago, 2 comments
Is there a smaller model? One that can run with 24 GB of VRAM
Closed 2 months ago, 5 comments
Model weight problem
Closed 2 months ago, 7 comments
Evaluation results significantly different
Closed 2 months ago, 1 comment
No Groma conversation template
Closed 2 months ago, 1 comment
Unable to load local weights
Closed 3 months ago, 5 comments
System requirements for running the model?
Closed 3 months ago, 1 comment