Please add these papers
Johnx69 opened this issue
- AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
- VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
- GLaMM: Pixel Grounding Large Multimodal Model
- Planting a SEED of Vision in Large Language Model
Thanks. Done.