DirtyHarryLYL / LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

Please add these papers

Johnx69 opened this issue 25 days ago

Anh Dao commented 25 days ago

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
GLaMM: Pixel Grounding Large Multimodal Model
Planting a SEED of Vision in Large Language Model

Yong-Lu Li commented 25 days ago

Thanks. Done.