Jihan Yang (jihanyang)



Company: The University of Hong Kong

Location: Hong Kong SAR

Home Page: https://jihanyang.github.io/



Organizations
CVMI-Lab

Jihan Yang's starred repositories

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell | Stargazers: 4499 | Issues: 0

ALLaVA

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Language: Python | License: Apache-2.0 | Stargazers: 195 | Issues: 0

clip-beyond-tail

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

Stargazers: 10 | Issues: 0

VideoTree

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Language: Python | License: MIT | Stargazers: 32 | Issues: 0

LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Language: Python | License: MIT | Stargazers: 70 | Issues: 0

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | License: MIT | Stargazers: 1788 | Issues: 0

LanguageBind

[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python | License: MIT | Stargazers: 577 | Issues: 0

OpenGlass

Turn any glasses into AI-powered smart glasses

Language: C | License: MIT | Stargazers: 2405 | Issues: 0

VQASynth

Compose multimodal datasets 🎹

Language: Python | Stargazers: 98 | Issues: 0

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | Stargazers: 1012 | Issues: 0

visualwebarena

VisualWebArena is a benchmark for multimodal agents.

Language: Python | License: MIT | Stargazers: 159 | Issues: 0

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python | License: MIT | Stargazers: 876 | Issues: 0

Kosmos-G

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Language: Python | Stargazers: 21 | Issues: 0

Groma

Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | Stargazers: 446 | Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 21816 | Issues: 0

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python | License: MIT | Stargazers: 32840 | Issues: 0

open-eqa

OpenEQA: Embodied Question Answering in the Era of Foundation Models

Language: Jupyter Notebook | License: MIT | Stargazers: 168 | Issues: 0

eai-vc

The repository for the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).

Language: Python | License: NOASSERTION | Stargazers: 437 | Issues: 0

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python | License: NOASSERTION | Stargazers: 567 | Issues: 0

paperbot

PaperBot: Learning to Design Real-World Tools Using Paper

Language: Python | Stargazers: 10 | Issues: 0

VAR

[GPT beats diffusion 🔥] [scaling laws in visual generation 📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3636 | Issues: 0

ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Language: Python | License: Apache-2.0 | Stargazers: 2775 | Issues: 0

VLN-CE

Vision-and-Language Navigation in Continuous Environments using Habitat

Language: Python | License: MIT | Stargazers: 220 | Issues: 0

GiT

Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Language: Python | License: Apache-2.0 | Stargazers: 220 | Issues: 0

3D-LR

Can 3D Vision-Language Models Truly Understand Natural Language?

Stargazers: 18 | Issues: 0

grok-1

Grok open release

Language: Python | License: Apache-2.0 | Stargazers: 48992 | Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python | License: Apache-2.0 | Stargazers: 5087 | Issues: 0