Beast code in Giters

This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on

Language: Python | License: Apache-2.0 | 393 15 11

SEED-X

Multimodal Models in Real World

Language: Jupyter Notebook | License: NOASSERTION | 353 18 20

scaling_on_scales

When do we not need larger vision models?

Language: Python | License: MIT | 291 7 14

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Language: Python | 268 10 12

animate-your-word

Official implementation for the paper "Dynamic Typography: Bringing Text to Life via Video Diffusion Prior"

Language: Python | License: Apache-2.0 | 254 3 3

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | 201 2 8

RLAIF-V

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Language: Python | 173 2 15

imp

A family of highly capable yet efficient large multimodal models

Language: Python | License: Apache-2.0 | 152 6 7

TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Language: Python | 132 8 7

MuLan

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)

Language: Python | 115 3 4

MultiBooth

[arXiv 2024] MultiBooth: This repo is the official implementation of "MultiBooth: Towards Generating All Your Concepts in an Image from Text"

107 11 2

VL-InterpreT

Visual Language Transformer Interpreter - An interactive visualization tool for interpreting vision-language transformers

Language: Python | License: MIT | 83 8 2

XVERSE-V-13B

Language: Python | License: Apache-2.0 | 77 4 6

MiCo

Explore the Limits of Omni-modal Pretraining at Scale

Language: Python | License: Apache-2.0 | 74 2 6

unified-io-2.pytorch

Language: Python | License: Apache-2.0 | 56 6 9

Koala-video-llm

Language: Python | License: BSD-3-Clause | 27 1 5

lvlm-interpret

Language: Python | License: Apache-2.0 | 23 1 5

diffusion-model-hallucination

Language: Python | License: MIT | 20 0 0