Yuezeyi

Yuezeyi's starred repositories

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model that approaches GPT-4o performance.

Language: Python | License: MIT | Stargazers: 5041 | Issues: 48 | Issues: 458

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: NOASSERTION | Stargazers: 2114 | Issues: 30 | Issues: 87
Language: Python | License: Apache-2.0 | Stargazers: 1888 | Issues: 28 | Issues: 140

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1348 | Issues: 25 | Issues: 63

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | Stargazers: 1232 | Issues: 3 | Issues: 113

Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language: Python | Stargazers: 599 | Issues: 0 | Issues: 0

Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | Stargazers: 508 | Issues: 35 | Issues: 20

tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

Language: Python | License: Apache-2.0 | Stargazers: 449 | Issues: 10 | Issues: 93

honeybee

Official implementation of project Honeybee (CVPR 2024)

Language: Python | License: NOASSERTION | Stargazers: 405 | Issues: 15 | Issues: 21

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Language: Python | License: Apache-2.0 | Stargazers: 393 | Issues: 15 | Issues: 11

SEED-X

Multimodal Models in Real World

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 353 | Issues: 18 | Issues: 20

scaling_on_scales

When do we not need larger vision models?

Language: Python | License: MIT | Stargazers: 291 | Issues: 7 | Issues: 14

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

animate-your-word

Official implementations for paper: Dynamic Typography: Bringing Text to Life via Video Diffusion Prior

Language: Python | License: Apache-2.0 | Stargazers: 254 | Issues: 3 | Issues: 3

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | Stargazers: 201 | Issues: 2 | Issues: 8

RLAIF-V

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

imp

A family of highly capable yet efficient large multimodal models

Language: Python | License: Apache-2.0 | Stargazers: 152 | Issues: 6 | Issues: 7

TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

MuLan

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)

MultiBooth

[arXiv 2024] MultiBooth: This repo is the official implementation of "MultiBooth: Towards Generating All Your Concepts in an Image from Text"

VL-InterpreT

Visual Language Transformer Interpreter - An interactive visualization tool for interpreting vision-language transformers

Language: Python | License: MIT | Stargazers: 83 | Issues: 8 | Issues: 2
Language: Python | License: Apache-2.0 | Stargazers: 77 | Issues: 4 | Issues: 6

MiCo

Explore the Limits of Omni-modal Pretraining at Scale

Language: Python | License: Apache-2.0 | Stargazers: 74 | Issues: 2 | Issues: 6
Language: Python | License: Apache-2.0 | Stargazers: 56 | Issues: 6 | Issues: 9
Language: Python | License: BSD-3-Clause | Stargazers: 27 | Issues: 1 | Issues: 5
Language: Python | License: Apache-2.0 | Stargazers: 23 | Issues: 1 | Issues: 5
Language: Python | License: MIT | Stargazers: 20 | Issues: 0 | Issues: 0