bobo0810

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | NOASSERTION | 6923 64 1175

LeetCode-Book

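Python, Java, and C++ solution code for 《剑指 Offer》 (Coding Interviews), and the companion code repository for the LeetBook 《图解算法数据结构》 (Illustrated Algorithms and Data Structures)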

Language: Java | NOASSERTION | 6310 49 7

sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python | Apache-2.0 | 6017 57 629

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

Language: Python | MIT | 5991 52 605

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | GPL-3.0 | 4654 39 454

Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python | Apache-2.0 | 2981 28 185

LLaVA-NeXT

Language: Python | Apache-2.0 | 2852 33 299

cv_note

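Documents the growth path of a CV algorithm engineer and shares notes on the computer vision and model compression and deployment tech stack. https://harleyszhang.github.io/cv_note/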

Language: Python | Apache-2.0 | 2417 31 4

ReplaceAnything

2367 126 20

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | MIT | 2071 19 47

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | Apache-2.0 | 1979 24 92

LVM

Language: Python | Apache-2.0 | 1764 120 22

VLMEvalKit

Open-source evaluation toolkit of large vision-language models (LVLMs), support 160+ VLMs, 50+ benchmarks

Language: Python | Apache-2.0 | 1321 10 209

MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices

Language: Python | Apache-2.0 | 1038 21 57

Bunny

A family of lightweight multimodal models.

Language: Python | Apache-2.0 | 930 19 120

LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python | MIT | 719 15 62

MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Language: Python | BSD-3-Clause | 553 12 40

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language: Python | 383 15 53

VisionLLaMA

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Language: Python | 365 23 6

NeteaseTVDemo

NeteaseTVDemo (Vibefy) - tvOS client

Language: Swift | GPL-2.0 | 271 7 16

ALLaVA

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Language: Python | Apache-2.0 | 244 11 11

awesome-mm-chat

A collection of multimodal (MM) + Chat resources

Language: Python | 204 6 1

DenseFusion

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Language: Python | 116 4 5

DataOptim

A collection of visual instruction tuning datasets.

Language: Python | MIT | 76 50