FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.

Language: Python | License: NOASSERTION | 5879 | 0 | 0

awesome-multi-agent-papers

A compilation of the best multi-agent papers

159 | 0 | 0

HolmesVAD

Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"

Language: Python | License: MIT | 62 | 0 | 0

LocLLM

Code for "LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model", CVPR 2024 Highlight

Language: Python | License: MIT | 26 | 0 | 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 4753 | 0 | 0

Pink

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Language: Python | 72 | 0 | 0

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | 11784 | 0 | 0

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | License: GPL-3.0 | 16309 | 0 | 0

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language: Python | 639 | 0 | 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

11668 | 0 | 0

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | License: Apache-2.0 | 1953 | 0 | 0

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript | License: NOASSERTION | 44994 | 0 | 0

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | 1148 | 0 | 0

liury889

liury889's starred repositories

mmpose

LLM101n

ChatDev

XAgent

tutorials

llama3

peft

LoRA

S-LoRA

FunASR

awesome-multi-agent-papers

HolmesVAD

LocLLM

Qwen-VL

Pink

MiniCPM-V

translation-agent

marker

anole

Awesome-Multimodal-Large-Language-Models

CogVLM2

dify

Video-ChatGPT

LWM

MoE-LLaVA

GroundingGPT

Chat-UniVi

Video-LLaVA

grok-1

VideoCrafter