DWCTOD

zero is not none's starred repositories

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 18015 157 1382

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

10502 241 101

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | 4190 44 62

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal dialogue model approaching GPT-4V performance.

Language: Python | License: MIT | 3990 40 298

kimi-free-api

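🚀 KIMI AI long-context LLM: a reverse-engineered API for free testing [specialty: long-text interpretation and summarization]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.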

Language: TypeScript | License: GPL-3.0 | 3321 28 100

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | License: Apache-2.0 | 3226 30 998

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | 3078 25 126

swift

ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python | License: Apache-2.0 | 2261 18 615

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | 1990 36 72

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python | License: Apache-2.0 | 1466 17 14

PytorchOCR

A PyTorch-based OCR toolkit that supports common text detection and recognition algorithms

Language: Python | 1323 23 181

LLaVA-NeXT

Language: Python | 1111 21 86

VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A series of Chinese-English bilingual multimodal large models built on the CPM foundation model

Language: Python | 1036 14 40

TF-ICON

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)

Language: Python | License: MIT | 775 35 22

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python | License: NOASSERTION | 464 18 11

Multimodal-AND-Large-Language-Models

A paper list on multimodal and large language models, used only to record the papers I read in the daily arXiv feed for personal reference.

459 18 3

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

298 5 14

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Language: Python | 239 10 8

General-World-Models-Survey

License: MIT | 205 120

json-repair

🔧 Repair JSON! Solution for JSON Anomalies from LLMs.

Language: Go | License: GPL-3.0 | 133 2 9

Dino_V2

Dino V2 for Classification, PCA Visualization, and Instance Retrieval: https://arxiv.org/abs/2304.07193

Language: Jupyter Notebook | 132 3 4

attribute-control

Fine-Grained Subject-Specific Attribute Expression Control in T2I Models

Language: Jupyter Notebook | License: MIT | 99 5 4

Composition-Stable-Diffusion

Image Composition via Stable Diffusion

Language: Python | License: MIT | 65 7 2

Document-Layout-Analysis

Object Detection Model for Scanned Documents

Language: Jupyter Notebook | License: MIT | 57 3 2

XmodelVLM

Language: Python | License: Apache-2.0 | 54 3 4

General-Documents-Layout-parser

General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser

Language: Python | License: NOASSERTION | 36 3 1

InstructionGPT-4

Language: Python | License: MIT | 35 10

color-clustering

A tool to perform K-means clustering analysis of the colors in an image.

Language: Python | License: Apache-2.0 | 21 2 2

clic

CLiC: Concept Learning in Context

6 3 1

Scence-Text-Recognition-With-YOLOv8-and-CRNN

An implementation of the YOLOv8 and CRNN networks for the scene text recognition task

Language: Jupyter Notebook | 3 10