zero is not none's starred repositories

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 18015 | Issues: 157 | Issues: 1382

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | Stargazers: 4190 | Issues: 44 | Issues: 62

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.

Language: Python | License: MIT | Stargazers: 3990 | Issues: 40 | Issues: 298

kimi-free-api

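🚀 Reverse-engineered API for the KIMI AI long-text large model, for free trial testing [specialty: long-text interpretation and summarization]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.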

Language: TypeScript | License: GPL-3.0 | Stargazers: 3321 | Issues: 28 | Issues: 100

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | License: Apache-2.0 | Stargazers: 3226 | Issues: 30 | Issues: 998

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 3078 | Issues: 25 | Issues: 126

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python | License: Apache-2.0 | Stargazers: 2261 | Issues: 18 | Issues: 615

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 1990 | Issues: 36 | Issues: 72

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python | License: Apache-2.0 | Stargazers: 1466 | Issues: 17 | Issues: 14

PytorchOCR

A PyTorch-based OCR toolkit that supports commonly used text detection and recognition algorithms.

VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models

TF-ICON

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)

Language: Python | License: MIT | Stargazers: 775 | Issues: 35 | Issues: 22

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python | License: NOASSERTION | Stargazers: 464 | Issues: 18 | Issues: 11

Multimodal-AND-Large-Language-Models

Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

json-repair

🔧 Repair JSON! Solution for JSON anomalies from LLMs.

Language: Go | License: GPL-3.0 | Stargazers: 133 | Issues: 2 | Issues: 9

Dino_V2

Dino V2 for Classification, PCA Visualization, Instance Retrieval: https://arxiv.org/abs/2304.07193

Language: Jupyter Notebook | Stargazers: 132 | Issues: 3 | Issues: 4

attribute-control

Fine-Grained Subject-Specific Attribute Expression Control in T2I Models

Language: Jupyter Notebook | License: MIT | Stargazers: 99 | Issues: 5 | Issues: 4

Composition-Stable-Diffusion

Image Composition via Stable Diffusion

Language: Python | License: MIT | Stargazers: 65 | Issues: 7 | Issues: 2

Document-Layout-Analysis

Object Detection Model for Scanned Documents

Language: Jupyter Notebook | License: MIT | Stargazers: 57 | Issues: 3 | Issues: 2

Language: Python | License: Apache-2.0 | Stargazers: 54 | Issues: 3 | Issues: 4

General-Documents-Layout-parser

General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser

Language: Python | License: NOASSERTION | Stargazers: 36 | Issues: 3 | Issues: 1

InstructionGPT-4

InstructionGPT-4

Language: Python | License: MIT | Stargazers: 35 | Issues: 1 | Issues: 0

color-clustering

A tool to perform K-means clustering analysis of the colors in an image.

Language: Python | License: Apache-2.0 | Stargazers: 21 | Issues: 2 | Issues: 2

clic

CLiC: Concept Learning in Context

Scence-Text-Recognition-With-YOLOv8-and-CRNN

An implementation of YOLOv8 and a CRNN network for the scene text recognition task.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 1 | Issues: 0