DWCTOD

zero is not none's starred repositories

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · Apache-2.0 · 19320 170 329

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · Apache-2.0 · 17661 158 1363

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

10188 237 98

IC-Light

More relighting!

Language: Python · Apache-2.0 · 3960 41 58

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.

Language: Python · MIT · 3686 38 259

kimi-free-api

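🚀 Reverse-engineered API for the KIMI AI long-text large model, for free-use testing (specialty: interpreting and organizing long texts). Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn conversations, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.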

Language: TypeScript · GPL-3.0 · 3220 27 98

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · Apache-2.0 · 3061 25 125

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · Apache-2.0 · 3049 29 947

Diffusion-Models-Papers-Survey-Taxonomy

Diffusion model papers, survey, and taxonomy

2779 53 7

swift

ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, Deepseek, Baichuan2...)

Language: Python · Apache-2.0 · 2050 19 566

T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python · NOASSERTION · 1966 37 68

gligen-gui

An intuitive GUI for GLIGEN that uses ComfyUI in the backend

Language: JavaScript · NOASSERTION · 1926 14 35

PytorchOCR

A PyTorch-based OCR toolkit that supports common text detection and recognition algorithms

Language: Python · 1314 23 181

VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series built on the CPM foundation models

Language: Python · 1028 14 40

LLaVA-NeXT

Language: Python · 839 19 60

TF-ICON

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)

Language: Python · MIT · 772 35 21

Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

MIT · 725 43 10

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python · NOASSERTION · 410 18 9

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Language: Python · 259 5 12

General-World-Models-Survey

MIT · 189 30

Dino_V2

Dino V2 for Classification, PCA Visualization, Instance Retrieval: https://arxiv.org/abs/2304.07193

Language: Jupyter Notebook · 129 3 4

attribute-control

Fine-Grained Subject-Specific Attribute Expression Control in T2I Models

Language: Jupyter Notebook · MIT · 96 5 4

json-repair

🔧 Repair JSON! Solution for JSON Anomalies from LLMs.

Language: Go · GPL-3.0 · 81 2 6

Composition-Stable-Diffusion

Image Composition via Stable Diffusion

Language: Python · MIT · 64 7 2

XmodelVLM

Language: Python · Apache-2.0 · 53 3 4

Document-Layout-Analysis

Object Detection Model for Scanned Documents

Language: Jupyter Notebook · MIT · 51 3 2

General-Documents-Layout-parser

General-purpose layout analysis | Chinese document parsing | Document Layout Analysis | layout parser

Language: Python · NOASSERTION · 36 2 1

color-clustering

A tool to perform K-means clustering analysis of the colors in an image.

Language: Python · Apache-2.0 · 21 2 2

clic

CLiC: Concept Learning in Context

5 3 1

Scence-Text-Recognition-With-YOLOv8-and-CRNN

An implementation of YOLOv8 and CRNN networks for the scene text recognition task

Language: Jupyter Notebook · 300