zero is not none's starred repositories

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 19320 | Issues: 170 | Issues: 329

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 17661 | Issues: 158 | Issues: 1363

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | Stargazers: 3960 | Issues: 41 | Issues: 58

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.

Language: Python | License: MIT | Stargazers: 3686 | Issues: 38 | Issues: 259

kimi-free-api

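🚀 Reverse-engineered API for the KIMI AI long-context large model, for free testing [specialty: interpreting and organizing long texts]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.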

Language: TypeScript | License: GPL-3.0 | Stargazers: 3220 | Issues: 27 | Issues: 98

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 3061 | Issues: 25 | Issues: 125

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | License: Apache-2.0 | Stargazers: 3049 | Issues: 29 | Issues: 947

swift

ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, Deepseek, Baichuan2...)

Language: Python | License: Apache-2.0 | Stargazers: 2050 | Issues: 19 | Issues: 566

T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 1966 | Issues: 37 | Issues: 68

gligen-gui

An intuitive GUI for GLIGEN that uses ComfyUI in the backend

Language: JavaScript | License: NOASSERTION | Stargazers: 1926 | Issues: 14 | Issues: 35

PytorchOCR

A PyTorch-based OCR toolkit supporting common text detection and recognition algorithms

VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models

TF-ICON

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)

Language: Python | License: MIT | Stargazers: 772 | Issues: 35 | Issues: 21

Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python | License: NOASSERTION | Stargazers: 410 | Issues: 18 | Issues: 9

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Dino_V2

Dino V2 for Classification, PCA Visualization, Instance Retrieval: https://arxiv.org/abs/2304.07193

Language: Jupyter Notebook | Stargazers: 129 | Issues: 3 | Issues: 4

attribute-control

Fine-Grained Subject-Specific Attribute Expression Control in T2I Models

Language: Jupyter Notebook | License: MIT | Stargazers: 96 | Issues: 5 | Issues: 4

json-repair

🔧 Repair JSON! Solution for JSON Anomalies from LLMs.

Language: Go | License: GPL-3.0 | Stargazers: 81 | Issues: 2 | Issues: 6

Composition-Stable-Diffusion

Image Composition via Stable Diffusion

Language: Python | License: MIT | Stargazers: 64 | Issues: 7 | Issues: 2

Language: Python | License: Apache-2.0 | Stargazers: 53 | Issues: 3 | Issues: 4

Document-Layout-Analysis

Object Detection Model for Scanned Documents

Language: Jupyter Notebook | License: MIT | Stargazers: 51 | Issues: 3 | Issues: 2

General-Documents-Layout-parser

General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser

Language: Python | License: NOASSERTION | Stargazers: 36 | Issues: 2 | Issues: 1

color-clustering

A tool to perform K-means clustering analysis of the colors in an image.

Language: Python | License: Apache-2.0 | Stargazers: 21 | Issues: 2 | Issues: 2

clic

CLiC: Concept Learning in Context

Scence-Text-Recognition-With-YOLOv8-and-CRNN

An implementation of YOLOv8 and CRNN networks for the Scene Text Recognition task

Language: Jupyter Notebook | Stargazers: 3 | Issues: 0 | Issues: 0