Chengjiang's starred repositories

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 20777 | Watchers: 177 | Issues: 390

PhotoMaker

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 8684 | Watchers: 97 | Issues: 125

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Language: Python | License: Apache-2.0 | Stargazers: 7018 | Watchers: 66 | Issues: 67

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5712 | Watchers: 46 | Issues: 75
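
The defining detail of this architecture, per the paper, is conditioning each transformer block on timestep and class through adaptive LayerNorm (adaLN). A minimal PyTorch sketch of one such block; dimensions and names here are illustrative, not the official implementation:

    import torch
    import torch.nn as nn

    class AdaLNBlock(nn.Module):
        # One DiT-style block: LayerNorm shift/scale and residual gates are
        # regressed from the conditioning embedding (timestep + class label).
        def __init__(self, dim, heads=8):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            self.ada = nn.Linear(dim, 6 * dim)  # shift/scale/gate for attn and MLP

        def forward(self, x, c):
            # x: (batch, patches, dim) latent tokens; c: (batch, dim) conditioning
            s1, b1, g1, s2, b2, g2 = self.ada(c).unsqueeze(1).chunk(6, dim=-1)
            h = self.norm1(x) * (1 + s1) + b1
            x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
            h = self.norm2(x) * (1 + s2) + b2
            return x + g2 * self.mlp(h)

    x, c = torch.randn(2, 256, 384), torch.randn(2, 384)
    print(AdaLNBlock(384)(x, c).shape)  # torch.Size([2, 256, 384])

The paper's adaLN-Zero variant additionally zero-initializes the gate projection so each block starts as the identity; that detail is omitted above for brevity.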

moondream

tiny vision language model

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4564 | Watchers: 54 | Issues: 98
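
A hedged usage sketch, following the repo's README as I recall it; the hub id (vikhyatk/moondream2) and the encode_image/answer_question methods may have changed between releases:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from PIL import Image

    # Hub id and remote-code API are assumptions from the README, not verified here.
    model = AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

    enc = model.encode_image(Image.open("photo.jpg"))  # any local image
    print(model.answer_question(enc, "Describe this image.", tokenizer))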

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python | License: Apache-2.0 | Stargazers: 4445 | Watchers: 52 | Issues: 136

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3322 | Watchers: 24 | Issues: 419
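
Typical usage, as recalled from the project's README, is a single run.py entry point that takes model and dataset abbreviations; the abbreviations below are illustrative assumptions and may differ across versions:

    # Evaluate one Hugging Face model on two benchmarks (names are assumptions).
    python run.py --models hf_llama2_7b --datasets mmlu_ppl ceval_ppl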

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 2622 | Watchers: 36 | Issues: 133

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | License: MIT | Stargazers: 1884 | Watchers: 18 | Issues: 43

coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset

wit

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
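
Both COYO-700M and WIT distribute image-URL/text records rather than raw images, so they are usually streamed. A hedged loading sketch via Hugging Face datasets; the hub id and field names are assumptions, not verified here:

    from datasets import load_dataset

    # Stream instead of downloading ~700M records; "kakaobrain/coyo-700m" is assumed.
    coyo = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)
    sample = next(iter(coyo))
    print(sample.get("url"), sample.get("text"))  # expected image-text fields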

MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices

Language: Python | License: Apache-2.0 | Stargazers: 889 | Watchers: 21 | Issues: 49

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | License: MIT | Stargazers: 852 | Watchers: 12 | Issues: 24
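
The paper's distinguishing move is to treat all inputs (time, condition, image patches) as tokens and to add U-Net-style long skip connections between shallow and deep transformer blocks. A toy sketch of the skip wiring only, not the authors' code:

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, dim, skip=False):
            super().__init__()
            # Long-skip blocks concatenate shallow features, then project back down.
            self.skip_proj = nn.Linear(2 * dim, dim) if skip else None
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, 8, batch_first=True)
            self.mlp = nn.Sequential(
                nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x, skip=None):
            if self.skip_proj is not None:
                x = self.skip_proj(torch.cat([x, skip], dim=-1))
            h = self.norm(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.mlp(x)

    class TinyUViT(nn.Module):
        def __init__(self, dim=256, depth=2):
            super().__init__()
            self.ins = nn.ModuleList(Block(dim) for _ in range(depth))
            self.mid = Block(dim)
            self.outs = nn.ModuleList(Block(dim, skip=True) for _ in range(depth))

        def forward(self, x):
            skips = []
            for blk in self.ins:
                x = blk(x)
                skips.append(x)
            x = self.mid(x)
            for blk in self.outs:
                x = blk(x, skips.pop())  # pair each deep block with a shallow feature
            return x

    print(TinyUViT()(torch.randn(2, 64, 256)).shape)  # torch.Size([2, 64, 256])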

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 799 | Watchers: 21 | Issues: 93

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks.

Language: Python | License: Apache-2.0 | Stargazers: 714 | Watchers: 10 | Issues: 96
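
Usage, as recalled from the README, is likewise a single run.py taking benchmark and model names; the flags and names below are assumptions and may have changed:

    # Evaluate one VLM on one benchmark (names are assumptions).
    python run.py --data MMBench_DEV_EN --model qwen_chat --verbose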

ml-aim

This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models

Language: Python | License: NOASSERTION | Stargazers: 667 | Watchers: 20 | Issues: 5

animate-anything

Fine-Grained Open Domain Image Animation with Motion Guidance

Language: Python | License: MIT | Stargazers: 654 | Watchers: 16 | Issues: 52

fast-DiT

Fast Diffusion Models with Transformers

Language: Python | License: NOASSERTION | Stargazers: 623 | Watchers: 7 | Issues: 11

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 446 | Watchers: 17 | Issues: 31

maskgit

Official Jax Implementation of MaskGIT

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 405 | Watchers: 17 | Issues: 12
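
MaskGIT's key idea is non-autoregressive decoding: start from an all-mask token grid, predict every token in parallel, commit the most confident predictions, and re-mask the rest on a cosine schedule. A self-contained sketch of that loop; the model is a stand-in, and refinements such as temperature-annealed sampling are omitted:

    import math
    import torch

    def maskgit_decode(model, seq_len, steps=8, mask_id=-1):
        tokens = torch.full((seq_len,), mask_id)
        for t in range(steps):
            probs = model(tokens).softmax(-1)  # (seq_len, vocab) per-token scores
            conf, pred = probs.max(-1)
            masked = tokens == mask_id
            tokens[masked] = pred[masked]      # fill every currently-masked slot
            # Cosine schedule: how many tokens to re-mask for the next round.
            n_mask = int(seq_len * math.cos(math.pi / 2 * (t + 1) / steps))
            if n_mask == 0:
                break
            conf[~masked] = float("inf")       # committed tokens are never re-masked
            tokens[conf.topk(n_mask, largest=False).indices] = mask_id
        return tokens

    toy = lambda toks: torch.randn(toks.shape[0], 16)  # stand-in for the real model
    print(maskgit_decode(toy, seq_len=12))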

DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Language: Python | License: Apache-2.0 | Stargazers: 359 | Watchers: 17 | Issues: 21

VisionLLaMA

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Language: Python | License: NOASSERTION | Stargazers: 171 | Watchers: 3 | Issues: 9

MMBench

Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

Language: Python | License: MIT | Stargazers: 98 | Watchers: 3 | Issues: 7

M2PT

[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Language: Python | License: Apache-2.0 | Stargazers: 83 | Watchers: 8 | Issues: 2

GVT

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".

Language: Python | License: Apache-2.0 | Stargazers: 54 | Watchers: 7 | Issues: 8

wikiHow-VGSI

EMNLP 2021: Visual Goal-Step Inference using wikiHow