Chengjiang's starred repositories

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 20777 | Watchers: 177 | Issues: 390

PhotoMaker

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 8684 | Watchers: 97 | Issues: 125

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Language: Python | License: Apache-2.0 | Stargazers: 7018 | Watchers: 66 | Issues: 67

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5712 | Watchers: 46 | Issues: 75
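
The defining detail of this architecture, per the paper, is conditioning each transformer block on timestep and class through adaptive LayerNorm (adaLN). A minimal PyTorch sketch of one such block; dimensions and names here are illustrative, not the official implementation:

    import torch
    import torch.nn as nn

    class AdaLNBlock(nn.Module):
        # One DiT-style block: LayerNorm shift/scale and residual gates are
        # regressed from the conditioning embedding (timestep + class label).
        def __init__(self, dim, heads=8):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            self.ada = nn.Linear(dim, 6 * dim)  # shift/scale/gate for attn and MLP

        def forward(self, x, c):
            # x: (batch, patches, dim) latent tokens; c: (batch, dim) conditioning
            s1, b1, g1, s2, b2, g2 = self.ada(c).unsqueeze(1).chunk(6, dim=-1)
            h = self.norm1(x) * (1 + s1) + b1
            x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
            h = self.norm2(x) * (1 + s2) + b2
            return x + g2 * self.mlp(h)

    x, c = torch.randn(2, 256, 384), torch.randn(2, 384)
    print(AdaLNBlock(384)(x, c).shape)  # torch.Size([2, 256, 384])

The paper's adaLN-Zero variant additionally zero-initializes the gate projection so each block starts as the identity; that detail is omitted above for brevity.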

moondream

tiny vision language model

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4564 | Watchers: 54 | Issues: 98
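
A hedged usage sketch, following the repo's README as I recall it; the hub id (vikhyatk/moondream2) and the encode_image/answer_question methods may have changed between releases:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from PIL import Image

    # Hub id and remote-code API are assumptions from the README, not verified here.
    model = AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

    enc = model.encode_image(Image.open("photo.jpg"))  # any local image
    print(model.answer_question(enc, "Describe this image.", tokenizer))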

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python | License: Apache-2.0 | Stargazers: 4445 | Watchers: 52 | Issues: 136

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3322 | Watchers: 24 | Issues: 419
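
Typical usage, as recalled from the project's README, is a single run.py entry point that takes model and dataset abbreviations; the abbreviations below are illustrative assumptions and may differ across versions:

    # Evaluate one Hugging Face model on two benchmarks (names are assumptions).
    python run.py --models hf_llama2_7b --datasets mmlu_ppl ceval_ppl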

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 2622 | Watchers: 36 | Issues: 133

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | License: MIT | Stargazers: 1884 | Watchers: 18 | Issues: 43

coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset

wit

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
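
Both COYO-700M and WIT distribute image-URL/text records rather than raw images, so they are usually streamed. A hedged loading sketch via Hugging Face datasets; the hub id and field names are assumptions, not verified here:

    from datasets import load_dataset

    # Stream instead of downloading ~700M records; "kakaobrain/coyo-700m" is assumed.
    coyo = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)
    sample = next(iter(coyo))
    print(sample.get("url"), sample.get("text"))  # expected image-text fields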

MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices

Language: Python | License: Apache-2.0 | Stargazers: 889 | Watchers: 21 | Issues: 49

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | License: MIT | Stargazers: 852 | Watchers: 12 | Issues: 24
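
The paper's distinguishing move is to treat all inputs (time, condition, image patches) as tokens and to add U-Net-style long skip connections between shallow and deep transformer blocks. A toy sketch of the skip wiring only, not the authors' code:

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, dim, skip=False):
            super().__init__()
            # Long-skip blocks concatenate shallow features, then project back down.
            self.skip_proj = nn.Linear(2 * dim, dim) if skip else None
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, 8, batch_first=True)
            self.mlp = nn.Sequential(
                nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x, skip=None):
            if self.skip_proj is not None:
                x = self.skip_proj(torch.cat([x, skip], dim=-1))
            h = self.norm(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.mlp(x)

    class TinyUViT(nn.Module):
        def __init__(self, dim=256, depth=2):
            super().__init__()
            self.ins = nn.ModuleList(Block(dim) for _ in range(depth))
            self.mid = Block(dim)
            self.outs = nn.ModuleList(Block(dim, skip=True) for _ in range(depth))

        def forward(self, x):
            skips = []
            for blk in self.ins:
                x = blk(x)
                skips.append(x)
            x = self.mid(x)
            for blk in self.outs:
                x = blk(x, skips.pop())  # pair each deep block with a shallow feature
            return x

    print(TinyUViT()(torch.randn(2, 64, 256)).shape)  # torch.Size([2, 64, 256])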

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 799 | Watchers: 21 | Issues: 93

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks.

Language: Python | License: Apache-2.0 | Stargazers: 714 | Watchers: 10 | Issues: 96
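
Usage, as recalled from the README, is likewise a single run.py taking benchmark and model names; the flags and names below are assumptions and may have changed:

    # Evaluate one VLM on one benchmark (names are assumptions).
    python run.py --data MMBench_DEV_EN --model qwen_chat --verbose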

ml-aim

This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models

Language: Python | License: NOASSERTION | Stargazers: 667 | Watchers: 20 | Issues: 5

animate-anything

Fine-Grained Open Domain Image Animation with Motion Guidance

Language: Python | License: MIT | Stargazers: 654 | Watchers: 16 | Issues: 52

fast-DiT

Fast Diffusion Models with Transformers

Language: Python | License: NOASSERTION | Stargazers: 623 | Watchers: 7 | Issues: 11

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 446 | Watchers: 17 | Issues: 31

maskgit

Official Jax Implementation of MaskGIT

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 405 | Watchers: 17 | Issues: 12
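
MaskGIT's key idea is non-autoregressive decoding: start from an all-mask token grid, predict every token in parallel, commit the most confident predictions, and re-mask the rest on a cosine schedule. A self-contained sketch of that loop; the model is a stand-in, and refinements such as temperature-annealed sampling are omitted:

    import math
    import torch

    def maskgit_decode(model, seq_len, steps=8, mask_id=-1):
        tokens = torch.full((seq_len,), mask_id)
        for t in range(steps):
            probs = model(tokens).softmax(-1)  # (seq_len, vocab) per-token scores
            conf, pred = probs.max(-1)
            masked = tokens == mask_id
            tokens[masked] = pred[masked]      # fill every currently-masked slot
            # Cosine schedule: how many tokens to re-mask for the next round.
            n_mask = int(seq_len * math.cos(math.pi / 2 * (t + 1) / steps))
            if n_mask == 0:
                break
            conf[~masked] = float("inf")       # committed tokens are never re-masked
            tokens[conf.topk(n_mask, largest=False).indices] = mask_id
        return tokens

    toy = lambda toks: torch.randn(toks.shape[0], 16)  # stand-in for the real model
    print(maskgit_decode(toy, seq_len=12))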

DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Language: Python | License: Apache-2.0 | Stargazers: 359 | Watchers: 17 | Issues: 21

VisionLLaMA

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Language: Python | License: NOASSERTION | Stargazers: 171 | Watchers: 3 | Issues: 9

MMBench

Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

Language: Python | License: MIT | Stargazers: 98 | Watchers: 3 | Issues: 7

M2PT

[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Language: Python | License: Apache-2.0 | Stargazers: 83 | Watchers: 8 | Issues: 2

GVT

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".

Language: Python | License: Apache-2.0 | Stargazers: 54 | Watchers: 7 | Issues: 8

wikiHow-VGSI

EMNLP 2021: Visual Goal-Step Inference using wikiHow