HuaZheLei's starred repositories

books

[编程随想] A curated list of e-books (multiple disciplines, with download links)

BELLE

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

Language: HTML | License: Apache-2.0 | Stargazers: 7628 | Issues: 107 | Issues: 436

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5330 | Issues: 45 | Issues: 73

MiniCPM

MiniCPM-2B: An end-side LLM outperforms Llama2-13B.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4082 | Issues: 52 | Issues: 111

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4V.

Language: Python | License: MIT | Stargazers: 2876 | Issues: 34 | Issues: 182

PySceneDetect

Python and OpenCV-based scene cut/transition detection program & library.

Language: Python | License: BSD-3-Clause | Stargazers: 2863 | Issues: 71 | Issues: 300

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | Stargazers: 2322 | Issues: 39 | Issues: 0

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python | License: MIT | Stargazers: 1984 | Issues: 30 | Issues: 97

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | Stargazers: 1517 | Issues: 21 | Issues: 83

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python | License: Apache-2.0 | Stargazers: 1354 | Issues: 25 | Issues: 74

style-aligned

Official code for "Style Aligned Image Generation via Shared Attention"

Language: Python | License: Apache-2.0 | Stargazers: 1094 | Issues: 23 | Issues: 23

minisora

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Language: Python | License: Apache-2.0 | Stargazers: 1059 | Issues: 16 | Issues: 62

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | License: Apache-2.0 | Stargazers: 1029 | Issues: 20 | Issues: 51

improved-aesthetic-predictor

CLIP+MLP Aesthetic Score Predictor

Language: Python | License: Apache-2.0 | Stargazers: 732 | Issues: 6 | Issues: 10

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 699 | Issues: 19 | Issues: 75

SoraReview

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

taesd

Tiny AutoEncoder for Stable Diffusion

Language: Python | License: MIT | Stargazers: 429 | Issues: 10 | Issues: 15

aesthetic-predictor

A linear estimator on top of CLIP to predict the aesthetic quality of pictures

Language: Jupyter Notebook | License: MIT | Stargazers: 394 | Issues: 12 | Issues: 6

edm2

Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)

Language: Python | License: NOASSERTION | Stargazers: 333 | Issues: 8 | Issues: 3

HPT

HPT - Open Multimodal LLMs from HyperGAI

Language: Python | License: Apache-2.0 | Stargazers: 289 | Issues: 6 | Issues: 6

Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

Language: Python | License: Apache-2.0 | Stargazers: 262 | Issues: 5 | Issues: 28

DriveDreamer

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

WorldDreamer

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

VL-GPT

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Uniaa

Unified Multi-modal IAA Baseline and Benchmark

AIGCBench

Official repo for AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

Language: Python | License: Apache-2.0 | Stargazers: 24 | Issues: 0 | Issues: 0

TransCore-M

Large Multimodal Model