Zhenhua Yang (yeungchenwa)

Company: @SCUT-DLVCLab @HCIILAB

Location: Guangzhou, China

Home Page: eezhyang@gmail.com

Organizations
SCUT-DLVCLab

Zhenhua Yang's starred repositories

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 21444 | Issues: 179 | Issues: 454

Omost

Your image is almost there!

Language: Python | License: Apache-2.0 | Stargazers: 7147 | Issues: 44 | Issues: 75

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python | License: MIT | Stargazers: 3545 | Issues: 100 | Issues: 160

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 2670 | Issues: 37 | Issues: 134
Language: Python | License: Apache-2.0 | Stargazers: 2173 | Issues: 32 | Issues: 171

VLM_survey

Collection of AWESOME vision-language models for vision tasks

Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

MambaOut

MambaOut: Do We Really Need Mamba for Vision?

Language: Python | License: Apache-2.0 | Stargazers: 1944 | Issues: 6 | Issues: 243

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | Stargazers: 1167 | Issues: 21 | Issues: 52

RAG-Survey

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

Awesome-LLM4AD

A curated list of awesome LLM for Autonomous Driving resources (continually updated)

VisionLLM

VisionLLM Series

Language: Python | License: Apache-2.0 | Stargazers: 820 | Issues: 42 | Issues: 13

alphafold3-pytorch

Implementation of AlphaFold 3 in PyTorch

Language: Python | License: MIT | Stargazers: 819 | Issues: 40 | Issues: 28

Grounding-DINO-1.5-API

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Language: Python | License: Apache-2.0 | Stargazers: 690 | Issues: 11 | Issues: 35

Campus2025

A roundup of internet-company campus recruiting information for the class of 2025

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Language: Python | License: Apache-2.0 | Stargazers: 685 | Issues: 10 | Issues: 66

GenerativeImage2Text

GIT: A Generative Image-to-text Transformer for Vision and Language

Language: Python | License: MIT | Stargazers: 540 | Issues: 9 | Issues: 58

DriveAGI

[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System

Language: Python | License: Apache-2.0 | Stargazers: 521 | Issues: 27 | Issues: 7

Awesome-World-Model

A collection of papers on world models for autonomous driving.

Inf-DiT

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Language: Python | License: Apache-2.0 | Stargazers: 355 | Issues: 22 | Issues: 24

LayerDiffuse_DiffusersCLI

LayerDiffuse in pure diffusers without any GUI

Language: Python | License: Apache-2.0 | Stargazers: 288 | Issues: 7 | Issues: 9

TexTeller

TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and superior generalization ability, enabling it to cover most usage scenarios.

Language: Python | License: Apache-2.0 | Stargazers: 278 | Issues: 3 | Issues: 9

DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Language: Python | License: MIT | Stargazers: 263 | Issues: 6 | Issues: 9

mmdit

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch

Language: Python | License: MIT | Stargazers: 219 | Issues: 3 | Issues: 1

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | Stargazers: 204 | Issues: 2 | Issues: 8

nxtp

Object Recognition as Next Token Prediction (CVPR 2024)

Language: Python | License: NOASSERTION | Stargazers: 147 | Issues: 2 | Issues: 5

VimTS

VimTS: A Unified Video and Image Text Spotter

Language: Python | License: GPL-3.0 | Stargazers: 69 | Issues: 2 | Issues: 5

UPOCR

Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)

Language: Python | Stargazers: 35 | Issues: 0 | Issues: 0

MegaHan97K

MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories

Language: Python | Stargazers: 4 | Issues: 0 | Issues: 0