Chunyu Wang's starred repositories

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language:PythonLicense:Apache-2.0Stargazers:31734Issues:165Issues:4635

pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language:PythonLicense:Apache-2.0Stargazers:31234Issues:309Issues:902

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language:C++License:Apache-2.0Stargazers:9984Issues:124Issues:739

PhotoMaker

PhotoMaker [CVPR 2024]

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:9220Issues:103Issues:149

lora

Using Low-rank adaptation to quickly fine-tune diffusion models.

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:6910Issues:59Issues:138

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language:PythonLicense:Apache-2.0Stargazers:6079Issues:37Issues:292

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language:PythonLicense:NOASSERTIONStargazers:5901Issues:47Issues:78

Wonder3D

Single Image to 3D using Cross-Domain Diffusion for 3D Generation

Language:PythonLicense:AGPL-3.0Stargazers:4614Issues:49Issues:172

MiDaS

Code for robust monocular depth estimation described in "Ranftl et. al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"

Language:PythonLicense:MITStargazers:4349Issues:73Issues:240

PySceneDetect

:movie_camera: Python and OpenCV-based scene cut/transition detection program & library.

Language:PythonLicense:BSD-3-ClauseStargazers:3080Issues:69Issues:311

DWPose

"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)

Language:PythonLicense:Apache-2.0Stargazers:2129Issues:28Issues:91

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language:PythonLicense:MITStargazers:1963Issues:19Issues:46

GLIGEN

Open-Set Grounded Text-to-Image Generation

Language:PythonLicense:MITStargazers:1952Issues:37Issues:82

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG)

Language:Jupyter NotebookLicense:AGPL-3.0Stargazers:1633Issues:26Issues:49

Emu

Emu Series: Generative Multimodal Models from BAAI

Language:PythonLicense:Apache-2.0Stargazers:1593Issues:21Issues:85

composer

Official implementation of "Composer: Creative and Controllable Image Synthesis with Composable Conditions"

TimeSformer

The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Language:PythonLicense:NOASSERTIONStargazers:1502Issues:28Issues:128

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Language:PythonLicense:Apache-2.0Stargazers:1244Issues:50Issues:31

CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

Language:PythonLicense:MITStargazers:1026Issues:14Issues:18

ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Language:PythonLicense:NOASSERTIONStargazers:923Issues:113Issues:36

SyncDreamer

[ICLR 2024 Spotlight] SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Language:PythonLicense:MITStargazers:861Issues:23Issues:66

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language:PythonLicense:Apache-2.0Stargazers:670Issues:12Issues:100

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language:PythonLicense:MITStargazers:476Issues:9Issues:32

MVDiffusion

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, NeurIPS 2023 (spotlight)

objaverse-rendering

📷 Scripts for rendering Objaverse

Language:PythonLicense:Apache-2.0Stargazers:204Issues:8Issues:16

Pro-Motion

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation

MVGFormer

This is the official implementation of the work presented at CVPR 2024, titled Multiple View Geometry Transformers for 3D Human Pose Estimation (MVGFormer).

Language:PythonLicense:Apache-2.0Stargazers:24Issues:6Issues:2
Language:PythonLicense:MITStargazers:14Issues:3Issues:5