CHUNYUWANG

Chunyu Wang's starred repositories

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language: Python · License: Apache-2.0 · 31734 165 4635

pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language: Python · License: Apache-2.0 · 31234 309 902

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ · License: Apache-2.0 · 9984 124 739

PhotoMaker

PhotoMaker [CVPR 2024]

Language: Jupyter Notebook · License: NOASSERTION · 9220 103 149

lora

Using Low-rank adaptation to quickly fine-tune diffusion models.

Language: Jupyter Notebook · License: Apache-2.0 · 6910 59 138

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python · License: Apache-2.0 · 6079 37 292

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python · License: NOASSERTION · 5901 47 78

Wonder3D

Single Image to 3D using Cross-Domain Diffusion for 3D Generation

Language: Python · License: AGPL-3.0 · 4614 49 172

MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"

Language: Python · License: MIT · 4349 73 240

PySceneDetect

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Language: Python · License: BSD-3-Clause · 3080 69 311

DWPose

"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)

Language: Python · License: Apache-2.0 · 2129 28 91

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python · License: MIT · 1963 19 46

GLIGEN

Open-Set Grounded Text-to-Image Generation

Language: Python · License: MIT · 1952 37 82

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook · License: AGPL-3.0 · 1633 26 49

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python · License: Apache-2.0 · 1593 21 85

composer

Official implementation of "Composer: Creative and Controllable Image Synthesis with Composable Conditions"

License: MIT · 1533 173 8

TimeSformer

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Language: Python · License: NOASSERTION · 1502 28 128

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Language: Python · License: Apache-2.0 · 1244 50 31

CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch

Language: Python · License: MIT · 1026 14 18

ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Language: Python · License: NOASSERTION · 923 113 36

SyncDreamer

[ICLR 2024 Spotlight] SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Language: Python · License: MIT · 861 23 66

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python · License: Apache-2.0 · 670 12 100

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language: Python · License: MIT · 476 9 32

MVDiffusion

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, NeurIPS 2023 (spotlight)

Language: Python · 471 23 48

objaverse-rendering

📷 Scripts for rendering Objaverse

Language: Python · License: Apache-2.0 · 204 8 16

Pro-Motion

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation

39 5 1

ART.V

34 12 2

MVGFormer

Official implementation of the CVPR 2024 paper "Multiple View Geometry Transformers for 3D Human Pose Estimation" (MVGFormer).

License: Apache-2.0 · 27 5 5

BannerGen

Language: Python · License: Apache-2.0 · 24 6 2

SBT

Language: Python · License: MIT · 14 3 5