jshilong

Shilong Zhang's starred repositories

Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Language: Python | License: BSD-3-Clause | 27167 232 656

magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Language: Python | License: BSD-3-Clause | 10263 102 143

CogVLM

A state-of-the-art open visual language model | multimodal pre-trained model

Language: Python | License: Apache-2.0 | 5709 66 408

StoryDiffusion

Create Magic Story!

Language: Jupyter Notebook | License: Apache-2.0 | 5579 85 130

VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Language: Python | License: NOASSERTION | 4375 70 74

AnyDoor

Official implementations for paper: Anydoor: zero-shot object-level image customization

Language: Python | License: MIT | 3849 86 94

StableSR

[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution

Language: Python | License: NOASSERTION | 2008 23 136

MusePose

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

Language: Python | License: NOASSERTION | 1970 41 55

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python | License: MIT | 1934 29 78

awesome-openai-vision-api-experiments

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Language: Python | 1608 27 5

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | 1576 21 85

Awesome-Video-Diffusion-Models

[Arxiv] A Survey on Video Diffusion Models

1566 47 13

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | 1083 19 41

Awesome-Video-Datasets

Video datasets

1059 24 10

MimicBrush

Official implementations for paper: Zero-shot Image Editing with Reference Imitation

Language: Python | License: Apache-2.0 | 957 13 16

Upscale-A-Video

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

881 86 10

Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

License: MIT | 787 46 12

rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Language: Python | License: MIT | 774 7 33

Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | 498 35 20

Pandora

Pandora: Towards General World Model with Natural Language Actions and Video States

Language: Python | 440 17 7

streamv2v

Official PyTorch implementation of StreamV2V.

Language: Python | License: NOASSERTION | 406 8 6

FlashFace

Language: Python | License: MIT | 304 13 13

PoseAnything

A Graph-Based Approach for Category-Agnostic Pose Estimation [ECCV 2024]

Language: Python | License: Apache-2.0 | 285 4 10

TalkSHOW

This is the official repository for TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023].

Language: Python | 278 12 29

nosmpl

Accelerated SMPL operations, commonly used to generate 3D human meshes; STAR included.

Language: Python | License: GPL-3.0 | 125 3 15

Aurora

Official implementation of Aurora

Language: Python | License: NOASSERTION | 80 7 1

DAC-DETR

[NIPS2023] This is an official implementation of the paper "DAC-DETR: Divide the Attention Layers and Conquer".

Language: Python | License: MIT | 52 1 5

ComfyUI-FlashFace

ComfyUI Node for FlashFace

Language: Python | License: MIT | 41 1 17

gradio-box

Language: Python | License: Apache-2.0 | 15 2 2

timing_experiments

Language: Python | 3 1 1