UESTCYangHR

PeterYoung's starred repositories

titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Language: Python | License: MIT | 11200

AutoStudio

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

Language: Jupyter Notebook | 9200

megactor

Language: Python | License: Apache-2.0 | 22500

BasicSR

Open Source Image and Video Restoration Toolbox for Super-resolution, Denoising, Deblurring, etc. It currently includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. It also supports StyleGAN2 and DFDNet.

Language: Python | License: Apache-2.0 | 639800

CleanDiffuser

CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making

Language: Python | License: Apache-2.0 | 12100

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python | License: Apache-2.0 | 88400

textgrad

Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.

Language: Python | License: MIT | 53900

BERT-pytorch

Google AI 2018 BERT pytorch implementation

Language: Python | License: Apache-2.0 | 605800

vision-agent

Vision agent

Language: Python | License: Apache-2.0 | 72700

CustomTkinter

A modern and customizable python UI-library based on Tkinter

Language: Python | License: MIT | 1067000

pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Language: Python | License: Apache-2.0 | 2736300

Glyph-ByT5

This is the official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"

Language: Jupyter Notebook | 30000

LLaVA-Magvit2

Language: Python | 2300

wtfpython

What the f*ck Python? 😱

Language: Python | License: WTFPL | 3543400

hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Language: Python | License: MIT | 393500

MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Language: Python | License: NOASSERTION | 170300

bsq-vit

[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization

Language: Python | License: MIT | 4500

mdlm

Simplified Masked Diffusion Language Model

Language: Python | License: Apache-2.0 | 7900

image-textualization

Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions

Language: Python | 5000

AsyncDiff

Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"

Language: Python | License: Apache-2.0 | 6900

LibriTTS-P

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

9100

Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!

Language: Python | License: MIT | 2048600

Retrieval-based-Voice-Conversion

in preparation...

Language: Python | License: MIT | 18100

omniglue

Code release for CVPR'24 submission 'OmniGlue'

Language: Python | License: Apache-2.0 | 41700

Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Language: Python | License: BSD-3-Clause | 2677500

MQT-LLaVA

Matryoshka Query Transformer for Large Vision-Language Models

Language: Python | License: Apache-2.0 | 6900

cobalt

save what you love

Language: JavaScript | License: AGPL-3.0 | 972800

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | 79400

BIRD

This is the official implementation of "Blind Image Restoration via Fast Diffusion Inversion"

Language: Python | 20600

flash-diffusion

Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Language: Python | License: NOASSERTION | 23500