Yunlin Chen's starred repositories
syncnet_python
Out of time: automated lip sync in the wild
VideoMamba
VideoMamba: State Space Model for Efficient Video Understanding
InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
speech-datasets-collection
A curated list of speech datasets (110+ datasets, 75+ easy to download)
parler-tts
Inference and training library for high-quality TTS models.
vqvae-vqgan-pytorch-lightning
VQ-VAE/GAN implementation in PyTorch Lightning
taming-transformers
Taming Transformers for High-Resolution Image Synthesis
all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
MiniSora-DiT
minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora
StableVITON
[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to this project.
DynamiCrafter
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
magvit2-pytorch
Implementation of MagViT2 Tokenizer in PyTorch
SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".