Chenxi's repositories
insanely-fast-whisper
Incredibly fast Whisper-large-v3
Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
VideoCrafter
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
ScaleCrafter
Official implementation of ScaleCrafter for higher-resolution visual generation at inference time.
T2I-Adapter
T2I-Adapter
chenxwh.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics
LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
ProphetNet
A research project for natural language generation, containing the official implementations by MSRA NLC team.
Wuerstchen
Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models
cog-whisperv3
Run OpenAI Whisper as a Cog model
daclip-uir
PyTorch implementation of the paper "Controlling Vision-Language Models for Universal Image Restoration". Currently aimed at *academic research* 😋
InstructCV
Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
LongerCrafter
Code for FreeNoise
LongLoRA
Efficient long-context fine-tuning, supervised fine-tuning, LongQA dataset.
Magic123
Official PyTorch Implementation of Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
MiniGPT-5
Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"