zhangshushu15's starred repositories
Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
AnimateDiff
Official implementation of AnimateDiff.
Mr.-Ranedeer-AI-Tutor
A GPT-4 AI Tutor Prompt for customizable personalized learning experiences.
consistency_models
Official repo for consistency models.
conceptual-12m
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
style2paints
sketch + style = paints :art: (ACM TOG 2018 / SIGGRAPH Asia 2018)
ControlNet-v1-1-nightly
Nightly release of ControlNet 1.1
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
Make-An-Audio
PyTorch implementation of Make-An-Audio (ICML 2023), a text-to-audio generative model.
Mubert-Text-to-Music
A simple notebook demonstrating prompt-based music generation via the Mubert API.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
generative-models
Generative Models by Stability AI
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ.
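The idea behind 4-bit weight quantization can be illustrated with a minimal round-to-nearest sketch in NumPy. Note this is only an illustration of storing weights in the signed int4 range with a per-group scale; GPTQ itself uses second-order (Hessian-based) error correction rather than plain rounding, and the `group_size` value here is an arbitrary choice for the example.

```python
import numpy as np

def quantize_4bit(w, group_size=4):
    # Round-to-nearest symmetric quantization to the int4 range [-8, 7],
    # with one float scale per group of `group_size` weights.
    w = np.asarray(w, dtype=np.float32).reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    # Recover approximate float weights from int4 codes and per-group scales.
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.array([0.12, -0.5, 0.33, 0.07, 1.2, -0.9, 0.0, 0.45], dtype=np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
```

Each group's maximum reconstruction error is bounded by half its scale, which is why larger groups (GPTQ commonly uses 128) trade a little accuracy for less scale-storage overhead.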
openai-python
The official Python library for the OpenAI API
ColossalAI
Making large AI models cheaper, faster, and more accessible.