symao's repositories
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
AnimateDiff
Official implementation of AnimateDiff.
annotated_deep_learning_paper_implementations
🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
bark
🔊 Text-Prompted Generative Audio Model
bark-voice-cloning-HuBERT-quantizer
The code for the bark-voicecloning model. Training and inference.
Bert-VITS2
vits2 backbone with multilingual-bert
Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
conditional-flow-matching
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
it3103
it3103 code repo for students
jukebox
Code for the paper "Jukebox: A Generative Model for Music"
kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Large-Audio-Models
Keep track of big models in audio domain, including speech, singing, music etc.
Latte
Latte: Latent Diffusion Transformer for Video Generation.
leetcode-master
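"Code Thoughts" (代码随想录) LeetCode practice guide: a recommended solving order for 200 classic problems, about 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer confusing! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀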
Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Matcha-TTS
🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
taming-transformers
Taming Transformers for High-Resolution Image Synthesis
TensorRT
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
wespeaker
Research and Production Oriented Speaker Recognition Toolkit