Wendong Gan's repositories
async_cosyvoice
Accelerate CosyVoice2 inference with vLLM
audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
CarelessWhisper-Streaming
Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.
CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Cosyvoice_DPO_NOTES
CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning!
F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
fireredasr-streaming
Low-latency real-time ASR based on FireRedASR
FluidAudio
Fully Native Swift and CoreML. Efficient Speaker Diarization, VAD, and Speech-to-Text for realtime workloads
GenVC
Self-supervised Generative LM-based Voice Conversion
GTSinger
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
happy-llm
📚 A from-scratch tutorial on the principles and practice of large language models
litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
mamba-diarization
Official repository for Mamba-based Segmentation Model for Speaker Diarization
minimind
[LLM] Train a small 26M-parameter GPT entirely from scratch in 3 hours; inference and training run on a personal GPU!
reverb
Open source inference code for Rev's model
scoreq
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
speaker_disentangled_hubert
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
SSR-Speech
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
train-higgs-audio-jimmyMa99
Text-audio foundation model from Boson AI
TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
WavChat
A Survey of Spoken Dialogue Models (60 pages)
wavesurfer
For audio visualization and playback in Jupyter notebooks.
WenetSpeech-Yue
A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation