WendongGan

Wendong Gan's repositories

SoloAudio

Language: Python | License: MIT | 1 0 0

async_cosyvoice

Accelerate CosyVoice2 inference with vLLM

Language: Jupyter Notebook | License: Apache-2.0 | 0 0 0

audioseal

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector

Language: Python | License: MIT | 0 0 0

CarelessWhisper-Streaming

Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.

Language: Python | License: NOASSERTION | 0 0 0

CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Language: Python | License: Apache-2.0 | 0 0 0

Cosyvoice_DPO_NOTES

CosyVoice_DPO_NOTES: Supercharge your CosyVoice model with cutting-edge DPO fine-tuning!

Language: Python | 0 0 0

F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Language: Python | License: MIT | 0 0 0

fireredasr-streaming

Low-latency real-time ASR based on FireRedASR

Language: Python | License: MIT | 0 0 0

FluidAudio

Fully Native Swift and CoreML. Efficient Speaker Diarization, VAD, and Speech-to-Text for realtime workloads

Language: Swift | License: Apache-2.0 | 0 0 0

GenVC

Self-supervised Generative LM-based Voice Conversion

Language: Python | License: MIT | 0 0 0

GTSinger

Dataset and code for GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Language: Python | License: NOASSERTION | 0 0 0

happy-llm

📚 A from-scratch tutorial on the principles and practice of large language models

Language: Jupyter Notebook | License: NOASSERTION | 0 0 0

icefall

Language: Python | License: Apache-2.0 | 0 0 0

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | 0 0 0

mair-hub

Language: Jupyter Notebook | License: Apache-2.0 | 0 0 0

mamba-diarization

Official repository for Mamba-based Segmentation Model for Speaker Diarization

Language: Python | License: NOASSERTION | 0 0 0

minimind

[Large models] Train a small 26M-parameter GPT entirely from scratch in 3 hours; both training and inference run on a personal GPU!

Language: Python | License: Apache-2.0 | 0 0 0

reverb

Open source inference code for Rev's model

License: NOASSERTION | 0 0 0

scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Language: Python | 0 0 0

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | License: MIT | 0 0 0

SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

Language: Python | License: MIT | 0 0 0

speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

Language: Python | License: MIT | 0 0 0

SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Language: Python | License: MIT | 0 0 0

super-monotonic-align

Language: Python | License: MIT | 0 0 0

TextrolSpeech

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)

Language: Python | License: MIT | 0 0 0

train-higgs-audio-jimmyMa99

Text-audio foundation model from Boson AI

Language: Python | 0 0 0

TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Language: Python | License: Apache-2.0 | 0 0 0

WavChat

A Survey of Spoken Dialogue Models (60 pages)

0 0 0

wavesurfer

For audio visualization and playback in Jupyter notebooks.

License: BSD-2-Clause | 0 0 0

WenetSpeech-Yue

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

License: Apache-2.0 | 0 0 0