Wendong Gan (WendongGan)

User data from GitHub: https://github.com/WendongGan

Company: UESTC

Location: Chengdu, China

GitHub: @WendongGan

Wendong Gan's repositories

Language: Python | License: MIT | Stargazers: 1 | Issues: 0

async_cosyvoice

Accelerates cosyvoice2 inference using vllm

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 0

audioseal

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

CarelessWhisper-Streaming

Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

Cosyvoice_DPO_NOTES

CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning!

Language: Python | Stargazers: 0 | Issues: 0

F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

fireredasr-streaming

Low-latency real-time ASR based on FireRedASR

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

FluidAudio

Fully Native Swift and CoreML. Efficient Speaker Diarization, VAD, and Speech-to-Text for realtime workloads

Language: Swift | License: Apache-2.0 | Stargazers: 0 | Issues: 0

GenVC

Self-supervised Generative LM-based Voice Conversion

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

GTSinger

Dataset and code of GTSinger(NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

happy-llm

📚 A from-scratch tutorial on the principles and practice of large language models

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 0

mamba-diarization

Official repository for Mamba-based Segmentation Model for Speaker Diarization

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

minimind

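"Large model": train a small 26M-parameter GPT entirely from scratch in 3 hours; a personal GPU is enough for both inference and training!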

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

reverb

Open source inference code for Rev's model

License: NOASSERTION | Stargazers: 0 | Issues: 0

scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Language: Python | Stargazers: 0 | Issues: 0

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Language: Python | License: MIT | Stargazers: 0 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 0

TextrolSpeech

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

train-higgs-audio-jimmyMa99

Text-audio foundation model from Boson AI

Language: Python | Stargazers: 0 | Issues: 0

TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

WavChat

A Survey of Spoken Dialogue Models (60 pages)

Stargazers: 0 | Issues: 0

wavesurfer

For audio visualization and playback in Jupyter notebooks.

License: BSD-2-Clause | Stargazers: 0 | Issues: 0

WenetSpeech-Yue

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

License: Apache-2.0 | Stargazers: 0 | Issues: 0