lmxue

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support

Language: Python | Apache-2.0 | 7513 96 1524

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | NOASSERTION | 7343 89 120

fish-speech

Brand new TTS solution

Language: Python | NOASSERTION | 7122 61 302

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | Apache-2.0 | 7066 63 147

ComfyUI-Workflows-ZHO

My ComfyUI workflows collection

GPL-3.0 | 4474 37 10

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | Apache-2.0 | 4414 62 177

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | MIT | 4377 57 146

ThinkDSP

Think DSP: Digital Signal Processing in Python, by Allen B. Downey.

Language: Jupyter Notebook | 3875 236 57

resemble-enhance

AI powered speech denoising and enhancement

Language: Python | MIT | 1183 16 39

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | MIT | 1146 57 50

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | MIT | 1074 26 72

lhotse

Tools for handling speech data in machine learning projects.

Language: Python | Apache-2.0 | 914 44 406

WavCraft

Official repo for WavCraft, an AI agent for audio creation and editing

Language: Python | NOASSERTION | 648 71 1

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | 550 14 38

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

543 30 2

VoiceFlow-TTS

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Language: Python | 272 16 13

spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch

Language: Python | MIT | 249 28 6

frechet-audio-distance

A lightweight library for Frechet Audio Distance calculation.

Language: Python | MIT | 224 2 12

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Language: Python | 196 12 17

openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend

Language: Rust | MIT | 131 6 14

SpeechTasks

This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications.

72 30

VoicePAT

VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.

Language: Shell | Apache-2.0 | 46 5 5

Open-Suno

trying to reproduce suno v3

MIT | 23 30

tarzan

High-level API for tar-based dataset

Language: Python | 10 30