ABC0408

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | 461 13 23

python-audio-separator

Easy to use vocal separation from CLI or as a python package, using a variety of amazing models (primarily trained by @Anjok07 as part of UVR)

Language: Python | License: MIT | 261 7 62

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Language: Python | License: MIT | 261 26 12

CPED

CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | Chinese personalized and emotional dialogue dataset

Language: Python | License: Apache-2.0 | 184 4 6

libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Language: Python | License: Apache-2.0 | 147 6 6

AudioEditingCode

Language: Python | License: CC-BY-SA-4.0 | 121 4 3

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Language: Python | 108 4 4

ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Language: C | License: BSD-3-Clause | 93 5 4

FAcodec

Training code for FAcodec presented in NaturalSpeech3

Language: Python | 89 8 5

supervoice

VoiceBox neural network implementation

Language: Jupyter Notebook | 71 11 11

OpenPhonemizer

Permissively licensed, open sourced, local IPA Phonemizer (G2P) powered by deep learning.

Language: Python | License: BSD-3-Clause-Clear | 70 4 5

TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Language: Python | License: Apache-2.0 | 65 110

DTTNet-Pytorch

An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation

Language: Python | License: Apache-2.0 | 61 4 2

pflow-encodec

Implementation of TTS model based on NVIDIA P-Flow TTS Paper

Language: Python | 61 5 4

X-E-Speech-code

X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion

Language: Python | License: MIT | 60 8 4

hilcodec

High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec

Language: Jupyter Notebook | License: MIT | 55 0 0

g2p-mix

Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English

Language: Python | License: MIT | 51 0 0

LangSegment

A multilingual (97 languages) tool for automatically recognizing and segmenting text content; a powerful automatic segmentation tool for mixed-language TTS text.

Language: Python | 41 2 15

FlashSpeech

FlashSpeech: Efficient Zero-Shot Speech Synthesis

38 0 0

Train_Hifigan_XTTS

An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.

Language: Python | 28 0 0

speechtoolkit

[EARLY PUBLIC ALPHA] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activity detection, and more!

Language: Python | 19 4 0

Lightvoc

LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform

Language: Jupyter Notebook | 16 0 0

xcodec

X-Codec: Unified Audio Tokenizer for Audio Language Model

14 0 0