attitudechunfeng

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python · 475 13 27

UniAudio

The Open Source Code of UniAudio

Language: Python · 467 39 27

REAPER

Language: C++ · Apache-2.0 · 380 36 13

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

Language: Python · MIT · 292 16 42

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Language: Python · MIT · 271 26 12

megatts2

Unofficial implementation of Megatts2

Language: Python · MIT · 234 22 19

SPTK

A suite of speech signal processing tools

Language: C++ · Apache-2.0 · 214 17 6

libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Language: Python · Apache-2.0 · 151 6 6

tts-scores

Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models

Language: Python · Apache-2.0 · 119 5 12

USLM

Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Language: Python · 115 8 4

Bridge-TTS

Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).

MIT · 115 40 4

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Language: Python · 111 4 4

HiFTNet

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Language: Python · MIT · 111 11 7

ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Language: C · BSD-3-Clause · 94 5 4

AQUA-Tk

AQUA-Tk: Audio QUality Assessment Toolkit (in development)

Language: Python · GPL-3.0 · 89 3 3

UniAudio

The official source code of UniAudio

Language: Python · 73 8 1

PhoneLM

(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.

Language: Jupyter Notebook · MIT · 45 90

ChildAugment

Code for LPC Segmental Warping Perturbations (LPC-SWP) and Formant Energy Bandwidth (FEP-BWP) Perturbations

Language: Python · 300