Лэюань's starred repositories

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 22448 | Issues: 147 | Issues: 248

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 6992 | Issues: 87 | Issues: 104

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Language: Python | License: Apache-2.0 | Stargazers: 5891 | Issues: 70 | Issues: 219

piper

A fast, local neural text-to-speech system

Language: C++ | License: MIT | Stargazers: 4660 | Issues: 66 | Issues: 392

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python | License: Apache-2.0 | Stargazers: 3254 | Issues: 70 | Issues: 107

FunClip

Open-source, accurate, and easy-to-use video speech recognition and clipping tool, with integrated LLM-based AI clipping.

Language: Python | License: MIT | Stargazers: 2310 | Issues: 22 | Issues: 51

stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Language: Python | License: MIT | Stargazers: 1366 | Issues: 34 | Issues: 241

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

python-audio-separator

Easy-to-use vocal separation from the CLI or as a Python package, using a variety of amazing models (primarily trained by @Anjok07 as part of UVR)

Language: Python | License: MIT | Stargazers: 261 | Issues: 7 | Issues: 62

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Language: Python | License: MIT | Stargazers: 261 | Issues: 26 | Issues: 12

CPED

CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | Chinese personalized and emotional dialogue dataset

Language: Python | License: Apache-2.0 | Stargazers: 184 | Issues: 4 | Issues: 6

libriheavy

Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context

Language: Python | License: Apache-2.0 | Stargazers: 147 | Issues: 6 | Issues: 6

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 121 | Issues: 4 | Issues: 3

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Language: C | License: BSD-3-Clause | Stargazers: 93 | Issues: 5 | Issues: 4

FAcodec

Training code for FAcodec presented in NaturalSpeech3

supervoice

VoiceBox neural network implementation

Language: Jupyter Notebook | Stargazers: 71 | Issues: 11 | Issues: 11

OpenPhonemizer

Permissively licensed, open sourced, local IPA Phonemizer (G2P) powered by deep learning.

Language: Python | License: BSD-3-Clause-Clear | Stargazers: 70 | Issues: 4 | Issues: 5

TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)

Language: Python | License: Apache-2.0 | Stargazers: 65 | Issues: 11 | Issues: 0

DTTNet-Pytorch

An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation

Language: Python | License: Apache-2.0 | Stargazers: 61 | Issues: 4 | Issues: 2

pflow-encodec

Implementation of a TTS model based on the NVIDIA P-Flow TTS paper

X-E-Speech-code

X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion

Language: Python | License: MIT | Stargazers: 60 | Issues: 8 | Issues: 4

hilcodec

High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec

Language: Jupyter Notebook | License: MIT | Stargazers: 55 | Issues: 0 | Issues: 0

g2p-mix

Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English

Language: Python | License: MIT | Stargazers: 51 | Issues: 0 | Issues: 0

LangSegment

A multilingual (97 languages) tool for automatic recognition and segmentation of text content: a powerful tool for automatically segmenting mixed multilingual (97 languages) text for TTS.

FlashSpeech

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Stargazers: 38 | Issues: 0 | Issues: 0

Train_Hifigan_XTTS

An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.

Language: Python | Stargazers: 28 | Issues: 0 | Issues: 0

speechtoolkit

[EARLY PUBLIC ALPHA] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activity detection, and more!

Language: Python | Stargazers: 19 | Issues: 4 | Issues: 0

Lightvoc

LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform

Language: Jupyter Notebook | Stargazers: 16 | Issues: 0 | Issues: 0

xcodec

X-Codec: Unified Audio Tokenizer for Audio Language Model

Stargazers: 14 | Issues: 0 | Issues: 0