cythc's starred repositories

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python | License: AGPL-3.0 | Stargazers: 25580 | Issues: 177 | Issues: 130

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 11764 | Issues: 204 | Issues: 2258

Bert-VITS2

VITS2 backbone with multilingual BERT

Language: Python | License: AGPL-3.0 | Stargazers: 7885 | Issues: 50 | Issues: 0

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Language: Python | License: MIT | Stargazers: 7592 | Issues: 82 | Issues: 152

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | License: Apache-2.0 | Stargazers: 7322 | Issues: 63 | Issues: 150

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language: Python | License: MIT | Stargazers: 4827 | Issues: 78 | Issues: 192

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | License: MIT | Stargazers: 3457 | Issues: 57 | Issues: 70

MoeGoe

Executable file for VITS inference

Language: Python | License: MIT | Stargazers: 2338 | Issues: 16 | Issues: 41

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python | License: MIT | Stargazers: 1362 | Issues: 50 | Issues: 21

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch

Language: Python | License: MIT | Stargazers: 1270 | Issues: 53 | Issues: 31

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | Stargazers: 1173 | Issues: 56 | Issues: 52

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | Stargazers: 1167 | Issues: 24 | Issues: 86

chinese_speech_pretrain

Chinese speech pretrained models

vits

VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai

Language: Python | License: MIT | Stargazers: 909 | Issues: 7 | Issues: 0

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

string2string

String-to-String Algorithms for Natural Language Processing

Language: Jupyter Notebook | License: MIT | Stargazers: 533 | Issues: 10 | Issues: 4

UniAudio

The Open Source Code of UniAudio

knn-vc

Voice Conversion With Just Nearest Neighbors

Language: Python | License: NOASSERTION | Stargazers: 450 | Issues: 16 | Issues: 38

KAN-TTS

KAN-TTS is a speech-synthesis training framework. Demos are posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Language: Python | License: MIT | Stargazers: 445 | Issues: 16 | Issues: 65

wetts

Production-First and Production-Ready End-to-End Text-to-Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 368 | Issues: 13 | Issues: 54

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Language: Python | License: MIT | Stargazers: 297 | Issues: 10 | Issues: 22

PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Language: Python | License: MIT | Stargazers: 214 | Issues: 14 | Issues: 48

SyntaSpeech

SyntaSpeech: Syntax-aware Generative Adversarial Text-to-Speech; IJCAI 2022; Official code

Language: Python | License: MIT | Stargazers: 193 | Issues: 10 | Issues: 11

whisper-vits-japanese

VITS Japanese with Whisper as the data processor (you can train your VITS even if you only have audio)

Language: Jupyter Notebook | License: MIT | Stargazers: 161 | Issues: 6 | Issues: 15

HiFTNet

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Language: Python | License: MIT | Stargazers: 129 | Issues: 10 | Issues: 10

AuxiliaryASR

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)

Language: Python | License: MIT | Stargazers: 111 | Issues: 8 | Issues: 11

TransferTTS

TransferTTS (zero-shot learning of VITS)

Language: Python | License: MIT | Stargazers: 86 | Issues: 5 | Issues: 2

SpeechTasks

A list of speech tasks and datasets that can provide training data for generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications.

silk-codec

Silk coder: encodes audio to Silk and decodes Silk to PCM

Language: C++ | License: Apache-2.0 | Stargazers: 47 | Issues: 3 | Issues: 8

SC-VITS

VITS-based zero-shot TTS system with a variety of style/speaker conditioning methods.

Language: Python | License: MIT | Stargazers: 33 | Issues: 2 | Issues: 0