Beast code in Giters

cythc's starred repositories

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python | License: AGPL-3.0 | 25580 177 130

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | 11764 204 2258

Bert-VITS2

VITS2 backbone with multilingual BERT

Language: Python | License: AGPL-3.0 | 7885 500

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Language: Python | License: MIT | 7592 82 152

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | License: Apache-2.0 | 7322 63 150

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language: Python | License: MIT | 4827 78 192

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | License: MIT | 3457 57 70

MoeGoe

Executable file for VITS inference

Language: Python | License: MIT | 2338 16 41

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python | License: MIT | 1362 50 21

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch

Language: Python | License: MIT | 1270 53 31

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | 1173 56 52

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | 1167 24 86

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell | 1017 10 56

vits

VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai

Language: Python | License: MIT | 909 70

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

License: MIT | 553 86 4

string2string

String-to-String Algorithms for Natural Language Processing

Language: Jupyter Notebook | License: MIT | 533 10 4

UniAudio

The Open Source Code of UniAudio

Language: Python | 510 37 33

knn-vc

Voice Conversion With Just Nearest Neighbors

Language: Python | License: NOASSERTION | 450 16 38

KAN-TTS

KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Language: Python | License: MIT | 445 16 65

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Language: Python | License: Apache-2.0 | 368 13 54

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Language: Python | License: MIT | 297 10 22

PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Language: Python | License: MIT | 214 14 48

SyntaSpeech

SyntaSpeech: Syntax-aware Generative Adversarial Text-to-Speech; IJCAI 2022; Official code

Language: Python | License: MIT | 193 10 11

whisper-vits-japanese

VITS Japanese with Whisper as the data processor (you can train your VITS even if you only have audio)

Language: Jupyter Notebook | License: MIT | 161 6 15

HiFTNet

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Language: Python | License: MIT | 129 10 10

AuxiliaryASR

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)

Language: Python | License: MIT | 111 8 11

TransferTTS

TransferTTS (Zero-Shot learning of VITS)

Language: Python | License: MIT | 86 5 2

SpeechTasks

This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications.

72 30

silk-codec

Silk coder; Encode audio to silk; Decode silk to PCM

Language: C++ | License: Apache-2.0 | 47 3 8

SC-VITS

VITS-based zero-shot TTS system varying with diverse style/speaker conditioning methods.

Language: Python | License: MIT | 33 20