vshanyiao's starred repositories
Emotional-Speech-Data
This is the GitHub page for publicly available emotional speech data.
B-Llama3-o
B-Llama3o: a Llama 3 model with vision and audio understanding, as well as text, audio, and animation data output.
gpt_sovits_infer_with_emotion
A GPT-SoVITS inference demo that automatically switches the reference audio based on sentiment analysis of Chinese text
metavoice-src
Foundational model for human-like, expressive TTS
Wav2Vec2FBX
Recognize speech from an audio file and convert it into animation FBX
GPT-SoVITS
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
vocode-core
🤖 Build voice-based LLM agents. Modular + open source.
speechbrain
A PyTorch-based Speech Toolkit
WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
vits_chinese
Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!
emotional-vits
An emotion-controllable speech synthesis model based on VITS that requires no emotion annotations
everyone-can-use-english
Everyone can use English
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
magvit2-pytorch
Implementation of the MagViT2 Tokenizer in PyTorch
dingdang-robot
🤖 Dingdang is a Chinese voice dialogue robot / smart speaker project that runs on the Raspberry Pi.