vshanyiao's starred repositories

Emotional-Speech-Data

This is the GitHub page for publicly available emotional speech data.

License: MIT · Stargazers: 309 · Issues: 0

B-Llama3-o

B-Llama3o: a Llama 3 model with vision and audio understanding, plus text, audio, and animation data output.

Language: Python · Stargazers: 25 · Issues: 0

gpt_sovits_infer_with_emotion

A GPT-SoVITS inference demo that automatically switches the reference audio based on sentiment analysis of Chinese text.

Language: Python · Stargazers: 66 · Issues: 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python · License: AGPL-3.0 · Stargazers: 28208 · Issues: 0

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python · License: Apache-2.0 · Stargazers: 3560 · Issues: 0

agents

Build real-time multimodal AI applications 🤖🎙️📹

Language: Python · License: Apache-2.0 · Stargazers: 701 · Issues: 0

Wav2Vec2FBX

Recognize speech from an audio file and convert it into an FBX animation.

Language: Python · License: Apache-2.0 · Stargazers: 19 · Issues: 0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python · License: Apache-2.0 · Stargazers: 12591 · Issues: 0

GPT-SoVITS

Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)

Language: Python · License: MIT · Stargazers: 13 · Issues: 0

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python · License: Apache-2.0 · Stargazers: 1168 · Issues: 0

Language: Python · License: NOASSERTION · Stargazers: 233 · Issues: 0

vocode-core

🤖 Build voice-based LLM agents. Modular + open source.

Language: Python · License: MIT · Stargazers: 2557 · Issues: 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python · License: Apache-2.0 · Stargazers: 8299 · Issues: 0

pydub

Manipulate audio with a simple, easy-to-use high-level interface

Language: Python · License: MIT · Stargazers: 8633 · Issues: 0

Language: Python · License: MPL-2.0 · Stargazers: 259 · Issues: 0

Language: Python · License: Apache-2.0 · Stargazers: 106 · Issues: 0

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language: Python · License: MPL-2.0 · Stargazers: 32222 · Issues: 0

WhisperSpeech

An open-source text-to-speech system built by inverting Whisper.

Language: Jupyter Notebook · License: MIT · Stargazers: 3617 · Issues: 0

kenlm

KenLM: Faster and Smaller Language Model Queries

Language: C++ · License: NOASSERTION · Stargazers: 2459 · Issues: 0
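
KenLM answers queries against back-off n-gram language models, which are commonly distributed in the ARPA text format or KenLM's own binary format. As a rough sketch of what such a query computes, assuming ARPA-style back-off semantics, the toy Python below looks up the longest known n-gram ending in the query word and adds the back-off weight of every history it had to shorten. The table `TOY_MODEL` (with made-up values) and the function `log10_prob` are hypothetical illustrations, not KenLM's code or API.

```python
from typing import Dict, Tuple

NGram = Tuple[str, ...]

# Hypothetical toy model: n-gram -> (log10 probability, log10 back-off weight).
# Values are invented for illustration; a real model would be loaded from a file.
TOY_MODEL: Dict[NGram, Tuple[float, float]] = {
    ("the",): (-1.0, -0.20),
    ("cat",): (-2.0, -0.10),
    ("sat",): (-2.5, 0.0),
    ("the", "cat"): (-0.5, -0.05),
}


def log10_prob(history: NGram, word: str,
               model: Dict[NGram, Tuple[float, float]] = TOY_MODEL) -> float:
    """Back-off query: log10 P(word | history) under ARPA-style semantics."""
    backoff = 0.0
    context = history
    while True:
        entry = model.get(context + (word,))
        if entry is not None:
            # Longest matching n-gram found: its probability plus accumulated back-offs.
            return backoff + entry[0]
        if not context:
            # Real toolkits would fall back to an <unk> entry here.
            raise KeyError(f"unknown word: {word!r}")
        # Not found: add the back-off weight of this context (0 if absent) and shorten it.
        ctx_entry = model.get(context)
        backoff += ctx_entry[1] if ctx_entry else 0.0
        context = context[1:]
```

For example, `log10_prob(("the", "cat"), "sat")` finds no trigram or `("cat", "sat")` bigram, so it adds the back-off weights of `("the", "cat")` and `("cat",)` to the unigram `("sat",)`, giving about -2.65.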

fastText

Library for fast text representation and classification.

Language: HTML · License: MIT · Stargazers: 25764 · Issues: 0
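
fastText's text classifier represents a document as a bag of words plus word n-grams, mapping the n-grams into a fixed number of hash buckets so memory stays bounded (the "hashing trick"). A minimal sketch of that feature extraction follows, assuming whitespace tokenization, unigrams plus bigrams, 32-bit FNV-1a as the hash, and 2**20 buckets; these are illustrative choices, and the helper names `fnv1a_32` and `hashed_features` are hypothetical, not fastText's exact internals or defaults.

```python
from collections import Counter
from typing import Dict, List


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash over the UTF-8 bytes of text."""
    h = 2166136261  # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, kept to 32 bits
    return h


def hashed_features(sentence: str, buckets: int = 2**20) -> Dict[int, int]:
    """Count unigrams and word bigrams after hashing each into one of `buckets` ids."""
    tokens: List[str] = sentence.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return dict(Counter(fnv1a_32(g) % buckets for g in tokens + bigrams))
```

Bigrams are what let a linear bag-of-features model pick up some local word order; hashing trades occasional collisions for a fixed-size embedding table.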

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python · License: Apache-2.0 · Stargazers: 2912 · Issues: 0

wetts

A production-first, production-ready end-to-end text-to-speech toolkit

Language: Python · License: Apache-2.0 · Stargazers: 358 · Issues: 0

vits_chinese

Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!

Language: Python · License: MIT · Stargazers: 1127 · Issues: 0

emotional-vits

An emotion-controllable speech synthesis model based on VITS that requires no emotion annotations

Language: Jupyter Notebook · License: MIT · Stargazers: 1281 · Issues: 0

everyone-can-use-english

Everyone can use English

Language: TypeScript · License: MPL-2.0 · Stargazers: 22806 · Issues: 0

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python · License: MIT · Stargazers: 3476 · Issues: 0
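
Silero VAD is a pre-trained neural detector. To illustrate what a voice activity detector outputs (time segments that contain speech), the sketch below uses a much simpler, different technique: a classic frame-energy threshold. It is not how Silero works. The function name `energy_vad`, the 30 ms default frame, and the 0.01 mean-square threshold are illustrative assumptions, and the input is assumed to be mono float samples in [-1, 1].

```python
from typing import List, Optional, Sequence, Tuple


def energy_vad(samples: Sequence[float], sample_rate: int, frame_ms: int = 30,
               threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) segments whose per-frame mean-square energy exceeds threshold.

    Frame-energy baseline for illustration only; not Silero's neural model.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        t = i / sample_rate  # frame start time in seconds
        if energy > threshold and start is None:
            start = t  # speech onset
        elif energy <= threshold and start is not None:
            segments.append((start, t))  # speech offset
            start = None
    if start is not None:
        # Speech runs to the end of the signal.
        segments.append((start, len(samples) / sample_rate))
    return segments
```

A fixed energy threshold breaks down under background noise, which is the main motivation for learned detectors such as Silero VAD.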

Language: Python · License: Apache-2.0 · Stargazers: 7029 · Issues: 0

magvit2-pytorch

Implementation of the MagViT2 tokenizer in PyTorch

Language: Python · License: MIT · Stargazers: 500 · Issues: 0
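
The MAGVIT-v2 tokenizer is built around lookup-free quantization (LFQ): instead of searching a learned codebook, each latent dimension is quantized to its sign, and the resulting bits form the token index, so a d-dimensional latent gives a vocabulary of 2**d tokens. A minimal pure-Python sketch of that mapping for one latent vector follows; the function names `lfq` and `index_to_code` are hypothetical, this is not the magvit2-pytorch API, and it omits training-time parts such as the entropy penalty and straight-through gradients.

```python
from typing import List, Sequence, Tuple


def lfq(latent: Sequence[float]) -> Tuple[List[float], int]:
    """Quantize each dimension to +1 (if > 0) or -1, and pack the +1 bits into a token id."""
    quantized = [1.0 if z > 0 else -1.0 for z in latent]
    # Dimension i contributes 2**i when its quantized value is +1.
    index = sum(1 << i for i, q in enumerate(quantized) if q > 0)
    return quantized, index


def index_to_code(index: int, dim: int) -> List[float]:
    """Inverse mapping: unpack a token id back into a {-1, +1} code of length dim."""
    return [1.0 if (index >> i) & 1 else -1.0 for i in range(dim)]
```

For example, the latent `[0.3, -1.2, 0.0, 2.0]` quantizes to `[1, -1, -1, 1]`, whose positive bits (dimensions 0 and 3) give token id 1 + 8 = 9.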

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python · License: Apache-2.0 · Stargazers: 913 · Issues: 0

dingdang-robot

🤖 Dingdang is a Chinese voice-dialogue robot / smart-speaker project that runs on a Raspberry Pi.

Language: Python · License: NOASSERTION · Stargazers: 1329 · Issues: 0