Beast code in Giters

vshanyiao's starred repositories

Emotional-Speech-Data

This is the GitHub page for publicly available emotional speech data.

License: MIT | 309 | 0 | 0

B-Llama3-o

B-Llama3o: a Llama 3 model with vision and audio understanding, as well as text, audio, and animation data output.

Language: Python | 25 | 0 | 0

gpt_sovits_infer_with_emotion

A GPT-SoVITS inference demo that automatically switches the reference audio based on sentiment analysis of the Chinese input text.

Language: Python | 66 | 0 | 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: AGPL-3.0 | 28208 | 0 | 0

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python | License: Apache-2.0 | 3560 | 0 | 0

agents

Build real-time multimodal AI applications 🤖🎙️📹

Language: Python | License: Apache-2.0 | 701 | 0 | 0

Wav2Vec2FBX

Recognize speech from an audio file and convert it into animation FBX

Language: Python | License: Apache-2.0 | 19 | 0 | 0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python | License: Apache-2.0 | 12591 | 0 | 0

GPT-SoVITS

One minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python | License: MIT | 13 | 0 | 0

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python | License: Apache-2.0 | 1168 | 0 | 0

RGB

Language: Python | License: NOASSERTION | 233 | 0 | 0

vocode-core

🤖 Build voice-based LLM agents. Modular + open source.

Language: Python | License: MIT | 2557 | 0 | 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | 8299 | 0 | 0

pydub

Manipulate audio with a simple, easy-to-use high-level interface.

Language: Python | License: MIT | 8633 | 0 | 0

xtts-streaming-server

Language: Python | License: MPL-2.0 | 259 | 0 | 0

AceGPT

Language: Python | License: Apache-2.0 | 106 | 0 | 0

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language: Python | License: MPL-2.0 | 32222 | 0 | 0

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Language: Jupyter Notebook | License: MIT | 3617 | 0 | 0

kenlm

KenLM: Faster and Smaller Language Model Queries

Language: C++ | License: NOASSERTION | 2459 | 0 | 0

fastText

Library for fast text representation and classification.

Language: HTML | License: MIT | 25764 | 0 | 0

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python | License: Apache-2.0 | 2912 | 0 | 0

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Language: Python | License: Apache-2.0 | 358 | 0 | 0

vits_chinese

Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output.

Language: Python | License: MIT | 1127 | 0 | 0

emotional-vits

An emotion-controllable speech synthesis model based on VITS that requires no emotion annotations.

Language: Jupyter Notebook | License: MIT | 1281 | 0 | 0

everyone-can-use-english

Everyone can use English

Language: TypeScript | License: MPL-2.0 | 22806 | 0 | 0

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python | License: MIT | 3476 | 0 | 0

LWM

Language: Python | License: Apache-2.0 | 7029 | 0 | 0

magvit2-pytorch

Implementation of the MagViT2 tokenizer in PyTorch

Language: Python | License: MIT | 500 | 0 | 0

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python | License: Apache-2.0 | 913 | 0 | 0

dingdang-robot

🤖 Dingdang is a Chinese voice-dialogue robot / smart speaker project that runs on the Raspberry Pi.

Language: Python | License: NOASSERTION | 1329 | 0 | 0