wzy's repositories
AEGAN-AD
Official PyTorch implementation of AEGAN-AD
bert4torch
PyTorch implementation of transformers, modeled on bert4keras
Bert-VITS2-Integration-train-txt-infer
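requirements.txt adapted for Windows, plus segmented inference for long text and an API for listening to books on a phone. Not my field, so the code is a mess.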
emotional-vits
An emotion-controllable speech synthesis model that needs no emotion annotation, based on VITS
FastASR
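A C++ implementation of ASR inference. It has few dependencies, is simple to install, and runs inference fast, running smoothly even on ARM platforms such as the Raspberry Pi 4B. The inference model is based on the state-of-the-art conformer model and was trained on the 10,000+ hour wenetspeech dataset, so its recognition quality is good and comparable to many commercial ASR products.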
FreeVC
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
FunASR
A Fundamental End-to-End Speech Recognition Toolkit
Genshin_Datasets
Genshin Datasets For SVC/SVS/TTS
GenshinVoice
Voice dataset of Genshin Impact
GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
MoeGoe
Executable file for VITS inference
PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
porcupine
On-device wake word detection powered by deep learning
pykaldi
A Python wrapper for Kaldi
realtime-vad-sample
Sample code of real-time voice activity detection using webrtcvad.
RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription. Designed for real-time applications like voice assistants.
Retrieval-based-Voice-Conversion-WebUI
Voice data <= 10 mins can also be used to train a good VC model!
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
travel-chatbot
This project implements a travel chatbot powered by a RAG (retrieval-augmented generation) chain, providing real-time information retrieval using various tools and the ability to fetch weather reports.
vits
VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai
VITS-fast-fine-tuning
This repo is a VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
vits_chinese
Best-practice TTS based on BERT and VITS, with some NaturalSpeech features from Microsoft; supports streaming output.
Whisper-Finetune
Fine-tune Whisper models and accelerate inference
whisper-finetuning
[WIP] Scripts for fine-tuning Whisper