holdurhorses's starred repositories
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
paper-reading
Paragraph-by-paragraph close reading of classic and recent deep learning papers
speechbrain
A PyTorch-based Speech Toolkit
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
TensorFlowTTS
TensorFlowTTS: real-time, state-of-the-art speech synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
chinese_text_normalization
Chinese text normalization for speech processing
CTCWordBeamSearch
Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
awesome-keyword-spotting
A curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection) resources.
Listen-Attend-Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
voice-activity-detection
PyTorch implementation of Self-Attentive VAD (ICASSP 2021)
torch-mfcc
A librosa STFT/Fbank/MFCC feature extraction implemented in PyTorch using 1D convolutions.
Prosody_Prediction
Predict prosody labels for Chinese sentences.
E2E_ASR_Confidence_Estimation
Implementation of the paper "Confidence estimation for attention based sequence to sequence models for speech recognition"
Chinese_PSP
Chinese Prosodic Structure Prediction
is2021_feature_extractor_v2
Instead of the posterior probability of recognized tokens, we use GOP scores as the tokens' confidence scores.
DeepLearning-500-questions
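500 Questions on Deep Learning: explains frequently asked topics in probability, linear algebra, machine learning, deep learning, computer vision, and related areas in question-and-answer form, to help the author and any readers who need it. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any errors or shortcomings. To be continued... For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be pursued. Tan, 2018.06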
Attention-Confidence
Attention mechanism for the estimation of confidence scores
kaldi-hybrid-decoder
In Automatic Speech Recognition (ASR), a decoder is either static (based on a Weighted Finite State Transducer) or dynamic (based on a History Conditioned Word Prefix-Tree/Graph). This project provides a unified approach within Kaldi's framework, extending its decoder to more application scenarios.