There are 39 repositories under the automatic-speech-recognition topic.
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
OpenAI Whisper ASR Webservice API
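A hedged sketch of calling such a Whisper ASR webservice over HTTP. The /asr endpoint, the audio_file form field, and port 9000 reflect this webservice's interface as I understand it; treat them as assumptions and check the repository's README for the exact parameters.

```python
# Sketch: POST an audio file to a locally running Whisper ASR webservice.
# Endpoint path, form field name, and port are assumptions, not verified.
import requests

with open("speech.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:9000/asr",                    # assumed default port
        params={"task": "transcribe", "output": "json"},
        files={"audio_file": f},
    )
resp.raise_for_status()
print(resp.json())
```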
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
:zap: TensorFlowASR: Almost state-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that use characters or subwords
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
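For context, word error rate is the word-level edit distance between a reference transcript and a hypothesis, normalized by the reference length. A minimal, self-contained sketch (the function name is illustrative, not this repository's API):

```python
# Minimal WER sketch: Levenshtein distance over word sequences,
# normalized by the number of reference words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ~0.167
```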
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
End-to-end ASR/LM implementation with PyTorch
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
A list of features, scripts, blogs, and resources for making better use of Kaldi (http://kaldi-asr.org/)
A project dedicated to making CPU/on-device models approach GPU-model performance, with a real-time factor (RTF) below 0.1 on CPU
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
A collection of datasets for speech recognition
🔉 Youtube Videos Transcription with OpenAI's Whisper
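A minimal sketch of the workflow such a tool follows: pull the audio track of a video, then run Whisper on it. yt-dlp and the video URL are my assumptions for the download step; the repository itself may use a different downloader.

```python
# Fetch a video's audio track, then transcribe it with openai-whisper.
# Requires ffmpeg on PATH for the mp3 extraction step.
import yt_dlp
import whisper

url = "https://www.youtube.com/watch?v=EXAMPLE"  # hypothetical video URL

ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "audio.%(ext)s",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])

# Smaller Whisper models are faster; larger ones are more accurate.
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```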
[LREC-COLING 2024 (Oral), Interspeech 2024 (Oral), NAACL 2025, ACL 2025] A Series of Multilingual Multitask Medical Speech Processing
End-to-end speech recognition implementation based on TensorFlow (CTC, attention, and MTL training)
speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names.
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Thonburian Whisper: open fine-tuned Whisper models for Thai. Try our demo on Hugging Face Spaces.
AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models
VietASR - Vietnamese Automatic Speech Recognition