There are 279 repositories under speech-recognition topic.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Port of OpenAI's Whisper model in C/C++
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Faster Whisper transcription with CTranslate2
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
A PyTorch-based Speech Toolkit
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Speech recognition module for Python, supporting several engines and APIs, online and offline.
A Deep-Learning-Based Chinese Speech Recognition System 基于深度学习的中文语音识别系统
Multilingual Voice Understanding Model
Facebook AI Research's Automatic Speech Recognition Toolkit
All-in-One Development Tool based on PaddlePaddle
On-device Speech Recognition for Apple Silicon
Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping intergrated.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Machine Learning Resources, Practice and Research
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
OpenAI Whisper ASR Webservice API
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative