There are 252 repositories under the speech-recognition topic.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Port of OpenAI's Whisper model in C/C++
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Faster Whisper transcription with CTranslate2
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
A PyTorch-based Speech Toolkit
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Speech recognition module for Python, supporting several engines and APIs, online and offline.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
A Deep-Learning-Based Chinese Speech Recognition System
Facebook AI Research's Automatic Speech Recognition Toolkit
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
All-in-One Development Tool based on PaddlePaddle
Multilingual Voice Understanding Model
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
On-device Speech Recognition for Apple Silicon
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Machine Learning Resources, Practice and Research
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Voice Recognition to Text Tool: a local, offline tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Lingvo
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow