There are 278 repositories under the speech-to-text topic.
Port of OpenAI's Whisper model in C/C++
DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
Faster Whisper transcription with CTranslate2
Translate a video from one language to another and add dubbing, with support for speech recognition and transcription, speech synthesis, and subtitle translation.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
A PyTorch-based Speech Toolkit
Speech recognition module for Python, supporting several engines and APIs, online and offline.
A Deep-Learning-Based Chinese Speech Recognition System
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and a websocket server/client, with support for 12 programming languages.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Multilingual Voice Understanding Model
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
The Python library for real-time communication
OpenAI Whisper ASR Webservice API
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Open-source, local, and self-hosted voice assistant, a competitive alternative to Amazon Echo and Google Home
Lingvo
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
🎙 Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Free, easy, portable audio engine for games
Gathers machine learning and TensorFlow deep learning models for NLP problems, 1.13 < TensorFlow < 2.0
Kalliope is a framework that will help you to create your own personal assistant.