speech-to-text

There are 278 repositories under the speech-to-text topic.

ggml-org / whisper.cpp
Port of OpenAI's Whisper model in C/C++
inference openai speech-recognition speech-to-text transformer whisper
Language:C++ 43217
mozilla / DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
deep-learning machine-learning neural-networks tensorflow speech-recognition speech-to-text deepspeech embedded on-device offline
Language:C++ 26594
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
deep-learning inference quantization speech-recognition speech-to-text transformer whisper openai
Language:Python 18121
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
asr speech speech-recognition speech-to-text whisper
Language:Python 17739
leon-ai / leon
🧠 Leon is your open-source personal assistant.
ai ai-assistant artificial-intelligence assistant automation bot chatbot flite leon nodejs offline personal-assistant privacy python speech-recognition speech-synthesis speech-to-text text-to-speech virtual-assistant voice-assistant
Language:TypeScript 16648
kaldi-asr / kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
kaldi c-plus-plus cuda shell speech-recognition speech-to-text speaker-verification speaker-id speech
Language:Shell 15112
jianchang512 / pyvideotrans
Translate a video from one language to another and add dubbing. Also supports speech recognition transcription, speech synthesis, and subtitle translation.
speech-to-text text-to-speech video-transition
Language:Python 14210
alphacep / vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
speech-recognition asr voice-recognition speech-to-text android ios raspberry-pi deep-learning deep-neural-networks speech-to-text-android speaker-identification speaker-verification python offline privacy kaldi deepspeech google-speech-to-text vosk stt
Language:Jupyter Notebook 13196
speechbrain / speechbrain
A PyTorch-based Speech Toolkit
speech-recognition speech-toolkit speaker-recognition speech-to-text speech-enhancement speech-separation audio audio-processing speech-processing speechrecognition asr voice-recognition spoken-language-understanding speaker-diarization speaker-verification pytorch huggingface transformers language-model deep-learning
Language:Python 10430
Uberi / speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
python audio speech-recognition speech-to-text
Language:Python 8861
nl8590687 / ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
tensorflow cnn ctc python keras speech-recognition speech-to-text chinese-speech-recognition asrt python3
Language:Python 8225
k2-fsa / sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 12 programming languages
asr onnx windows linux macos cpp android ios raspberry-pi aarch64 arm32 csharp dotnet mfc speech-to-text text-to-speech vits risc-v lazarus object-pascal
Language:C++ 7417
TalAter / annyang
💬 Speech recognition for your site
speech-recognition speech speech-to-text voice
Language:JavaScript 6663
KoljaB / RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
python realtime speech-to-text
Language:Python 6630
FunAudioLLM / SenseVoice
Multilingual Voice Understanding Model
ai asr gpt-4o speech-recognition speech-to-text aigc audio-event-classification cross-lingual llm python pytorch speech-emotion-recognition multilingual
Language:Python 6603
snakers4 / silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark pytorch colab onnx torch-hub text-to-speech tts-models speech speech-synthesis tts repunctuation capitalization
Language:Jupyter Notebook 5481
modelscope / FunClip
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
speech-recognition video-clip video-subtitles subtitles-generator speech-to-text gradio gradio-python-llm llm
Language:Python 4961
MahmoudAshraf97 / whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
asr speaker-diarization speech speech-recognition speech-to-text whisper
Language:Jupyter Notebook 4959
abus-aikorea / voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
audiobook faster-whisper gradio karaoke podcasts speech-recognition speech-synthesis speech-to-text subtitles text-to-speech transcription translator tts voice-cloning voice-conversion webui whisper whisperx yt-dlp
Language:Python 4808
sanchit-gandhi / whisper-jax
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
deep-learning jax speech-recognition speech-to-text whisper
Language:Jupyter Notebook 4633
huggingface / speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT-4o
ai assistant language-model machine-learning python speech speech-synthesis speech-to-text speech-translation
Language:Python 4180
jianchang512 / stt
Voice Recognition to Text Tool / An offline, local audio and video to subtitle tool that outputs JSON, SRT subtitles, and plain text.
speech speech-recognition speech-to-text stt
Language:Python 3823
freddyaboulton / fastrtc
The python library for real-time communication
artificial-intelligence llm python real-time speech-to-text text-to-speech
Language:JavaScript 3467
ahmetoner / whisper-asr-webservice
OpenAI Whisper ASR Webservice API
automatic-speech-recognition speech-recognition speech-to-text openai-whisper docker asr speech
Language:Python 2899
ictnlp / LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
large-language-models multimodal-large-language-models speech-interaction speech-language-model speech-to-speech speech-to-text
Language:Python 2886
HeyWillow / willow
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
alexa deep-learning echo esp-adf esp-idf esp32 home-assistant home-automation speech-recognition speech-to-text whisper google-home privacy
Language:C 2866
tensorflow / lingvo
Lingvo
speech-recognition translation speech-to-text machine-translation mnist seq2seq language-model tts asr lm nlp tensorflow speech research distributed gpu-computing speech-synthesis
Language:Python 2852
linto-ai / whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
deep-learning speech speech-recognition speech-to-text asr machine-learning python python3 pytorch attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization multilingual-models speaker-diarization speech-processing transformers whisper
Language:Python 2591
coqui-ai / STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
stt speech-to-text tensorflow deep-learning automatic-speech-recognition asr voice-recognition speech-recognition speech-recognizer speech-recognition-api
Language:C++ 2514
Purfview / whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
openai speech-to-text transcriber whisper asr speech-recognition subtitles ctranslate2 faster-whisper whisper-faster whisperx uvr diarization vocal-extractor speaker-diarization
2471
pannous / tensorflow-speech-recognition
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
deep-learning neural-network speech-recognition speech-to-text stt tensorflow
Language:Python 2173
pluja / whishper
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
ai audio-to-text golang speech-recognition speech-to-text stt subtitles sveltekit transcription ui web web-whisper webapp whisper
Language:Svelte 2159
jarikomppa / soloud
Free, easy, portable audio engine for games
audio blitzmax c cpp engine flac game game-development gamemaker mp3 ogg opensl-es portable python ruby sound sound-effects speech speech-to-text synthesizer
Language:C 1896
mesolitica / NLP-Models-Tensorflow
Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0
nlp machine-learning deep-learning lstm attention lstm-seq2seq-tf neural-machine-translation optical-character-recognition dnc-seq2seq pos-tagging summarization embedded luong-api chatbot speech-to-text language-detection
Language:Jupyter Notebook 1786
kalliope-project / kalliope
Kalliope is a framework that will help you to create your own personal assistant.
raspberry bot-creation jarvis personal-assistant linux speech-to-text speech-recognition speech-synthesis bot home-automation
Language:Python 1751
bugbakery / audapolis
an editor for spoken-word audio with automatic transcription
audio-editing speech-to-text transcription video-editing
Language:TypeScript 1732