asr

There are 60 repositories under the asr topic.

m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
asr speech speech-recognition speech-to-text whisper
Language:Python 17739
NVIDIA / NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
asr deeplearning generative-ai large-language-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts
Language:Python 13607
alphacep / vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
speech-recognition asr voice-recognition speech-to-text android ios raspberry-pi deep-learning deep-neural-networks speech-to-text-android speaker-identification speaker-verification python offline privacy kaldi deepspeech google-speech-to-text vosk stt
Language:Jupyter Notebook 13196
PaddlePaddle / PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr kws speech-recognition sound-classification voice-cloning vocoder voice-recognition self-supervised-learning wav2vec2 whisper code-switch
Language:Python 12228
speechbrain / speechbrain
A PyTorch-based Speech Toolkit
speech-recognition speech-toolkit speaker-recognition speech-to-text speech-enhancement speech-separation audio audio-processing speech-processing speechrecognition asr voice-recognition spoken-language-understanding speaker-diarization speaker-verification pytorch huggingface transformers language-model deep-learning
Language:Python 10430
k2-fsa / sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and a websocket server/client, with support for 12 programming languages.
asr onnx windows linux macos cpp android ios raspberry-pi aarch64 arm32 csharp dotnet mfc speech-to-text text-to-speech vits risc-v lazarus object-pascal
Language:C++ 7417
wzpan / wukong-robot
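🤖 wukong-robot is a simple, flexible, and elegant Chinese voice-dialogue robot / smart speaker project. It supports multi-turn conversation with ChatGPT and may be the first open-source smart speaker project to support brain-computer interaction.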
ai alexa amazon-echo anyq asr bci chatgpt google-home gpt3 homeassistant muse openai raspeberry-pi snowboy speaker tts unit
Language:Python 6770
FunAudioLLM / SenseVoice
Multilingual Voice Understanding Model
ai aigc asr audio-event-classification cross-lingual gpt-4o llm multilingual python pytorch speech-emotion-recognition speech-recognition speech-to-text
Language:Python 6603
jdepoix / youtube-transcript-api
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!
youtube-api subtitles youtube transcripts youtube-subtitles youtube-transcripts python transcript subtitle cli captions youtube-captions youtube-transcript youtube-video translating-transcripts asr youtube-asr
Language:Python 6163
TEN-framework / TEN-Agent
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.
agent ai asr cpp gemini golang gpt-4 gpt-4o llm low-latency multimodal nextjs14 openai python rag real-time realtime tts vision voice-assistant
Language:Python 5566
snakers4 / silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark pytorch colab onnx torch-hub text-to-speech tts-models speech speech-synthesis tts repunctuation capitalization
Language:Jupyter Notebook 5481
xiangyuecn / Recorder
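HTML5/JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC browsers, some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call/chat example, and DTMF encoding and decoding.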
recorder record javascript html5 h5 luyin mp3 wav amr ogg webm webrtc audio recording sound-record dtmf asr html g711a g711u
Language:JavaScript 5420
MahmoudAshraf97 / whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
asr speaker-diarization speech speech-recognition speech-to-text whisper
Language:Jupyter Notebook 4959
wenet-e2e / wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
e2e-models pytorch asr transformer conformer production-ready automatic-speech-recognition speech-recognition whisper
Language:Python 4800
NexaAI / nexa-sdk
On device AI inference in minutes—now for MLX & GGUF and Qualcomm NPU, with Android and iOS coming soon.
edge-computing llm on-device-ai on-device-ml sdk stable-diffusion transformers vlm language-model go
Language:Go 4785
PeterH0323 / Streamer-Sales
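Streamer-Sales (销冠, "Sales Champion"): a sales-livestreamer LLM 🛒🎁 that presents a product based on its given features, with the aim of sparking viewers' desire to buy. 🚀⭐ Includes a detailed data-generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR (speech-to-text) 🎙️, a Vue-based frontend 🍍, a FastAPI backend 🗝️, and Docker-compose packaging and deployment 🐋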
asr chat chat-application chatbot chatgpt digital-human gpt internlm-chat-7b internlm2 llm meta-human rag text-generation tts
Language:Python 3150
ahmetoner / whisper-asr-webservice
OpenAI Whisper ASR Webservice API
automatic-speech-recognition speech-recognition speech-to-text openai-whisper docker asr speech
Language:Python 2899
tensorflow / lingvo
Lingvo
speech-recognition translation speech-to-text machine-translation mnist seq2seq language-model tts asr lm nlp tensorflow speech research distributed gpu-computing speech-synthesis
Language:Python 2852
linto-ai / whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
deep-learning speech speech-recognition speech-to-text asr machine-learning python python3 pytorch attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization multilingual-models speaker-diarization speech-processing transformers whisper
Language:Python 2591
coqui-ai / STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
stt speech-to-text tensorflow deep-learning automatic-speech-recognition asr voice-recognition speech-recognition speech-recognizer speech-recognition-api
Language:C++ 2408
mravanelli / pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
speech-recognition gru dnn kaldi rnn-model pytorch timit deep-learning deep-neural-networks recurrent-neural-networks multilayer-perceptron-network lstm lstm-neural-networks speech asr rnn dnn-hmm
Language:Python 2391
CheshireCC / faster-whisper-GUI
faster_whisper GUI with PySide6
asr faster-whisper openai transcribe vad voice-transcription whisper whisperx
Language:Python 2280
Purfview / whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
openai speech-to-text transcriber whisper asr speech-recognition subtitles ctranslate2 faster-whisper whisper-faster whisperx uvr diarization vocal-extractor speaker-diarization
1908
Delta-ML / delta
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
nlp deep-learning tensorflow speech sequence-to-sequence seq2seq speech-recognition text-classification speaker-verification nlu text-generation emotion-recognition tensorflow-serving tensorflow-lite inference asr serving front-end custom-ops ops
Language:Python 1597
harry0703 / AudioNotes
Quickly extract the content of audio and video and organize it into structured Markdown notes
ai asr funasr ollama python qwen2 whisper
Language:Python 1587
wwbin2017 / bailing
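Bailing (百聆) is a GPT-4o-style voice-dialogue robot built from ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs on low-spec machines such as a Mac, and supports interruption.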
ai asr chatgpt chattts deepseek funasr gpt-4o llm openai tts voice-assistant
Language:Python 1436
k2-fsa / sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
asr c cpp csharp go kotlin python speech-recognition vad voice-activity-detection
Language:C++ 1262
mravanelli / SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
deep-learning audio waveform filtering cnn convolutional-neural-networks speaker-recognition speaker-verification speaker-identification speech-recognition asr audio-processing speech-processing digital-signal-processing signal-processing neural-networks artificial-intelligence timit pytorch python
Language:Python 1195
lenML / Speech-AI-Forge
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
chattts ssml tts chattts-forge agent gpt llm text-to-speech colab llama chinese english fish-speech cosyvoice cosy-voice asr stt firered whisper fireredtts
Language:Python 1164
ictnlp / StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
seamless simultaneous-translation speech speech-recognition speech-synthesis speech-to-text speech-translation translation all-in-one machine-translation streaming-audio text-to-speech asr tts voice text-to-audio non-autoregressive speech-enhancement audio-processing speech-processing
Language:Python 1147
yeyupiaoling / Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
asr ctranslate2 huggingface whisper lora speech-recognition transformers chinese pytorch android web
Language:C 1122
R3gm / SoniTranslate
Synchronized Translation for Videos. Video dubbing
audio-processing diarization translation translate-audio translate-video video-dubbing asr automatic-dubbing document-translator dubbing speech-to-text stt subtitle-to-speech text-to-speech tts
Language:Python 1096
sooftware / conformer
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
asr augmented cnn conformer conv convolution pytorch recognition speech speech-recognition transformer transformer-xl
Language:Python 1071
pykaldi / pykaldi
A Python wrapper for Kaldi
python wrapper kaldi openfst asr speech-recognition speech language-model feature-extraction clif numpy
Language:Python 1027
alphacep / vosk-server
WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
websocket speech-recognition kaldi python asr grpc saas webrtc vosk
Language:Python 1025
athena-team / athena
An open-source implementation of a sequence-to-sequence based speech processing engine
speech-recognition asr transformer tensorflow ctc unsupervised-learning sequence-to-sequence deployment wfst speaker-recognition tts speech-synthesis
Language:C++ 959
