There are 7 repositories under the vad topic.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
faster_whisper GUI with PySide6
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Runtime Audio Importer plugin for Unreal Engine. Imports audio in various formats at runtime.
Voice Activity Detection based on Deep Learning & TensorFlow
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
A statistical model-based Voice Activity Detection
Python bindings of WebRTC Audio Processing
PyTorch implementation of SELF-ATTENTIVE VAD, ICASSP 2021
Extraction of the APM-related code from WebRTC, including AEC/NS/AGC/VAD, plus MP3/AAC encoders and SoundTouch
A TensorFlow re-implementation of CVPR 2019 "Object-centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video"
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
A Python library for voice activity detection (VAD) for speech/non-speech segmentation.
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Karaoke Player / Editor with automatic clip creation from any song file using vocals and lyrics extraction (Speech-to-Text)
A voice activity detection (VAD) library for Unity.
HadreamAssistant, your smart home / custom voice assistant, supporting Raspberry Pi/Linux
Voice Activity Detection LSTM-RNN learning model
A curated list of awesome voice activity detection
Spokestack: give your iOS app a voice interface!
A FreeSWITCH module that performs VAD and ASR using the iFLYTEK WebSocket API.
PyTorch implementation of automatic speech recognition models.
Voice activity detection (VAD) library, based on WebRTC's VAD engine built to WASM with Emscripten to run in browsers, Node, and NativeScript