There are 54 repositories under the voice-activity-detection topic.
A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, etc.
Real-time microphone noise suppression on Linux.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
A python package to build AI-powered real-time audio applications
Python AI assistant 🧠
Voice activity detection (VAD) toolkit including DNN-, bDNN-, LSTM- and ACAM-based VAD. We also provide our own recorded dataset.
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Runtime Audio Importer plugin for Unreal Engine. Imports audio in various formats at runtime.
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Voice Activity Detection based on Deep Learning & TensorFlow
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Automatic transcription tool based on Whisper
Voice Activity Detection (VAD) using deep learning.
A statistical model-based voice activity detector
PyTorch implementation of Self-Attentive VAD (ICASSP 2021)
Repository for our Interspeech 2020 general-purpose voice activity detection (GPVAD) paper
This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
Matlab and Python libraries for an unsupervised method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
Introduction to Speech Processing
Implementation of Logistic Regression, MLP, CNN, RNN & LSTM from scratch in python. Training of deep learning models for image classification, object detection, and sequence processing (including transformers implementation) in TensorFlow.
The codebase for data-driven general-purpose voice activity detection.
A python library for voice activity detection (VAD) for speech/non-speech segmentation.
♂️♀️ Detect a person's gender from a voice file (90.7% +/- 1.3% accuracy).
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
ASR 2Pass ONNX Runtime and WebSocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).