voice-activity-detection

There are 48 repositories under the voice-activity-detection topic.

noisetorch / NoiseTorch
Real-time microphone noise suppression on Linux.
hacktoberfest hacktoberfest2023 linux noise-reduction noise-suppression pulseaudio voice voice-activated voice-activity-detection
Language:Go 9000
smacke / ffsubsync
Automagically synchronize subtitles with video.
alignment audio caption captions fast-fourier-transform ffmpeg fft speech-detection srt srt-subtitles string-alignment subtitle subtitles sync synchronization vad video vlc vlc-media-player voice-activity-detection
Language:Python 6536
pyannote / pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
pytorch speech-processing speaker-diarization speech-activity-detection speaker-change-detection speaker-embedding voice-activity-detection pretrained-models overlapped-speech-detection speaker-recognition speaker-verification
Language:Jupyter Notebook 5158
alibaba-damo-academy / FunASR
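A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. The toolkit includes a rich set of high-performing open-source pretrained models, supports speech recognition, voice endpoint detection, text post-processing, and more, and can be deployed as a service.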
audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper
Language:Python 3714
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
onnx pytorch voice-activity-detection voice-commands voice-control voice-detection voice-recognition
Language:Python 2929
BingLingGroup / autosub
Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
audio-segmentation baidu-api cloud-speech-api substation-alpha subtitles voice-activity-detection xfyun xunfei-api
Language:Python 1967
jim-schwoebel / voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
voice-dataset voice-datasets audio-dataset audio-datasets datasets dataset voice data voice-computing voice-control voice-synthesis voice-commands voice-assistant voice-recognition voice-chat voice-activity-detection voice-conversion noise
1559
coqui-ai / open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
tts stt speech-to-text text-to-speech speech-recognition speech-synthesis speech-processing voice-recognition voice-activity-detection voice-cloning speech-emotion-recognition speech-separation
1213
ggeop / Python-ai-assistant
Python AI assistant 🧠
python35 python voice-recognition voice-assistant voice-control voice-activity-detection voice-chat nlp-machine-learning voice-commands linux-assistant nlp voice-recognition-experiment ai sklearn wolfram-language nltk google-speech-recognition google-speech-to-text mongodb pymongo
Language:Python 861
jtkim-kaist / VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
acam attention bdnn data dnn lstm speech speech-activity-detection speech-recognition vad voice-activity-detection voice-detection
Language:MATLAB 821
juanmc2005 / diart
A python package to build AI-powered real-time audio applications
speaker-diarization streaming-audio real-time speaker-embedding deep-learning transcription voice-activity-detection
Language:Python 815
amsehili / auditok
An audio/acoustic activity detection and audio segmentation tool
audio-activities audio-data audio-segmentation voice-detection vad voice-activity-detection
Language:Python 716
ina-foss / inaSpeechSegmenter
CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
speech-segmentation audio-analysis music-detection speech-music gender-equality gender-classification speaker-gender speech music voice-activity-detection speech-detection mirex noise female male gender segmentation praat speech-activity-detection
Language:Python 697
baxtree / subaligner
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
subtitles captions alignment subrip ttml voice-activity-detection subtitle-synchronization webvtt substation-alpha microdvd mpl2 tmp sami ebu-stl advanced-substation-alpha subtitle-translation subtitle-conversion scc sbv transcription
Language:Python 419
jim-schwoebel / voicebook
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
voice voice-assistant voice-recognition voice-recording transcription featurization data data-cleaning visualization generation voice-activity-detection voice-control server security encryption-decryption python3 machine-learning wake-word-detection voice-computing
Language:Python 367
filippogiruzzi / voice_activity_detection
Voice Activity Detection based on Deep Learning & TensorFlow
voice-activity-detection deep-learning speech tensorflow time-series time-series-classification resnet speech-recognition speech-detection python mfcc-features machine-learning vad deeplearning artificial-intelligence deep-neural-networks librispeech librispeech-dataset
Language:Python 339
tomchang25 / whisper-auto-transcribe
Auto transcribe tool based on whisper
asr text-to-speech deep-learning speech-recognition speech-to-text language-model pytorch speech-processing voice-activity-detection gradio gradio-interface video-captioning
Language:Python 197
gkonovalov / android-vad
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
vad offline real-time audio-processing gmm webrtc android dnn on-device-ai silero-vad neural-networks speech-detection voice-detection silero deep-neural-networks onnx-models speech-recoginition voice-activity-detector voice-activity-detection yamnet
Language:C 190
shashikg / WhisperS2T
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
asr deep-learning speech-recognition speech-to-text whisper tensorrt-llm tensorrt vad voice-activity-detection
Language:Jupyter Notebook 190
nicklashansen / voice-activity-detection
Voice Activity Detection (VAD) using deep learning.
convolutional-neural-networks deep-learning deep-neural-networks densenet focal-loss pytorch recurrent-neural-networks voice-activity-detection
Language:Jupyter Notebook 181
eesungkim / Voice_Activity_Detector
A statistical model-based Voice Activity Detection
vad voice-activity-detection voice-detection
Language:Jupyter Notebook 178
Picovoice / cobra
On-device voice activity detection (VAD) powered by deep learning
voice-activity-detection speech-recognition vad on-device voice-activity voice-activity-detector
Language:Python 140
RicherMans / GPV
Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper
machine-learning noise-robust-asr pytorch sound-activity speech-activity-detection voice-activity-detection
Language:Python 140
voithru / voice-activity-detection
PyTorch implementation of SELF-ATTENTIVE VAD, ICASSP 2021
voice-activity-detection vad
Language:Python 137
zhenghuatan / rVAD
Matlab and Python libraries for an unsupervised method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
voice-activity-detection noise-robust uunsupervised-learning
Language:MATLAB 123
zhenghuatan / rVADfast
This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
voice-activity-detection
Language:Python 119
RicherMans / Datadriven-GPVAD
The codebase for Data-driven general-purpose voice activity detection.
noise-robust voice-activity-detection speech-activity-detection machine-learning pytorch
Language:Python 90
Ankit-Kumar-Saini / Coursera_Deep_Learning_Specialization
Implementation of Logistic Regression, MLP, CNN, RNN & LSTM from scratch in python. Training of deep learning models for image classification, object detection, and sequence processing (including transformers implementation) in TensorFlow.
neural-networks cnn-for-visual-recognition rnn-lstm transfer-learning hyperparameter-tuning structuring-ml-projects audio-processing voice-activity-detection coursera mlp andrew-ng deep-learning face-recognition optimization-algorithms transformers image-segmentation-tensorflow question-answering named-entity-recognition
Language:Jupyter Notebook 89
NickWilkinson37 / voxseg
A python library for voice activity detection (VAD) for speech/non-speech segmentation.
speech-processing voice-activity-detection speech-segmentation speech vad python-library python
Language:Python 76
jim-schwoebel / voice_gender_detection
♂️♀️ Detect a person's gender from a voice file (90.7% +/- 1.3% accuracy).
machine-learning machine-learning-tutorial machine-learning-model machine-learning-modeling voice-commands voice gender-detection gender-classification voice-computing voice-assistant voice-recognition voice-control voice-activity-detection machine-learning-practice tutorial workshop-materials surveylex neurolex
Language:Python 72
spokestack / spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
android asr natural-language-understanding nlu speech speech-api speech-recognition speech-synthesis text-to-speech tts vad voice voice-activity-detection voice-as-an-interface voice-assistant voice-recognition voice-synthesis wakeword wakeword-activation
Language:Java 66
gooofy / py-nltools
A collection of basic python modules for spoken natural language processing
natural-language-processing pulseaudio phonetics tts speech-recognition tokenizer voice-activity-detection
Language:Python 56
spokestack / react-native-spokestack
Spokestack: give your React Native app a voice interface!
voice-interface react-native android ios voice-recognition voice-assistant voice-commands voice-control voice-activity-detection speech-recognition speech-to-text speech-synthesis speech-processing speech-api tts text-to-speech asr nlu nlu-engine hacktoberfest
Language:TypeScript 55
Speech-Interaction-Technology-Aalto-U / itsp
Introduction to Speech Processing
speaker-recognition speech-analysis speech-enhancement speech-modelling speech-processing voice-activity-detection speech-coding speech-quality-evaluation
Language:Jupyter Notebook 45
SEPIA-Framework / sepia-web-audio
Create modular, cross-browser, web audio pipelines to record and process audio in background threads. Comes with modules for VAD, ASR, resampling and much more...
webaudio audio-processing speech-recognition voice-activity-detection recorder background-worker wake-word-detection
Language:JavaScript 42
spokestack / spokestack-ios
Spokestack: give your iOS app a voice interface!
asr hacktoberfest ios natural-language-understanding speech-api speech-processing speech-recognition speech-synthesis speech-to-text swift tensorflow text-to-speech vad voice-activity-detection voice-assistant voice-recognition voice-synthesis wakeword wakeword-activation
Language:Swift 41
