snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector


Silero VAD


Silero VAD - pre-trained enterprise-grade Voice Activity Detector (also see our STT models).


Real-Time Example
(video: real-time-example.mp4)

Key Features


  • Stellar accuracy

    Silero VAD has excellent results on speech detection tasks.

  • Fast

    One audio chunk (30+ ms) takes less than 1 ms to process on a single CPU thread. Batching or running on a GPU can improve performance considerably, and under certain conditions the ONNX model may run up to 4-5x faster.

  • Lightweight

    The JIT model is around one megabyte in size.

  • General

    Silero VAD was trained on huge corpora covering over 100 languages, and it performs well on audio from different domains with various background noise and quality levels.

  • Flexible sampling rate

    Silero VAD supports 8000 Hz and 16000 Hz sampling rates.

  • Flexible chunk size

    The model was trained on 30 ms chunks. Longer chunks are supported directly; other chunk sizes may work as well.

  • Highly Portable

    Silero VAD benefits from the rich ecosystems built around PyTorch and ONNX, and it runs anywhere these runtimes are available.

  • No Strings Attached

    Published under a permissive license (MIT), Silero VAD has zero strings attached: no telemetry, no keys, no registration, no built-in expiration, and no vendor lock-in.
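
Since the model was trained on 30 ms chunks at 8000 Hz or 16000 Hz, one 30 ms chunk holds 240 or 480 samples respectively. The sketch below shows that arithmetic and a simple way to cut a recording into fixed-size chunks. It assumes audio arrives as a flat sequence of float samples; `samples_per_chunk` and `split_into_chunks` are hypothetical helpers, not part of the Silero VAD API, and zero-padding the final chunk is our own choice. The chunk sizes a particular model version actually expects should be checked against the repository's examples.

```python
from typing import List, Sequence

# Sampling rates listed above as supported by Silero VAD.
SUPPORTED_SAMPLE_RATES = (8000, 16000)


def samples_per_chunk(sample_rate: int, chunk_ms: int = 30) -> int:
    """Number of samples in a chunk of `chunk_ms` milliseconds (hypothetical helper)."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate * chunk_ms // 1000


def split_into_chunks(
    audio: Sequence[float], sample_rate: int, chunk_ms: int = 30
) -> List[List[float]]:
    """Split audio into consecutive fixed-size chunks.

    Assumption: the last partial chunk is zero-padded to full length.
    """
    size = samples_per_chunk(sample_rate, chunk_ms)
    chunks: List[List[float]] = []
    for start in range(0, len(audio), size):
        chunk = list(audio[start:start + size])
        chunk.extend([0.0] * (size - len(chunk)))  # pad the final partial chunk
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    print(samples_per_chunk(16000))  # 480 samples per 30 ms at 16 kHz
    print(samples_per_chunk(8000))   # 240 samples per 30 ms at 8 kHz
    print(len(split_into_chunks([0.1] * 1000, 16000)))  # 3 chunks
```

With 1000 samples at 16 kHz this yields three 480-sample chunks, the last one padded with 440 zeros.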


Typical Use Cases


  • Voice activity detection for IoT / edge / mobile use cases
  • Data cleaning and preparation, voice detection in general
  • Telephony and call-center automation, voice bots
  • Voice interfaces
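
In use cases such as data cleaning or call-center automation, per-chunk speech probabilities are usually turned into time segments. Below is a minimal sketch of that step. It assumes one probability per 30 ms chunk and a fixed 0.5 threshold. `probs_to_segments` is a hypothetical helper, not the repository's own utility, and it leaves out refinements like minimum speech duration or padding around segments.

```python
from typing import List, Optional, Sequence, Tuple


def probs_to_segments(
    probs: Sequence[float],
    chunk_ms: int = 30,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Merge consecutive chunks with probability >= threshold into
    (start_ms, end_ms) speech segments (hypothetical helper)."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * chunk_ms  # speech begins at this chunk
        elif p < threshold and start is not None:
            segments.append((start, i * chunk_ms))  # speech ended before this chunk
            start = None
    if start is not None:  # audio ended while still in speech
        segments.append((start, len(probs) * chunk_ms))
    return segments


if __name__ == "__main__":
    print(probs_to_segments([0.1, 0.8, 0.9, 0.2, 0.7]))  # [(30, 90), (120, 150)]
```

A single threshold is the simplest choice. Real pipelines often use separate on/off thresholds to avoid rapid toggling near the boundary.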


Get In Touch


Try our models, create an issue, start a discussion, join our Telegram chat, email us, or read our news.

Please see our wiki and tiers for relevant information and email us directly.

Citations

@misc{SileroVAD,
  author = {Silero Team},
  title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-vad}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Examples and VAD-based Community Apps


  • Example of VAD ONNX Runtime model usage in C++

  • Voice activity detection for the browser using ONNX Runtime Web

About

License: MIT

